Pandas loc vs iloc: When to Use Each for Data Selection

The loc and iloc properties are part of the Pandas library. Both select rows and columns from a DataFrame: loc selects by index label, while iloc selects by integer position.
This tutorial will discuss the key differences between these two properties and how to make sure you’re using the right one for your specific needs.
This table summarizes the differences between the two properties:

Property | loc | iloc
-------- | --- | ----
Indexing type | Label-based | Integer position-based
Data selected | By the value of the index label | By the position in the DataFrame
Format of arguments | Single label, list of labels, slice of labels, boolean array or Series | Single integer, list of integers, slice of integers, boolean array
Includes final value in range | Yes | No
Handles boolean indexing | Directly | Indirectly, using a plain boolean array (a boolean Series raises an error)
Can result in confusion with integer index | Yes, if the index labels are integers | No
Performance | Slightly slower in simple benchmarks | Slightly faster in simple benchmarks

Now, let’s dig into the details. First, let’s create a DataFrame as a sample:

import pandas as pd

# Build a small DataFrame indexed by name
data = pd.DataFrame({
    'Age': [26, 27, 28, 29, 30],
    'Height': [165, 168, 170, 173, 175],
    'Weight': [55, 58, 60, 63, 65]},
    index=['Emily', 'Ava', 'Charlotte', 'Sophia', 'Olivia'])
print(data)


           Age  Height  Weight
Emily       26     165      55
Ava         27     168      58
Charlotte   28     170      60
Sophia      29     173      63
Olivia      30     175      65

This is the DataFrame we’ll be dealing with. As you can see, it holds data for five individuals and contains their Ages, Heights, and Weights.



Selecting Rows with loc and iloc

Both loc and iloc can be used for row selections, but they differ slightly in how they operate.

# Selecting a range of rows with loc
print(data.loc['Emily':'Charlotte'])


           Age  Height  Weight
Emily       26     165      55
Ava         27     168      58
Charlotte   28     170      60

With loc, we can select a range of rows by label, and importantly, the range includes both the start and the stop label. Here, the slice from ‘Emily’ to ‘Charlotte’ returns three rows.

# Selecting a range of rows with iloc
print(data.iloc[0:2])


       Age  Height  Weight
Emily   26     165      55
Ava     27     168      58

When we select rows with iloc, we need to remember that, unlike loc, the range includes the start position and excludes the stop position. This is consistent with standard Python slicing, which is why only two rows are returned here.
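The inclusive and exclusive behaviours described above can be checked by counting rows. The sketch below rebuilds the sample DataFrame so that it runs on its own; the variable names are illustrative only. It also shows the single-value and list forms that both indexers accept.

```python
import pandas as pd

data = pd.DataFrame({
    'Age': [26, 27, 28, 29, 30],
    'Height': [165, 168, 170, 173, 175],
    'Weight': [55, 58, 60, 63, 65]},
    index=['Emily', 'Ava', 'Charlotte', 'Sophia', 'Olivia'])

# Label slice: both endpoints are included
by_label = data.loc['Emily':'Charlotte']
# Position slice: the stop position is excluded, as with Python lists
by_position = data.iloc[0:2]

print(len(by_label))     # 3
print(len(by_position))  # 2

# A single label or position returns one row as a Series
ava_by_label = data.loc['Ava']
ava_by_position = data.iloc[1]

# A list of labels or positions returns a DataFrame in the order given
pair_by_label = data.loc[['Olivia', 'Emily']]
pair_by_position = data.iloc[[4, 0]]
```

The list form is handy when the rows you need are not adjacent, since a slice can only describe a contiguous range.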


Slicing with loc and iloc

Let’s explore how loc and iloc differ when slicing rows and columns at the same time.

# Slicing with loc
print(data.loc['Emily':'Sophia', 'Age':'Height'])


           Age  Height
Emily       26     165
Ava         27     168
Charlotte   28     170
Sophia      29     173

Using loc, we can slice both rows and columns by specifying a label range for each axis.

Note that both ranges include their stop label here, so the ‘Sophia’ row and the ‘Height’ column appear in the result.

# Slicing with iloc
print(data.iloc[0:4, 0:2])


           Age  Height
Emily       26     165
Ava         27     168
Charlotte   28     170
Sophia      29     173

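Unlike loc, the stop position here is exclusive: 0:4 selects the rows at positions 0 through 3, and 0:2 selects the columns at positions 0 and 1.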
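To confirm that a label slice and a position slice pick out the same cells, the two results can be compared with DataFrame.equals. This is a minimal sketch that rebuilds the sample DataFrame; Index.get_loc is used only to illustrate how a label maps to a position, and the variable names are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({
    'Age': [26, 27, 28, 29, 30],
    'Height': [165, 168, 170, 173, 175],
    'Weight': [55, 58, 60, 63, 65]},
    index=['Emily', 'Ava', 'Charlotte', 'Sophia', 'Olivia'])

loc_slice = data.loc['Emily':'Sophia', 'Age':'Height']
iloc_slice = data.iloc[0:4, 0:2]
print(loc_slice.equals(iloc_slice))  # True

# Index.get_loc translates a label into its integer position
row_stop = data.index.get_loc('Sophia')    # 3
col_stop = data.columns.get_loc('Height')  # 1

# Add 1 to each stop because iloc excludes the stop position
same_slice = data.iloc[0:row_stop + 1, 0:col_stop + 1]
print(same_slice.equals(loc_slice))  # True
```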


Boolean Indexing

Boolean indexing in Pandas works by selecting rows in the DataFrame where the condition is true.

# Boolean indexing with loc
print(data.loc[data['Age'] > 27])


           Age  Height  Weight
Charlotte   28     170      60
Sophia      29     173      63
Olivia      30     175      65

In this example, we use loc to select all the rows where the ‘Age’ is more than 27.

The expression inside the square brackets is a boolean condition that checks whether ‘Age’ is greater than 27 for each row.

The loc property then returns only the rows where this condition is True.

With iloc, however, things are different.

# Attempting boolean indexing with iloc
# Uncomment the below lines to run the code
#print(data.iloc[data['Age'] > 27])

If you attempt to do boolean indexing with iloc, as in the commented code above, you’ll run into an error.
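The failure can be observed safely by catching the exception. This sketch rebuilds the sample DataFrame; the exception types caught here are the ones recent pandas versions are expected to raise for a boolean Series, but the exact type and message may vary by version, so treat them as an assumption.

```python
import pandas as pd

data = pd.DataFrame({
    'Age': [26, 27, 28, 29, 30],
    'Height': [165, 168, 170, 173, 175],
    'Weight': [55, 58, 60, 63, 65]},
    index=['Emily', 'Ava', 'Charlotte', 'Sophia', 'Olivia'])

# A boolean Series that carries the DataFrame's index labels
mask = data['Age'] > 27

error = None
try:
    data.iloc[mask]
except (ValueError, NotImplementedError) as exc:
    # iloc refuses a mask that has its own index
    error = exc

print(type(error).__name__, error)
```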

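This highlights an important difference between loc and iloc: iloc does not accept a boolean Series as a mask, because it works with positions rather than index labels.

Instead, you need to build the boolean Series first and then convert it to a plain boolean array before using it for data selection.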

# Boolean indexing workaround with iloc
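boolean_index = data['Age'] > 27
# Convert the Series to a NumPy array so that iloc accepts it
print(data.iloc[boolean_index.to_numpy()])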


           Age  Height  Weight
Charlotte   28     170      60
Sophia      29     173      63
Olivia      30     175      65

In the code above, we first obtain a boolean index where ‘Age’ is greater than 27.

This boolean index is then passed to iloc as a plain array, not as a Series, to get the desired rows.
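Two ways of handing a condition to iloc are sketched below, assuming the same sample DataFrame. The first strips the index from the mask with to_numpy(); the second converts the mask to integer positions with numpy.flatnonzero, which is the input iloc works with natively. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Age': [26, 27, 28, 29, 30],
    'Height': [165, 168, 170, 173, 175],
    'Weight': [55, 58, 60, 63, 65]},
    index=['Emily', 'Ava', 'Charlotte', 'Sophia', 'Olivia'])

mask = data['Age'] > 27

# Option 1: pass a NumPy boolean array instead of a Series
via_array = data.iloc[mask.to_numpy()]

# Option 2: turn the mask into the integer positions of its True values
positions = np.flatnonzero(mask.to_numpy())
via_positions = data.iloc[positions]

# Both match what loc returns for the same condition
print(via_array.equals(data.loc[mask]))      # True
print(via_positions.equals(data.loc[mask]))  # True
```

In most cases loc with the boolean Series is the simpler choice; the iloc forms are mainly useful when the rest of the expression is already position-based.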


Performance Comparison

Let’s take a look at the performance of loc and iloc with a simple benchmark.

import timeit
import pandas as pd
import numpy as np

# Create a large DataFrame (10,000 x 10,000 floats, about 800 MB in memory)
large_data = pd.DataFrame(np.random.rand(10000, 10000))

# Time loc
loc_start_time = timeit.default_timer()
large_data.loc[0, 0]
loc_end_time = timeit.default_timer()
loc_time = loc_end_time - loc_start_time

# Time iloc
iloc_start_time = timeit.default_timer()
large_data.iloc[0, 0]
iloc_end_time = timeit.default_timer()
iloc_time = iloc_end_time - iloc_start_time

print("Time taken for loc: {:.6f} seconds".format(loc_time))
print("Time taken for iloc: {:.6f} seconds".format(iloc_time))


Time taken for loc: 0.005681 seconds
Time taken for iloc: 0.000095 seconds

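Note that actual processing time will vary depending on the environment, the size of the DataFrame, and the task that is being performed.

In this run, iloc was faster than loc. However, each property was timed on a single call, so the measurement is noisy and may include one-time setup cost; the gap shown here should not be read as a stable ratio.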

If you know that you will be referring to rows and columns by their integer position, and speed is a priority, iloc is a reasonable choice.

However, the general advice is to make your choice based more on whether your indexing needs are label-based (loc) or integer-based (iloc) rather than on which is faster.
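A steadier comparison repeats each lookup many times and keeps the best run. The following is a minimal sketch using timeit.repeat; the smaller DataFrame size, the call counts, and the best_time helper are assumptions made for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# A smaller frame keeps memory use modest; scalar lookups do not need a huge one
frame = pd.DataFrame(np.random.rand(1000, 100))


def best_time(func, number=2000, repeat=5):
    """Return the best average time per call, in seconds, over several runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


loc_time = best_time(lambda: frame.loc[0, 0])
iloc_time = best_time(lambda: frame.iloc[0, 0])

print("loc:  {:.2f} microseconds per call".format(loc_time * 1e6))
print("iloc: {:.2f} microseconds per call".format(iloc_time * 1e6))
```

Taking the minimum over several runs reduces the influence of background activity on the machine, which makes the two figures easier to compare.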


When to Use loc vs. iloc

While loc and iloc offer very similar functionality, they have fundamental differences that make them suited to different tasks.

Use loc when:

  • You want label-based indexing. loc is designed to handle this scenario.
  • The dataset has a string index.
  • You are using boolean indexing. loc can accept boolean arrays for indexing directly.

Use iloc when:

  • Your DataFrame has a numeric index and you want rows by position. loc works with numeric labels, but when the labels are integers it is easy to mistake a label for a position.
  • You rely on the position of the item in the DataFrame.
  • You need a slight speed boost. In computational tests, iloc performs slightly better than loc.
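The confusion with integer labels is easiest to see on a frame whose labels do not match the row positions. The DataFrame below is a hypothetical example, not part of the sample data used earlier.

```python
import pandas as pd

# Hypothetical frame: integer labels in the reverse order of the positions
scores = pd.DataFrame({'score': [90, 80, 70, 60]}, index=[3, 2, 1, 0])

# loc looks up the row labelled 0, which is the last row
print(scores.loc[0, 'score'])   # 60
# iloc looks up the row at position 0, which is the first row
print(scores.iloc[0, 0])        # 90

# Label slice: from label 3 to label 1, stop label included
print(scores.loc[3:1, 'score'].tolist())  # [90, 80, 70]
# Position slice: positions 0 and 1, stop position excluded
print(scores.iloc[0:2, 0].tolist())       # [90, 80]
```

With a default index (0, 1, 2, and so on) labels and positions coincide, which hides the distinction until the data is filtered or reordered.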