Using Pandas DataFrame iloc Property for Index Based Access

The iloc property in the Pandas library stands for “integer-location” and provides integer-based indexing for selection by position.

This means you can select rows and columns in a DataFrame by their integer position.

In this tutorial, we’ll cover various aspects of using iloc, including selecting single rows, multiple rows, specific columns, and even individual cells. We’ll also delve into advanced techniques like boolean indexing.

 

 

Select a Single Row by Integer Index

You can get an entire row of data by providing the integer index of the row you want to extract.

import pandas as pd
data = {
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)

# Select the second row
selected_row = df.iloc[1]
print(selected_row)

Output:

Name            Doe
Age              34
City    Los Angeles
Name: 1, dtype: object
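
Because iloc works on position rather than label, it behaves the same whatever the index looks like. The following minimal sketch uses a hypothetical DataFrame named people with made-up index labels (10, 20, 30) to contrast iloc with label-based loc, and to show that negative positions count from the end:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
people = pd.DataFrame(
    {'Name': ['John', 'Doe', 'Jane'], 'Age': [28, 34, 22]},
    index=[10, 20, 30],
)

second_by_position = people.iloc[1]  # second row, whatever its label is
row_by_label = people.loc[20]        # the row whose label is 20
last_row = people.iloc[-1]           # negative positions count from the end

print(second_by_position['Name'])  # Doe
print(row_by_label['Name'])        # Doe
print(last_row['Name'])            # Jane
```

Here iloc[1] and loc[20] return the same row, because the row labelled 20 happens to sit at position 1.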

 

Select Multiple Rows Using a List of Integer Indices

Sometimes, you might want to retrieve multiple rows at once based on their positions. With iloc, you can do this by providing a list of integer indices.

This method returns a new DataFrame containing only the rows at the specified positions.

# Select the first and third rows
selected_rows = df.iloc[[0, 2]]
print(selected_rows)

Output:

   Name  Age      City
0  John   28  New York
2  Jane   22   Chicago

By passing a list containing 0 and 2 to iloc, we’ve fetched the first and third rows of our DataFrame.
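
The order of the list matters, and a position may appear more than once. A small sketch of this, rebuilding the same sample DataFrame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Rows come back in the order listed; a repeated position repeats the row
reordered = df.iloc[[3, 0, 0]]
print(reordered['Name'].tolist())  # ['Smith', 'John', 'John']
print(reordered.index.tolist())    # [3, 0, 0]
```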

 

Slice Rows Using a Range of Integer Indices

iloc also allows you to employ range-based indexing to slice rows.

You specify a start index and an end index, and optionally a step. Without a step, this returns the consecutive rows from the start position up to, but not including, the end position.

# Select the first three rows
sliced_rows = df.iloc[0:3]
print(sliced_rows)

Output:

   Name  Age         City
0  John   28     New York
1   Doe   34  Los Angeles
2  Jane   22      Chicago

In the provided example, we start from the 0th index (inclusive) and go up to but not including the 3rd index.
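
The optional step and negative positions follow the usual Python slicing rules. A short sketch, again rebuilding the sample DataFrame so the block is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

every_other = df.iloc[::2]     # step of 2: positions 0 and 2
last_two = df.iloc[-2:]        # the final two rows
reversed_rows = df.iloc[::-1]  # negative step reverses the row order

print(every_other['Name'].tolist())    # ['John', 'Jane']
print(last_two['Name'].tolist())       # ['Jane', 'Smith']
print(reversed_rows['Name'].tolist())  # ['Smith', 'Jane', 'Doe', 'John']
```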

 

Select a Single Column by Integer Index

Columns are the second axis (axis=1). Utilizing iloc, you can select individual columns by their integer index.

Note that when you extract a single column using iloc, the result is a Series object, not a DataFrame.

# Select the first column
selected_column = df.iloc[:, 0]
print(selected_column)

Output:

0     John
1      Doe
2     Jane
3    Smith
Name: Name, dtype: object

The colon : in the row position means “all rows”, and the 0 following the comma specifies the first column.
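
If you would rather keep a DataFrame when selecting one column, one option is to wrap the position in a list. The sketch below compares the two forms on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

as_series = df.iloc[:, 0]   # scalar position: result is a Series
as_frame = df.iloc[:, [0]]  # one-element list: result is a DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```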

 

Select Multiple Columns Using a List of Integer Indices

Just as you can select multiple rows with a list of indices, iloc supports the selection of multiple columns by providing a list of integer indices for the columns.

# Select the first and third columns
selected_columns = df.iloc[:, [0, 2]]
print(selected_columns)

Output:

    Name         City
0   John     New York
1    Doe  Los Angeles
2   Jane      Chicago
3  Smith      Houston

In this example, we target the first and third columns using the list [0, 2] in the column position.
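
When you know the column names but not their positions, you can look the positions up first and then hand them to iloc. This is one possible approach, sketched with Index.get_loc and Index.get_indexer on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Position of a single column name
name_pos = df.columns.get_loc('Name')

# Positions of several column names at once
positions = df.columns.get_indexer(['Name', 'City'])

subset = df.iloc[:, positions]
print(name_pos)               # 0
print(positions.tolist())     # [0, 2]
print(list(subset.columns))   # ['Name', 'City']
```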

 

Slice Columns Using a Range of Integer Indices

You can use iloc combined with range-based indexing to select a continuous set of columns based on their position:

# Select the first two columns
sliced_columns = df.iloc[:, 0:2]
print(sliced_columns)

Output:

    Name  Age
0   John   28
1    Doe   34
2   Jane   22
3  Smith   45

Here, we’ve utilized the range 0:2 within the column’s position in iloc.

This selects columns starting from the 0th index (inclusive) up to, but not including, the 2nd index.

 

Select a Single Cell by Specifying Row and Column Indices

Using iloc, you can pinpoint and extract the value of a single cell by specifying both its row and column integer indices.

# Select the cell from the second row and first column
cell_value = df.iloc[1, 0]
print(cell_value)

Output:

Doe

In this code snippet, we’ve targeted the cell in the second row and first column using iloc[1, 0]. The result is the name “Doe”.
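
Negative positions work for cells too, and pandas also offers the iat accessor for position-based access to a single value. A brief sketch comparing them on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

second_name = df.iloc[1, 0]     # second row, first column
last_city = df.iloc[-1, -1]     # last row, last column
second_name_iat = df.iat[1, 0]  # same cell via the scalar-only iat accessor

print(second_name)      # Doe
print(last_city)        # Houston
print(second_name_iat)  # Doe
```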

 

Select Rows for Specific Columns Using Lists of Indices

You can select multiple rows and specific columns simultaneously by providing lists of integer indices for both dimensions.

# Select the first and third rows for the first and third columns
subset = df.iloc[[0, 2], [0, 2]]
print(subset)

Output:

   Name      City
0  John  New York
2  Jane   Chicago

In the example provided, we’ve specified a list for both row and column indices: [0, 2].

This fetches the first and third rows, and within those rows, only the first and third columns.

 

Slice Rows and Columns Using Ranges of Integer Indices

With iloc, you can slice rows and columns using ranges, providing a sub-DataFrame as the output.

# Select the first three rows and first two columns
subset = df.iloc[0:3, 0:2]
print(subset)

Output:

   Name  Age
0  John   28
1   Doe   34
2  Jane   22

In the demonstrated code, we’ve combined two ranges: 0:3 for rows and 0:2 for columns.

This selects the first three rows and the first two columns.
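
The row and column selectors do not have to be of the same kind. As an illustration, the sketch below mixes a slice with a list and uses negative slices on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# Slice for rows, list for columns
mixed = df.iloc[1:3, [0, 2]]
print(mixed['Name'].tolist())  # ['Doe', 'Jane']
print(mixed['City'].tolist())  # ['Los Angeles', 'Chicago']

# Last two rows, last column (still a DataFrame because both are slices)
tail_block = df.iloc[-2:, -1:]
print(tail_block['City'].tolist())  # ['Chicago', 'Houston']
```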

 

Set the Value of a Specific Cell

Using iloc, you can set the value for any specific cell by specifying its row and column indices.

# Set the value of the cell in the second row and first column to 'Alex'
df.iloc[1, 0] = 'Alex'
print(df)

Output:

    Name  Age         City
0   John   28     New York
1   Alex   34  Los Angeles
2   Jane   22      Chicago
3  Smith   45      Houston

 

Set Values for a Row or a Set of Rows

With the iloc property, you can update the values for an entire row or a set of rows:

# Set values for the third row
df.iloc[2] = ['Ella', 30, 'Seattle']

# Set values for the first and fourth rows
df.iloc[[0, 3]] = [['Bob', 29, 'Boston'], ['Lucas', 47, 'Miami']]
print(df)

Output:

    Name  Age         City
0    Bob   29       Boston
1   Alex   34  Los Angeles
2   Ella   30      Seattle
3  Lucas   47        Miami

In the example, we first set new values for the third row using df.iloc[2] = ['Ella', 30, 'Seattle'], updating the data for “Jane”.

Then, we target the first and fourth rows, assigning new values simultaneously.

 

Set Values for a Column or a Set of Columns

The iloc property allows you to update an entire column or several columns at once:

# Set values for the 'Age' column
df.iloc[:, 1] = [35, 36, 31, 48]

# Set values for the 'Name' and 'City' columns
df.iloc[:, [0, 2]] = [['Mia', 'Atlanta'], ['Liam', 'Dallas'], ['Sophia', 'Denver'], ['Ethan', 'Phoenix']]
print(df)

Output:

     Name  Age     City
0     Mia   35  Atlanta
1    Liam   36   Dallas
2  Sophia   31   Denver
3   Ethan   48  Phoenix

Here, we first target the ‘Age’ column and assign a new list of age values using df.iloc[:, 1].

Next, we proceed to set values for both the ‘Name’ and ‘City’ columns simultaneously.

 

Set Values for a Range of Cells (Both Rows and Columns)

The iloc property allows you to update a range of cells across both rows and columns by assigning values to a specific slice.

# Set values for the cells in the first two rows and last two columns
df.iloc[0:2, 1:3] = [[40, 'Orlando'], [37, 'Sacramento']]
print(df)

Output:

     Name  Age        City
0     Mia   40     Orlando
1    Liam   37  Sacramento
2  Sophia   31      Denver
3   Ethan   48     Phoenix

In the demonstration above, we’ve chosen a block of cells spanning the first two rows and the last two columns of the DataFrame.

By using df.iloc[0:2, 1:3], we specify this range and set new values for the ‘Age’ and ‘City’ columns for the respective rows.

Remember, when updating a range of cells, the shape of the value you’re assigning should match the shape of the cell range you’re targeting. A value with a mismatched shape is typically rejected with a ValueError.
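
To make the shape rule concrete, here is a minimal sketch on a fresh copy of the sample data. It assumes that a value with the wrong number of columns is rejected with a ValueError, which is the behaviour to expect for a mixed-type DataFrame like this one; the exact message may differ between pandas versions, so it is only printed:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# A 2x2 target takes a 2x2 value
df.iloc[0:2, 1:3] = [[40, 'Orlando'], [37, 'Sacramento']]

# Assumption: a 2x3 value for a 2x2 target raises ValueError
try:
    df.iloc[0:2, 1:3] = [[40, 'Orlando', 'extra'], [37, 'Sacramento', 'extra']]
    mismatch_rejected = False
except ValueError as e:
    mismatch_rejected = True
    print(f"Error: {e}")

# A single scalar is broadcast to every targeted cell
df.iloc[2:4, 1] = 50
print(df['Age'].tolist())  # [40, 37, 50, 50]
```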

 

Boolean Indexing (Use Boolean Arrays/Masks)

Instead of selecting rows or columns by their integer indices, you can use arrays of boolean values (True or False) to filter rows based on certain criteria.

Let’s delve into how you can combine boolean arrays/masks with iloc to refine your DataFrame selections.

Basic Boolean Indexing

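The earlier examples modified df, so start by rebuilding it from the original data, then create a boolean mask based on a condition:

# Rebuild the original DataFrame, since earlier examples modified df
df = pd.DataFrame(data)

# Create a boolean mask for rows where Age is greater than 35
age_mask = df['Age'] > 35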

Now, apply this mask using iloc:

filtered_data = df.iloc[age_mask.values]
print(filtered_data)

Output:

    Name  Age     City
3  Smith   45  Houston

In the example, we first generate a boolean mask age_mask that identifies rows where the ‘Age’ exceeds 35. iloc does not accept a boolean Series directly, so we pass age_mask.values, the underlying NumPy array. Only the rows with True values in the mask are retained.
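
For comparison, the same filter can be written with label-based loc, or by turning the mask into integer positions first. The sketch below uses NumPy's flatnonzero for the conversion, which is just one way to do it, and rebuilds the original sample data so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

age_mask = df['Age'] > 35

via_iloc = df.iloc[age_mask.to_numpy()]  # boolean array with iloc
via_loc = df.loc[age_mask]               # loc accepts the Series itself

# Convert the mask to the integer positions of its True entries
positions = np.flatnonzero(age_mask.to_numpy())
via_positions = df.iloc[positions]

print(positions.tolist())             # [3]
print(via_iloc.equals(via_loc))       # True
print(via_iloc.equals(via_positions)) # True
```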

Combining Multiple Conditions

You can combine multiple conditions using bitwise operators like & (and), | (or), and ~ (not).

# Create a mask for rows where Age is greater than 35 and City is 'Houston'
combined_mask = (df['Age'] > 35) & (df['City'] == 'Houston')
filtered_data = df.iloc[combined_mask.values]
print(filtered_data)

Output:

    Name  Age     City
3  Smith   45  Houston

Here, we filter for entries where the individual’s age exceeds 35 and they reside in ‘Houston’.
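
The | and ~ operators work the same way as &. A short sketch on the original sample data (the variable names are chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# OR: younger than 25, or living in Houston
young_or_houston = (df['Age'] < 25) | (df['City'] == 'Houston')
print(df.iloc[young_or_houston.to_numpy()]['Name'].tolist())  # ['Jane', 'Smith']

# NOT: everyone who does not live in Houston
not_houston = ~(df['City'] == 'Houston')
print(df.iloc[not_houston.to_numpy()]['Name'].tolist())  # ['John', 'Doe', 'Jane']
```

Keep each condition in parentheses, because the bitwise operators bind more tightly than the comparisons.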

 

Error Handling and Common Pitfalls

Navigating Pandas DataFrames using iloc is typically smooth and intuitive. However, there are some potential pitfalls and errors that you might encounter.

One of the common mistakes is trying to access indices that do not exist in the DataFrame, leading to an IndexError.

# Attempting to access the fifth row in a DataFrame with only four rows
# will raise an error.
try:
    print(df.iloc[4])
except IndexError as e:
    print(f"Error: {e}")

Output:

Error: single positional indexer is out-of-bounds

To avoid this, always ensure that the indices you provide fall within the valid range for your DataFrame.
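
One way to do that is to check the position against the length of the DataFrame before indexing. The helper row_or_none below is hypothetical, written only for this illustration. The sketch also shows that slices are forgiving, while a list containing an out-of-range position still raises IndexError:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Doe', 'Jane', 'Smith'],
    'Age': [28, 34, 22, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

def row_or_none(frame, position):
    """Hypothetical helper: return the row at `position`, or None if out of range."""
    if -len(frame) <= position < len(frame):
        return frame.iloc[position]
    return None

print(row_or_none(df, 4))          # None
print(row_or_none(df, 3)['Name'])  # Smith

# A slice that runs past the last row is simply truncated
print(len(df.iloc[2:10]))  # 2

# A list with an out-of-range position raises IndexError
try:
    df.iloc[[0, 4]]
    list_raised = False
except IndexError as e:
    list_raised = True
    print(f"Error: {e}")
```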

 

Resource

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
