Using Pandas DataFrame loc Property for Label Based Access
The loc
property in Pandas allows for label-based indexing, which means that you use labels (like column names and row indices) to access particular rows and columns of a DataFrame.
Unlike iloc, which uses integer-based indexing, loc
is primarily label-based, making it capable of working with data that has meaningful labels.
- 1 Pandas DataFrame loc Syntax
- 2 Select a Single Row by Its Label
- 3 Select Multiple Rows Using a List of Labels
- 4 Select a Range of Rows Using Label Slicing
- 5 Select a Single Column Using Its Label Along With All Rows
- 6 Select Multiple Columns Using a List of Their Labels
- 7 Select a Range of Columns Using Label Slicing
- 8 Select Specific Rows and Specific Columns Using Labels
- 9 Select a Range of Rows and Specific Columns
- 10 Select Specific Rows and a Range of Columns
- 11 Select Rows Based on a Condition or Set of Conditions
- 12 Select Rows and Columns Based on Conditions
- 13 Set a Single Value in a Specific Cell
- 14 Set Values for an Entire Column or Row
- 15 Set Values Based on a Condition or Set of Conditions
- 16 Using Lambda Functions for Row or Column Selections
- 17 Behavior of loc When a Label is Missing
- 18 Resource
Pandas DataFrame loc
Syntax
The loc
property of Pandas DataFrames provides an interface to access a group of rows and columns by labels or a boolean array.
Here’s the basic syntax:
df.loc[rows, columns]
Rows: This can be a single label, a list of labels, a slice object with labels ('start':'end'
), or a boolean array.
Columns: This can be a single label, a list of labels, a slice object with labels ('start':'end'
), or a boolean array.
Select a Single Row by Its Label
Here’s a simple DataFrame to demonstrate:
import pandas as pd data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'London', 'Paris'] } df = pd.DataFrame(data) print(df)
Output:
Name Age City 0 Alice 25 New York 1 Bob 30 London 2 Charlie 35 Paris
For the purpose of demonstration, let’s first set the ‘Name’ column as the index for our DataFrame.
This will allow us to use the names as labels.
df.set_index('Name', inplace=True) print(df)
Output:
Age City Name Alice 25 New York Bob 30 London Charlie 35 Paris
Now, let’s select the row for ‘Bob’ using the loc
property:
# Selecting the row with label 'Bob' selected_row = df.loc['Bob'] print(selected_row)
Output:
Age 30 City London Name: Bob, dtype: object
Here, we’ve accessed the row with the label 'Bob'
which corresponds to the data for Bob in our DataFrame.
The result is a Pandas Series with the data for that specific row.
Select Multiple Rows Using a List of Labels
The loc
property allows you to select multiple rows simultaneously based on their labels by accepting a list of labels.
Based on the previously modified DataFrame (where ‘Name’ is the index), let’s select rows for both ‘Alice’ and ‘Charlie’.
selected_rows = df.loc[['Alice', 'Charlie']] print(selected_rows)
Output:
Age City Name Alice 25 New York Charlie 35 Paris
It’s important to be aware that the order in which you specify the labels in the list determines the order of the rows in the resulting DataFrame.
So, if you’d reversed the list to ['Charlie', 'Alice']
, Charlie’s data would have appeared before Alice’s in the output.
Select a Range of Rows Using Label Slicing
When working with a Pandas DataFrame, you can utilize label slicing with the loc
property to retrieve a consecutive range of rows based on their labels.
Unlike Python slicing, label slicing in Pandas is inclusive of the stop label.
Using our name-indexed DataFrame, let’s extract a range of rows from ‘Alice’ through ‘Charlie’.
# Selecting a range of rows from 'Alice' to 'Charlie' sliced_rows = df.loc['Alice':'Charlie'] print(sliced_rows)
Output:
Age City Name Alice 25 New York Bob 30 London Charlie 35 Paris
In this example, the slice 'Alice':'Charlie'
has selected all rows starting from ‘Alice’ and ending with ‘Charlie’, inclusive of both endpoints.
Select a Single Column Using Its Label Along With All Rows
loc
also offers column selection capabilities.
You can extract specific data sections from your DataFrame by specifying the rows and columns you want to select.
When you want to select all rows but only a specific column, you can use a colon (:
) for the rows segment.
Let’s retrieve the ‘Age’ column for all rows from our DataFrame:
# Selecting the 'Age' column for all rows age_column = df.loc[:, 'Age'] print(age_column)
Output:
Name Alice 25 Bob 30 Charlie 35 Name: Age, dtype: int64
The result is a Pandas Series, where the indices are preserved from the original DataFrame and the data corresponds to the values from the ‘Age’ column.
Select Multiple Columns Using a List of Their Labels
You can extract the data from specific columns across all rows by specifying a list of column labels with the loc
property.
Using our current DataFrame as an example, let’s select the columns ‘Age’ and ‘City’ for all rows:
selected_columns = df.loc[:, ['Age', 'City']] print(selected_columns)
Output:
Age City Name Alice 25 New York Bob 30 London Charlie 35 Paris
Here, df.loc[:, ['Age', 'City']]
fetches all rows (indicated by the colon :
) and only the columns ‘Age’ and ‘City’.
Select a Range of Columns Using Label Slicing
Just as you can slice rows based on their labels, you can slice columns in a similar way with the loc
property.
For our demonstration, let’s first rearrange our columns to have an order, then select a range.
# Rearrange columns for demonstration df = df[['City', 'Age']] print(df)
Output:
City Age Name Alice New York 25 Bob London 30 Charlie Paris 35
Now, we will slice to select columns from ‘City’ to ‘Age’:
# Selecting a range of columns from 'City' to 'Age' sliced_columns = df.loc[:, 'City':'Age'] print(sliced_columns)
Output:
City Age Name Alice New York 25 Bob London 30 Charlie Paris 35
The slice 'City':'Age'
selects all columns starting from ‘City’ and ending with ‘Age’, inclusive of both endpoints.
Select Specific Rows and Specific Columns Using Labels
Using the loc
property, you can combine row and column selection to extract specific cells or sections of your DataFrame based on both row and column labels.
Let’s use our existing DataFrame to select data for ‘Alice’ and ‘Charlie’ from the ‘City’ column:
specific_data = df.loc[['Alice', 'Charlie'], 'City'] print(specific_data)
Output:
Name Alice New York Charlie Paris Name: City, dtype: object
In this example, df.loc[['Alice', 'Charlie'], 'City']
returns a Pandas Series with the ‘City’ values for ‘Alice’ and ‘Charlie’.
But, what if we want to select multiple columns for these specific rows? It’s just as straightforward:
# Selecting 'City' and 'Age' data for 'Alice' and 'Charlie' specific_data_multi = df.loc[['Alice', 'Charlie'], ['City', 'Age']] print(specific_data_multi)
Output:
City Age Name Alice New York 25 Charlie Paris 35
Select a Range of Rows and Specific Columns
Using our existing DataFrame, let’s select the range of rows from ‘Alice’ to ‘Bob’ for the columns ‘City’ and ‘Age’:
range_specific_columns = df.loc['Alice':'Bob', ['City', 'Age']] print(range_specific_columns)
Output:
City Age Name Alice New York 25 Bob London 30
The result is a new DataFrame segment that has the sliced rows and the designated columns.
Select Specific Rows and a Range of Columns
Inversely to our previous section, you can also select specific rows while slicing a range of columns using their labels.
Continuing with our current DataFrame, let’s retrieve data for the rows ‘Alice’ and ‘Charlie’ across the range of columns from ‘City’ to ‘Age’:
rows_range_columns = df.loc[['Alice', 'Charlie'], 'City':'Age'] print(rows_range_columns)
Output:
City Age Name Alice New York 25 Charlie Paris 35
In this example, df.loc[['Alice', 'Charlie'], 'City':'Age']
selects the specified rows ‘Alice’ and ‘Charlie’, and the column range starting from ‘City’ and ending with ‘Age’.
Select Rows Based on a Condition or Set of Conditions
One of the most powerful features of the loc
property is its ability to select rows based on a specific condition or a combination of conditions.
This is valuable when you want to filter your DataFrame to include only rows that meet certain criteria.
Single Condition
For instance, let’s select all rows where the ‘Age’ is greater than 30:
age_above_30 = df.loc[df['Age'] > 30] print(age_above_30)
Output:
City Age Name Charlie Paris 35
In this case, the condition df['Age'] > 30
returns a Boolean Series that indicates whether each row meets the condition.
The loc
property then uses this Series to filter out rows that don’t meet the criteria.
Multiple Conditions
You can combine multiple conditions using the &
(and), |
(or), and ~
(not) operators.
For example, to select all rows where the ‘Age’ is greater than 25 and the ‘City’ is not ‘London’:
filtered_data = df.loc[(df['Age'] > 25) & ~(df['City'] == 'London')] print(filtered_data)
Output:
City Age Name Charlie Paris 35
Remember to use parentheses around each condition to ensure proper evaluation order.
Read more about Selecting Data by Multiple Conditions using Pandas loc.
Select Rows and Columns Based on Conditions
Pandas provides flexibility not just in selecting rows based on conditions but also in concurrently filtering rows and selecting specific columns.
Rows Based on Condition, Specific Columns
For instance, let’s fetch the ‘City’ column for rows where ‘Age’ is above 30:
city_age_above_30 = df.loc[df['Age'] > 30, 'City'] print(city_age_above_30)
Output:
Name Charlie Paris Name: City, dtype: object
Rows and Columns Based on Multiple Conditions
Suppose you want to extract the ‘City’ column for individuals aged above 25 but exclude those living in ‘London’. Here’s how:
# Selecting the 'City' column based on multiple row conditions filtered_city = df.loc[(df['Age'] > 25) & ~(df['City'] == 'London'), 'City'] print(filtered_city)
Output:
Name Charlie Paris Name: City, dtype: object
Set a Single Value in a Specific Cell
You can use the loc
property to change the value of a specific cell in a DataFrame.
Let’s say you want to update the ‘Age’ of ‘Alice’ to 28 in our existing DataFrame. Here’s how you can achieve this:
# Displaying the original DataFrame print("Original DataFrame:") print(df) # Updating the 'Age' of 'Alice' to 28 df.loc['Alice', 'Age'] = 28 # Displaying the updated DataFrame print("\nUpdated DataFrame:") print(df)
Output:
Original DataFrame: City Age Name Alice New York 25 Bob London 30 Charlie Paris 35 Updated DataFrame: City Age Name Alice New York 28 Bob London 30 Charlie Paris 35
As can be seen in the output, only the specified cell is modified, leaving the rest of the DataFrame unchanged.
Set Values for an Entire Column or Row
The loc
property allows you to update the values for an entire column or row.
Set Values for an Entire Column
If you want to assign a new value to every cell in a specific column, you can do so directly using the loc
property. Let’s assign a value of 40 to the entire ‘Age’ column:
# Displaying the original DataFrame print("Original DataFrame:") print(df) # Setting the entire 'Age' column to 40 df.loc[:, 'Age'] = 40 # Displaying the updated DataFrame print("\nUpdated DataFrame:") print(df)
Output:
Original DataFrame: City Age Name Alice New York 28 Bob London 30 Charlie Paris 35 Updated DataFrame: City Age Name Alice New York 40 Bob London 40 Charlie Paris 40
Set Values for an Entire Row
Similarly, to update values for an entire row, you can specify the row label and use the assignment operation.
Let’s say we want to update the data for ‘Alice’ to “Los Angeles” for ‘City’ and 40 for ‘Age’:
# Displaying the original DataFrame print("Original DataFrame:") print(df) # Setting the entire row for 'Alice' df.loc['Alice', :] = ["Los Angeles", 40] # Displaying the updated DataFrame print("\nUpdated DataFrame:") print(df)
Output:
Original DataFrame: City Age Name Alice New York 28 Bob London 30 Charlie Paris 35 Updated DataFrame: Age City Name Alice Los Angeles 40 Bob 30 London Charlie 35 Paris
By specifying the row label ‘Alice’ and using :
for all columns, we direct the assignment to target the entire row for ‘Alice’.
Set Values Based on a Condition or Set of Conditions
Applying conditional logic to modify DataFrame values is a core feature of Pandas. This is useful for tasks such as data cleaning, transformation, and imputation.
Set Values for a Single Column Based on a Condition
Let’s say we want to categorize ages in our DataFrame. We’ll set a new column ‘Category’ as ‘Young’ for ages below 35 and ‘Old’ otherwise:
# Initialize a new column 'Category' with default value df['Category'] = 'Old' # Update 'Category' based on the 'Age' condition df.loc[df['Age'] < 35, 'Category'] = 'Young' print(df)
Output:
Age City Category Name Alice 25 New York Young Bob 30 London Young Charlie 35 Paris Old
Set Values Based on Multiple Conditions
Suppose you want to set the ‘City’ for individuals aged 35 and older who are categorized as ‘Old’ to ‘Unknown’:
df['Category'] = 'Old' df.loc[(df['Age'] >= 35) & (df['Category'] == 'Old'), 'City'] = 'Unknown' print(df)
Output:
Age City Category Name Alice 25 New York Old Bob 30 London Old Charlie 35 Unknown Old
Set Values Across Multiple Columns Based on a Condition
Maybe you want to reset both ‘City’ and ‘Category’ for individuals whose age is 35:
df['Category'] = 'Old' df.loc[df['Age'] == 35, ['City', 'Category']] = ['Not Specified', 'Middle-aged'] print(df)
Output:
Age City Category Name Alice 25 New York Old Bob 30 London Old Charlie 35 Not Specified Middle-aged
Using Lambda Functions for Row or Column Selections
When lambda functions are combined with Pandas, they offer a powerful method to make dynamic row or column selections in DataFrames based on custom logic.
Using Lambda with apply()
for Row Selection
Suppose you want to select rows where the length of the city name is more than 6 characters:
filtered_rows = df.loc[df['City'].apply(lambda x: len(x) > 6)] print(filtered_rows)
Output:
Age City Name Alice 25 New York
Using Lambda with apply()
for Column Selection
To select columns where at least one row has a value length greater than 8:
# Sample DataFrame for this example df = pd.DataFrame({ 'City': ['New York', 'London', 'Paris'], 'Description': ['Big Apple', 'The City', 'City of Lights'] }) filtered_columns = df.loc[:, df.applymap(lambda x: len(str(x)) > 8).any()] print(filtered_columns)
Using Lambda with applymap()
for Element-wise Operations
You can also use lambda functions with applymap()
for element-wise operations. For instance, to transform all strings in the DataFrame to their upper-case versions:
upper_case_df = df.applymap(lambda x: str(x).upper()) print(upper_case_df)
Output:
City Description 0 NEW YORK BIG APPLE 1 LONDON THE CITY 2 PARIS CITY OF LIGHTS
Behavior of loc
When a Label is Missing
The loc
property in Pandas is primarily label-based, which means you use it to refer to rows or columns based on their labels (or index for rows).
However, when you attempt to access a label that does not exist in the DataFrame or Series using loc
, it raises a KeyError
.
Let’s understand the behavior and handling strategies through examples:
Access a Non-existent Row Label
Suppose we try to access a row with the label ‘Daniel’ in our sample DataFrame, and ‘Daniel’ is not one of the row labels:
try: print(df.loc['Daniel']) except KeyError as e: print(f"KeyError: {e}")
Output:
KeyError: 'the label [Daniel] is not in the [index]'
Access a Non-existent Column Label
Similarly, trying to access a column that doesn’t exist will also result in a KeyError
. For instance, trying to access a column ‘Salary’:
try: print(df.loc[:, 'Salary']) except KeyError as e: print(f"KeyError: {e}")
Output:
KeyError: 'the label [Salary] is not in the [columns]'
Handling Missing Labels
Check Before Accessing: One strategy is to check if the label exists before attempting to access it.
if 'Daniel' in df.index: print(df.loc['Daniel']) else: print("'Daniel' not found in the DataFrame.")
Using get()
Method: Another approach is to use the get()
method which doesn’t raise an error for missing labels but instead returns a default value.
result = df.get('Daniel') if result is None: print("'Daniel' not found in the DataFrame.") else: print(result)
Resource
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.