Selecting Data by Multiple Conditions using Pandas loc
Pandas offers loc property, a label-based data selection method that allows you to select data based on multiple conditions.
This tutorial aims to guide you on using the Pandas loc function under multiple conditions.
- 1 Understanding Logical Operators for Multiple Conditions
- 2 Combining Conditions with AND operator (&)
- 3 Combining Conditions with OR operator (|)
- 4 Using Multiple Conditions with NOT operator (~)
- 5 Combining More Than Two Conditions
- 6 Multiple Conditions with a Specific Column
- 7 Select Multiple Columns with Multiple Conditions
- 8 Handling Null Values Using loc and Multiple Conditions
- 9 Using loc with Multiple Conditions for Date/Time Data
- 10 Using loc with Multiple Conditions for Numerical Data
- 11 Using loc with Multiple Conditions for Categorical Data
- 12 Updating DataFrame Based on Multiple Conditions
- 13 Real-world Example
Understanding Logical Operators for Multiple Conditions
Logical operators in Python, such as ‘and’, ‘or’, and ‘not’, allow combining simple conditions to form complex conditional statements.
Logical Operators
- ‘and’: Returns True if both conditions are true
- ‘or’: Returns True if at least one condition is true
- ‘not’: Returns False if the condition is true (reverses the condition)
Here’s a quick run-through:
print(True and False) print(True or False) print(not True)
Output:
False True False
In the code above, the ‘and’ operator returns False because both conditions (True and False) are not True.
The ‘or’ operator returns True because at least one of the conditions (True or False) is True.
Lastly, the ‘not’ operator returns False because the condition is True and ‘not’ reverses the condition.
Combining Conditions with AND operator (&)
Let’s now dive into combining conditions with the AND operator shown as &
in Pandas.
We will start with a simple DataFrame example:
import pandas as pd data = {'Name': ['Tom', 'Nick', 'John', 'Tom'], 'Age': [20, 21, 19, 18], 'Grade': ['A', 'B', 'A', 'C']} df = pd.DataFrame(data) print(df)
Output:
Name Age Grade 0 Tom 20 A 1 Nick 21 B 2 John 19 A 3 Tom 18 C
In this DataFrame, we have ‘Name’, ‘Age’, and ‘Grade’ as columns.
Now, let’s say your task is to fetch the rows where ‘Age’>19 and ‘Grade’=’A’. Here’s how you can do it:
print(df.loc[(df['Age'] > 19) & (df['Grade'] == 'A')])
Output:
Name Age Grade 0 Tom 20 A
Using the loc
function and the ‘and’ operator (&), the code fetches the rows satisfying both conditions: ‘Age’ > 19 and ‘Grade’ equals ‘A’.
Combining Conditions with OR operator (|)
Referring again to our previous DataFrame, this time let’s fetch rows where ‘Age’ > 19 or ‘Grade’=’A’.
print(df.loc[(df['Age'] > 19) | (df['Grade'] == 'A')])
Output:
Name Age Grade 0 Tom 20 A 1 Nick 21 B 2 John 19 A
In the above code, we’ve used the loc
function with the OR operator (|
). This returns the rows fulfilling at least one of the conditions: ‘Age’>19 or ‘Grade’ equals ‘A’.
Using Multiple Conditions with NOT operator (~)
The NOT operator is represented as ~
in Pandas. The NOT operator is useful if you want to get the reverse of a condition.
Sticking to our DataFrame from before, let’s get the rows where ‘Grade’ is NOT ‘A’.
print(df.loc[~(df['Grade'] == 'A')])
Output:
Name Age Grade 1 Nick 21 B 3 Tom 18 C
This code uses the loc
function with the NOT operator (~
) to return the rows where ‘Grade’ is not ‘A’. As you can see, row 0 and row 2 are excluded in the output as they have ‘Grade’ equals ‘A’.
Combining More Than Two Conditions
Let’s use our previous DataFrame and fetch the rows where ‘Age’ is greater than 19, ‘Grade’ is ‘A’, and ‘Name’ is ‘Tom’.
print(df.loc[(df['Age'] > 19) & (df['Grade'] == 'A') & (df['Name'] == 'Tom')])
Output:
Name Age Grade 0 Tom 20 A
In the above code, we’ve extended the conditions in our loc
function to use three conditions instead of two.
Multiple Conditions with a Specific Column
Let’s see how we can select a specific column with multiple conditions using Pandas loc
.
We take data from previous examples:
print(df.loc[(df['Grade'] == 'A') & (df['Age'] > 19), 'Name'])
Output:
0 Tom Name: Name, dtype: object
In the code snippet, we’ve used loc
to select all rows where ‘Grade’ equals ‘A’ and ‘Age’ is more than 19.
Furthermore, by appending 'Name'
at the end of the loc
function, it selects only the ‘Name’ column from those rows that fulfill the conditions.
Select Multiple Columns with Multiple Conditions
Let’s see how you can select multiple columns with multiple conditions with loc
.
print(df.loc[(df['Age'] > 19), ['Name', 'Grade']])
Output:
Name Grade 0 Tom A 1 Nick B
In the above code, we’ve used ‘loc’ to select all rows where ‘Age’ is more than 19.
Additionally, by adding a list, ['Name', 'Grade']
, it selects only the ‘Name’ and ‘Grade’ columns from the rows that fulfill the conditions.
Handling Null Values Using loc and Multiple Conditions
To give you an illustration, let’s modify our DataFrame to include null values:
import numpy as np data = {'Name': ['Tom', 'Nick', 'John', 'Tom'], 'Age': [20, 21, 19, np.nan], 'Grade': ['A', 'B', 'A', 'C']} df = pd.DataFrame(data) print(df)
Output:
Name Age Grade 0 Tom 20.0 A 1 Nick 21.0 B 2 John 19.0 A 3 Tom NaN C
Here, you see the ‘Age’ of the last entry (‘Tom’) is NaN (Not a Number), indicating a null or missing value.
Now, let’s fetch the rows where ‘Age’ is not null and ‘Grade’ equals ‘A’.
print(df.loc[(df['Grade'] == 'A') & (df['Age'].notna())])
Output:
Name Age Grade 0 Tom 20.0 A 2 John 19.0 A
This code uses the notna()
function in combination with loc
and logical AND to select only those rows where ‘Grade’ is ‘A’ and ‘Age’ is not null.
Using loc with Multiple Conditions for Date/Time Data
Let’s start with a simple DataFrame with date/time as one of the columns.
data = {'Date': pd.date_range(start='1/1/2020', end='1/10/2020'), 'Value': range(10)} df = pd.DataFrame(data) print(df)
Output:
Date Value 0 2020-01-01 0 1 2020-01-02 1 2 2020-01-03 2 3 2020-01-04 3 4 2020-01-05 4 5 2020-01-06 5 6 2020-01-07 6 7 2020-01-08 7 8 2020-01-09 8 9 2020-01-10 9
To select all rows where ‘Date’ is later than January 3, 2020 and ‘Value’ is greater than 5, you can use the following code:
print(df.loc[(df['Date'] > '2020-01-03') & (df['Value'] > 5)])
Output:
Date Value 6 2020-01-07 6 7 2020-01-08 7 8 2020-01-09 8 9 2020-01-10 9
The loc
function, in combination with the logical AND operator, filters the DataFrame for rows where ‘Date’ is after ‘2020-01-03’ and ‘Value’ is more than 5.
Using loc with Multiple Conditions for Numerical Data
Let’s consider a simple DataFrame where we have numerical data.
data = {'Column1': [15, 20, 35, 40, 55], 'Column2': [100, 200, 300, 400, 500]} df = pd.DataFrame(data) print(df)
Output:
Column1 Column2 0 15 100 1 20 200 2 35 300 3 40 400 4 55 500
To select rows where ‘Column1’ is greater than 20 and ‘Column2’ is less than 500, you can use the following code:
print(df.loc[(df['Column1'] > 20) & (df['Column2'] < 500)])
Output:
Column1 Column2 2 35 300 3 40 400
The loc
function, with the logical AND operator (&), filters the DataFrame for rows where ‘Column1’ is greater than 20 and ‘Column2’ is less than 500.
Using loc with Multiple Conditions for Categorical Data
Consider this DataFrame:
data = {'Name': ['Tom', 'Nick', 'John', 'Tom'], 'Color': ['Blue', 'Green', 'Red', 'Blue'], 'Number': [1, 2, 3, 4]} df = pd.DataFrame(data) print(df)
Output:
Name Color Number 0 Tom Blue 1 1 Nick Green 2 2 John Red 3 3 Tom Blue 4
Let’s display the rows where ‘Color’ is ‘Blue’ and ‘Number’ is greater than 1.
print(df.loc[(df['Color'] == 'Blue') & (df['Number'] > 1)])
Output:
Name Color Number 3 Tom Blue 4
In the code above, loc
combined with the AND operator filters DataFrame for rows where ‘Color’ is ‘Blue’ and ‘Number’ is more than 1.
Updating DataFrame Based on Multiple Conditions
Let’s go back to our DataFrame:
data = {'Name': ['Tom', 'Nick', 'John', 'Tom'], 'Age': [20, 21, 19, 18], 'Grade': ['A', 'B', 'A', 'C']} df = pd.DataFrame(data) print(df)
Output:
Name Age Grade 0 Tom 20 A 1 Nick 21 B 2 John 19 A 3 Tom 18 C
Let’s say you want to change the ‘Grade’ from ‘B’ to ‘A’ for those with ‘Age’ > 19. Here’s how you can do that:
df.loc[(df['Age'] > 19) & (df['Grade'] == 'B'), 'Grade'] = 'A' print(df)
Output:
Name Age Grade 0 Tom 20 A 1 Nick 21 A 2 John 19 A 3 Tom 18 C
In the above code, loc
has selected the rows where ‘Age’ > 19 and ‘Grade’ is ‘B’. It then updates the ‘Grade’ to ‘A’.
Real-world Example
Back in my days at the telecom company in Egypt, analyzing large datasets was a regular part of my job.
We gathered millions of data points from our numerous customers, extending from their call history, data usage, top-up patterns, and complaint logs.
Once I had a task to identify the subscribers who used more than 5GB data, placed more than 1000 minutes of calls, and registered at least three complaints in the last six months.
By using Pandas loc
with multiple conditions, the task mentioned above became a matter of just a few lines of code!
target_subscribers = df.loc[(df['Data Usage'] > 5) & (df['Call Minutes'] > 1000) & (df['Complaints'] >= 3)]
This was extremely beneficial when the network team needed to explore different thresholds to identify potential service improvement areas.
To conclude, using Pandas loc
with multiple conditions proves invaluable in many data analysis operations, allowing you to filter, select, or change data in versatile and efficient ways.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.