Filter Using Pandas query method with multiple conditions

The query method in Pandas allows you to filter your DataFrame using string expressions.

In this tutorial, you’ll learn how to use the query method to filter data with multiple conditions.

 

 

Combining Conditions with & (AND)

When you need to combine multiple conditions, the ampersand (&) serves as the logical AND operator.

The syntax is straightforward: you specify each condition within a string, and separate them using &.

Firstly, let’s create a sample DataFrame:

import pandas as pd
data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emily', 'Michael', 'Sarah', 'Jessica'],
    'Age': [28, 24, 22, 25, 29],
    'Salary': [50000, 55000, 60000, 65000, 70000]
}
df = pd.DataFrame(data)
print(df)

Output:

   ID      Name  Age  Salary
0   1      John   28   50000
1   2     Emily   24   55000
2   3   Michael   22   60000
3   4     Sarah   25   65000
4   5   Jessica   29   70000

Here, you have a DataFrame with the columns ID, Name, Age, and Salary.

Now, let’s use the query method to filter out records where Age is greater than 24 and Salary is less than 70000:

filtered_df = df.query("Age > 24 & Salary < 70000")
print(filtered_df)

Output:

   ID   Name  Age  Salary
0   1   John   28   50000
3   4  Sarah   25   65000

 

Combining Conditions with | (OR)

The pipe character (|) acts as the logical OR operator in the query method.

With this operator, you can find records that satisfy either one condition or another, or even both.

Continuing from the previous example, let’s filter the records where either Age is less than 25 or Salary is greater than 60000:

filtered_df = df.query("Age < 25 | Salary > 60000")
print(filtered_df)

Output:

   ID      Name  Age  Salary
1   2     Emily   24   55000
2   3   Michael   22   60000
3   4     Sarah   25   65000
4   5   Jessica   29   70000

In this output, you’ll find that the records meet at least one of the specified conditions—either Age is less than 25 or Salary is greater than 60000.

 

Combining Conditions with ~ (NOT)

Sometimes, you’ll find yourself in a situation where you want to exclude specific records based on a condition.

The tilde (~) serves as the logical NOT operator in the query method.

By incorporating ~, you can filter out records that meet a particular condition.

Let’s use the query method to filter out records where the Age is not greater than 24:

filtered_df = df.query("~(Age > 24)")
print(filtered_df)

Output:

   ID     Name  Age  Salary
1   2    Emily   24   55000
2   3  Michael   22   60000

The ~ operator negates the condition specified.

 

Parentheses and Order of Operations

You can use parentheses to dictate the order in which conditions are evaluated in the query method.

Suppose you want to select records where either the Age is greater than 24 and the Salary is less than 70000, or the Age is less than 25.

filtered_df = df.query("(Age > 24 & Salary < 70000) | (Age < 25)")
print(filtered_df)

Output:

   ID     Name  Age  Salary
0   1     John   28   50000
1   2    Emily   24   55000
2   3  Michael   22   60000
3   4    Sarah   25   65000

Here, parentheses make it clear that the AND operation (&) should be carried out before the OR operation (|).

 

Add Regex Condition

In the context of Pandas query method, you can integrate regex to add another dimension to your data filtering capabilities.

The .str.contains() function allows you to search for patterns within string-based columns.

Let’s modify our DataFrame a bit to include an Email column:

df['Email'] = ['john@email.com', 'emily@email.com', 'michael@email.com', 'sarah@email.com', 'jessica@email.com']
print(df)

Output:

   ID      Name  Age  Salary              Email
0   1      John   28   50000      john@email.com
1   2     Emily   24   55000     emily@email.com
2   3   Michael   22   60000   michael@email.com
3   4     Sarah   25   65000     sarah@email.com
4   5   Jessica   29   70000   jessica@email.com

Now let’s use .str.contains() along with the query method to filter out records where the Email column starts with the letter “j” and Age is greater than 28:

filtered_df = df.query('Email.str.contains("^j") & Age>28')
print(filtered_df)

Output:

   ID     Name  Age  Salary              Email
4   5  Jessica   29   70000  jessica@email.com
Leave a Reply

Your email address will not be published. Required fields are marked *