Filter Pandas DataFrames with query() & isin() using Lists

One useful technique is using the Pandas query() method and isin() function to filter rows based on a list of values.

In this tutorial, you’ll learn how to use the query method with isin() to filter Pandas DataFrames based on Python lists.

 

 

Filtering Single Column Using query() and isin()

The isin() method in Pandas allows you to filter data by matching multiple values in a specific column.

Here’s a basic example to filter rows based on a list of values in the ‘PlanType’ column.

import pandas as pd
df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'PlanType': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium'],
    'MonthlyCharge': [20, 50, 20, 30, 50]
})

# Filter rows where 'PlanType' is either 'Basic' or 'Premium'
filtered_df = df.query("PlanType.isin(['Basic', 'Premium'])")
print(filtered_df)

Output:

   CustomerID  PlanType  MonthlyCharge
0           1     Basic             20
1           2   Premium             50
2           3     Basic             20
4           5   Premium             50

 

Combining isin() with Other Conditions in query()

You can combine isin() with other conditions using logical operators like & (and) or | (or).

# Filter rows where 'PlanType' is 'Basic' or 'Premium', and 'MonthlyCharge' is greater than 20
filtered_df = df.query("PlanType.isin(['Basic', 'Premium']) & MonthlyCharge > 20")
print(filtered_df)

Output:

   CustomerID  PlanType  MonthlyCharge
1           2   Premium             50
4           5   Premium             50

 

Using isin() on Multiple Columns in query()

You can apply the isin() function to multiple columns for more complex filtering.

# Sample data with an additional 'Region' column
df['Region'] = ['West', 'East', 'West', 'East', 'South']

# Filter rows where 'PlanType' is 'Basic' or 'Premium' and 'Region' is 'East' or 'West'
filtered_df = df.query("PlanType.isin(['Basic', 'Premium']) & Region.isin(['East', 'West'])")
print(filtered_df)

Output:

   CustomerID  PlanType  MonthlyCharge Region
0           1     Basic             20   West
1           2   Premium             50   East
2           3     Basic             20   West

 

Using isin() with ~ for Negation in query()

You can use the tilde symbol ~ to select rows that don’t match the values in the list.

# Filter rows where 'PlanType' is not 'Basic' or 'Premium'
filtered_df = df.query("~PlanType.isin(['Basic', 'Premium'])")
print(filtered_df)

Output:

   CustomerID  PlanType  MonthlyCharge Region
3           4  Standard             30   East

 

Using Variables in query() and isin()

If your list of values is stored in a variable, you can reference it in query() using an @ symbol.

allowed_plans = ['Basic', 'Premium']
allowed_regions = ['East', 'West']
filtered_df = df.query("PlanType.isin(@allowed_plans) & Region.isin(@allowed_regions)")
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge Region
0           1    Basic             20   West
1           2  Premium             50   East
2           3    Basic             20   West

 

Combining isin with String Methods in query()

You can combine isin with Pandas string methods to filter rows based on string conditions.

# Sample data with an additional 'Operator' column
df['Operator'] = ['XCorp', 'YCorp', 'XCorp', 'ZCorp', 'YCorp']

# Filter rows where 'Operator' starts with 'X' or 'Y' and 'PlanType' is 'Basic'
filtered_df = df.query("Operator.str.startswith('X') | Operator.str.startswith('Y')")
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge Region Operator
0           1    Basic             20   West    XCorp
1           2  Premium             50   East    YCorp
2           3    Basic             20   West    XCorp
4           5  Premium             50  South    YCorp

 

Filtering Using isin() and query() from External Files

Sometimes the list you want to use for filtering is stored in an external file. You can read that into a Python list and use isin().

Assume you have a file allowed_plans.txt containing:

Basic
Premium

Code:

# Reading the list from an external file
with open('allowed_plans.txt', 'r') as f:
    allowed_plans = f.read().splitlines()

# Filtering using the list from the external file
filtered_df = df.query("PlanType.isin(@allowed_plans)")
print(filtered_df)

Output would be similar to previous examples where we filtered based on ‘Basic’ and ‘Premium’ plans.

 

Using isin() and query() with Dates

The isin method is not limited to strings or numeric values; it can also be used to filter a list of datetime objects.

# Sample data with a 'StartDate' column
df['StartDate'] = pd.to_datetime(['2022-01-01', '2022-02-15', '2022-01-20', '2022-03-01', '2022-04-15'])

# List of allowed start dates
allowed_dates = pd.to_datetime(['2022-01-01', '2022-04-15'])

# Filtering based on the allowed_dates
filtered_df = df.query("StartDate.isin(@allowed_dates)")
print(filtered_df)

Output:

   CustomerID PlanType  MonthlyCharge Region Operator  StartDate
0           1    Basic             20   West    XCorp 2022-01-01
4           5  Premium             50  South    YCorp 2022-04-15
Leave a Reply

Your email address will not be published. Required fields are marked *