Filter Pandas DataFrames with query() & isin() using Lists
One useful technique is using the Pandas query()
method and isin()
function to filter rows based on a list of values.
In this tutorial, you’ll learn how to use the query
method with isin()
to filter Pandas DataFrames based on Python lists.
- 1 Filtering Single Column Using query() and isin()
- 2 Combining isin() with Other Conditions in query()
- 3 Using isin() on Multiple Columns in query()
- 4 Using isin() with ~ for Negation in query()
- 5 Using Variables in query() and isin()
- 6 Combining isin with String Methods in query()
- 7 Filtering Using isin() and query() from External Files
- 8 Using isin() and query() with Dates
Filtering Single Column Using query() and isin()
The isin()
method in Pandas allows you to filter data by matching multiple values in a specific column.
Here’s a basic example to filter rows based on a list of values in the ‘PlanType’ column.
import pandas as pd df = pd.DataFrame({ 'CustomerID': [1, 2, 3, 4, 5], 'PlanType': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium'], 'MonthlyCharge': [20, 50, 20, 30, 50] }) # Filter rows where 'PlanType' is either 'Basic' or 'Premium' filtered_df = df.query("PlanType.isin(['Basic', 'Premium'])") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge 0 1 Basic 20 1 2 Premium 50 2 3 Basic 20 4 5 Premium 50
Combining isin() with Other Conditions in query()
You can combine isin()
with other conditions using logical operators like &
(and) or |
(or).
# Filter rows where 'PlanType' is 'Basic' or 'Premium', and 'MonthlyCharge' is greater than 20 filtered_df = df.query("PlanType.isin(['Basic', 'Premium']) & MonthlyCharge > 20") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge 1 2 Premium 50 4 5 Premium 50
Using isin() on Multiple Columns in query()
You can apply the isin()
function to multiple columns for more complex filtering.
# Sample data with an additional 'Region' column df['Region'] = ['West', 'East', 'West', 'East', 'South'] # Filter rows where 'PlanType' is 'Basic' or 'Premium' and 'Region' is 'East' or 'West' filtered_df = df.query("PlanType.isin(['Basic', 'Premium']) & Region.isin(['East', 'West'])") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge Region 0 1 Basic 20 West 1 2 Premium 50 East 2 3 Basic 20 West
Using isin() with ~ for Negation in query()
You can use the tilde symbol ~
to select rows that don’t match the values in the list.
# Filter rows where 'PlanType' is not 'Basic' or 'Premium' filtered_df = df.query("~PlanType.isin(['Basic', 'Premium'])") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge Region 3 4 Standard 30 East
Using Variables in query() and isin()
If your list of values is stored in a variable, you can reference it in query()
using an @
symbol.
allowed_plans = ['Basic', 'Premium'] allowed_regions = ['East', 'West'] filtered_df = df.query("PlanType.isin(@allowed_plans) & Region.isin(@allowed_regions)") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge Region 0 1 Basic 20 West 1 2 Premium 50 East 2 3 Basic 20 West
Combining isin with String Methods in query()
You can combine isin
with Pandas string methods to filter rows based on string conditions.
# Sample data with an additional 'Operator' column df['Operator'] = ['XCorp', 'YCorp', 'XCorp', 'ZCorp', 'YCorp'] # Filter rows where 'Operator' starts with 'X' or 'Y' and 'PlanType' is 'Basic' filtered_df = df.query("Operator.str.startswith('X') | Operator.str.startswith('Y')") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge Region Operator 0 1 Basic 20 West XCorp 1 2 Premium 50 East YCorp 2 3 Basic 20 West XCorp 4 5 Premium 50 South YCorp
Filtering Using isin() and query() from External Files
Sometimes the list you want to use for filtering is stored in an external file. You can read that into a Python list and use isin()
.
Assume you have a file allowed_plans.txt
containing:
Basic Premium
Code:
# Reading the list from an external file with open('allowed_plans.txt', 'r') as f: allowed_plans = f.read().splitlines() # Filtering using the list from the external file filtered_df = df.query("PlanType.isin(@allowed_plans)") print(filtered_df)
Output would be similar to previous examples where we filtered based on ‘Basic’ and ‘Premium’ plans.
Using isin() and query() with Dates
The isin
method is not limited to strings or numeric values; it can also be used to filter a list of datetime objects.
# Sample data with a 'StartDate' column df['StartDate'] = pd.to_datetime(['2022-01-01', '2022-02-15', '2022-01-20', '2022-03-01', '2022-04-15']) # List of allowed start dates allowed_dates = pd.to_datetime(['2022-01-01', '2022-04-15']) # Filtering based on the allowed_dates filtered_df = df.query("StartDate.isin(@allowed_dates)") print(filtered_df)
Output:
CustomerID PlanType MonthlyCharge Region Operator StartDate 0 1 Basic 20 West XCorp 2022-01-01 4 5 Premium 50 South YCorp 2022-04-15
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.