Filter Pandas DataFrame using isin with a Python List
The isin
function in Pandas allows for powerful filtering of DataFrames based on a list of values.
In this tutorial, you’ll learn how to use isin
with lists of strings, integers, and Series objects to filter data.
You’ll also learn dynamic techniques for generating filters, whether by extracting values from another DataFrame or applying conditional logic.
Pandas isin with Lists
The isin
function can work with a variety of list types including lists of strings, integers, Pandas Series, and dictionaries.
Below, we’ll walk through examples of each to show you just how versatile this method can be.
List of Strings
Let’s consider you have a DataFrame with customer IDs and their respective phone plans.
import pandas as pd data = {'CustomerID': [101, 102, 103, 104], 'Plan': ['Basic', 'Premium', 'Basic', 'Elite']} df = pd.DataFrame(data) print(df)
Output:
CustomerID Plan 0 101 Basic 1 102 Premium 2 103 Basic 3 104 Elite
Now, you want to filter out customers who are on either the “Basic” or “Elite” plan.
filtered_df = df[df['Plan'].isin(['Basic', 'Elite'])] print(filtered_df)
Output:
CustomerID Plan 0 101 Basic 2 103 Basic 3 104 Elite
List of Integers
Suppose you have specific customer IDs that you want to focus on.
filtered_df = df[df['CustomerID'].isin([101, 104])] print(filtered_df)
Output:
CustomerID Plan 0 101 Basic 3 104 Elite
List of Series
You can also use a Pandas Series as your list for the isin
method.
series_to_filter = pd.Series([101, 104]) filtered_df = df[df['CustomerID'].isin(series_to_filter)] print(filtered_df)
Output:
CustomerID Plan 0 101 Basic 3 104 Elite
This produces the same output as using a list of integers.
Generate a List from Another DataFrame
Imagine you have another DataFrame that contains customer IDs for a promotional campaign.
You want to see if these IDs exist in your original DataFrame.
promo_df = pd.DataFrame({'Promo_CustomerID': [101, 105, 106]}) print(promo_df)
Output:
Promo_CustomerID 0 101 1 105 2 106
You can dynamically generate a list from this DataFrame and use it for filtering.
filtered_df = df[df['CustomerID'].isin(promo_df['Promo_CustomerID'])] print(filtered_df)
Output:
CustomerID Plan 0 101 Basic
Only the customer ID 101 is common between the original DataFrame and the promotional DataFrame.
Dynamic List Generation from Conditional Filtering
You can create a dynamic list by filtering the original DataFrame based on certain conditions.
basic_customers = df[df['Plan'] == 'Basic'] dynamic_list = basic_customers['CustomerID'].tolist() filtered_df = df[df['CustomerID'].isin(dynamic_list)] print(filtered_df)
Output:
CustomerID Plan 0 101 Basic 2 103 Basic
Here, we first find all the customers with the “Basic” plan. Then, we use their customer IDs to filter the original DataFrame.
Use isin with List of Lists
If you want to use a list of lists with isin
, you’ll need to use a bit of extra logic since isin
directly doesn’t handle a list of lists.
You can use isin
with a list of lists in two ways:
- Flatten the list of lists: You can flatten a list of lists into a single list.
- Row-wise comparison: If you want to compare each row in the DataFrame to a list within the list of lists, you’ll have to loop through the list of lists and apply
isin
for each sublist.
Flatten List of Lists for isin
In this example, the list of lists is [[1, 2], [3, 4]]
, which we flatten to [1, 2, 3, 4]
before using isin
.
import pandas as pd data = {'customer_id': [1, 2, 3, 4, 5], 'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']} df = pd.DataFrame(data) list_of_lists = [[1, 2], [3, 4]] # Flatten list of lists flat_list = [item for sublist in list_of_lists for item in sublist] filtered_df = df[df['customer_id'].isin(flat_list)] print(filtered_df)
Row-wise Comparison
In this example, we loop through the list of lists [[1, 2], [3, 4]]
and apply isin
for each sublist.
import pandas as pd data = {'customer_id': [1, 2, 3, 4, 5], 'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']} df = pd.DataFrame(data) list_of_lists = [[1, 2], [3, 4]] # Initialize an empty DataFrame to store results result_df = pd.DataFrame() # Loop through list of lists and filter rows for sublist in list_of_lists: filtered_df = df[df['customer_id'].isin(sublist)] result_df = pd.concat([result_df, filtered_df]) print(result_df)
Note that if your list is nested deeper like this [[1, 2], [3, [5]]]
, you can use a recursive function to filter the DataFrame.
Here’s an example:
import pandas as pd data = {'customer_id': [1, 2, 3, 4, 5], 'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']} df = pd.DataFrame(data) list_of_lists = [[1, 2], [3, [5]]] result_df = pd.DataFrame() def filter_dataframe(df, sublist): filtered_df = df[df['customer_id'].isin(sublist)] return filtered_df def recursive_filter(df, nested_list): temp_df = pd.DataFrame() # Create a temporary DataFrame for each recursion for item in nested_list: if isinstance(item, list): filtered_df = recursive_filter(df, item) temp_df = pd.concat([temp_df, filtered_df]) else: filtered_df = filter_dataframe(df, [item]) temp_df = pd.concat([temp_df, filtered_df]) return temp_df result_df = recursive_filter(df, list_of_lists) result_df = pd.concat([result_df]) # Concatenate the filtered DataFrames print(result_df)
Output:
customer_id package 0 1 Basic 1 2 Premium 2 3 Basic 4 5 Premium
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.