Filter Pandas DataFrame using isin with a Python List

The isin function in Pandas allows for powerful filtering of DataFrames based on a list of values.

In this tutorial, you’ll learn how to use isin with lists of strings, integers, and Series objects to filter data.

You’ll also learn dynamic techniques for generating filters, whether by extracting values from another DataFrame or applying conditional logic.

 

 

Pandas isin with Lists

The isin function can work with a variety of list types including lists of strings, integers, Pandas Series, and dictionaries.

Below, we’ll walk through examples of each to show you just how versatile this method can be.

List of Strings

Let’s consider you have a DataFrame with customer IDs and their respective phone plans.

import pandas as pd
data = {'CustomerID': [101, 102, 103, 104],
        'Plan': ['Basic', 'Premium', 'Basic', 'Elite']}
df = pd.DataFrame(data)
print(df)

Output:

   CustomerID     Plan
0         101    Basic
1         102  Premium
2         103    Basic
3         104    Elite

Now, you want to filter out customers who are on either the “Basic” or “Elite” plan.

filtered_df = df[df['Plan'].isin(['Basic', 'Elite'])]
print(filtered_df)

Output:

   CustomerID   Plan
0         101  Basic
2         103  Basic
3         104  Elite

 

List of Integers

Suppose you have specific customer IDs that you want to focus on.

filtered_df = df[df['CustomerID'].isin([101, 104])]
print(filtered_df)

Output:

   CustomerID   Plan
0         101  Basic
3         104  Elite

 

List of Series

You can also use a Pandas Series as your list for the isin method.

series_to_filter = pd.Series([101, 104])
filtered_df = df[df['CustomerID'].isin(series_to_filter)]
print(filtered_df)

Output:

   CustomerID   Plan
0         101  Basic
3         104  Elite

This produces the same output as using a list of integers.

 

Generate a List from Another DataFrame

Imagine you have another DataFrame that contains customer IDs for a promotional campaign.

You want to see if these IDs exist in your original DataFrame.

promo_df = pd.DataFrame({'Promo_CustomerID': [101, 105, 106]})
print(promo_df)

Output:

   Promo_CustomerID
0               101
1               105
2               106

You can dynamically generate a list from this DataFrame and use it for filtering.

filtered_df = df[df['CustomerID'].isin(promo_df['Promo_CustomerID'])]
print(filtered_df)

Output:

   CustomerID   Plan
0         101  Basic

Only the customer ID 101 is common between the original DataFrame and the promotional DataFrame.

 

Dynamic List Generation from Conditional Filtering

You can create a dynamic list by filtering the original DataFrame based on certain conditions.

basic_customers = df[df['Plan'] == 'Basic']
dynamic_list = basic_customers['CustomerID'].tolist()
filtered_df = df[df['CustomerID'].isin(dynamic_list)]
print(filtered_df)

Output:

   CustomerID   Plan
0         101  Basic
2         103  Basic

Here, we first find all the customers with the “Basic” plan. Then, we use their customer IDs to filter the original DataFrame.

 

Use isin with List of Lists

If you want to use a list of lists with isin, you’ll need to use a bit of extra logic since isin directly doesn’t handle a list of lists.

You can use isin with a list of lists in two ways:

  1. Flatten the list of lists: You can flatten a list of lists into a single list.
  2. Row-wise comparison: If you want to compare each row in the DataFrame to a list within the list of lists, you’ll have to loop through the list of lists and apply isin for each sublist.

Flatten List of Lists for isin

In this example, the list of lists is [[1, 2], [3, 4]], which we flatten to [1, 2, 3, 4] before using isin.

import pandas as pd
data = {'customer_id': [1, 2, 3, 4, 5],
        'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']}
df = pd.DataFrame(data)
list_of_lists = [[1, 2], [3, 4]]

# Flatten list of lists
flat_list = [item for sublist in list_of_lists for item in sublist]
filtered_df = df[df['customer_id'].isin(flat_list)]
print(filtered_df)

Row-wise Comparison

In this example, we loop through the list of lists [[1, 2], [3, 4]] and apply isin for each sublist.

import pandas as pd
data = {'customer_id': [1, 2, 3, 4, 5],
        'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']}
df = pd.DataFrame(data)
list_of_lists = [[1, 2], [3, 4]]

# Initialize an empty DataFrame to store results
result_df = pd.DataFrame()

# Loop through list of lists and filter rows
for sublist in list_of_lists:
    filtered_df = df[df['customer_id'].isin(sublist)]
    result_df = pd.concat([result_df, filtered_df])
print(result_df)

Note that if your list is nested deeper like this [[1, 2], [3, [5]]], you can use a recursive function to filter the DataFrame.

Here’s an example:

import pandas as pd
data = {'customer_id': [1, 2, 3, 4, 5],
        'package': ['Basic', 'Premium', 'Basic', 'Standard', 'Premium']}
df = pd.DataFrame(data)
list_of_lists = [[1, 2], [3, [5]]]
result_df = pd.DataFrame()

def filter_dataframe(df, sublist):
    filtered_df = df[df['customer_id'].isin(sublist)]
    return filtered_df

def recursive_filter(df, nested_list):
    temp_df = pd.DataFrame()  # Create a temporary DataFrame for each recursion
    for item in nested_list:
        if isinstance(item, list):
            filtered_df = recursive_filter(df, item)
            temp_df = pd.concat([temp_df, filtered_df])
        else:
            filtered_df = filter_dataframe(df, [item])
            temp_df = pd.concat([temp_df, filtered_df])
    return temp_df
result_df = recursive_filter(df, list_of_lists)
result_df = pd.concat([result_df])  # Concatenate the filtered DataFrames
print(result_df)

Output:

   customer_id  package
0            1    Basic
1            2  Premium
2            3    Basic
4            5  Premium
Leave a Reply

Your email address will not be published. Required fields are marked *