Pandas NOT IN filter (Opposite of isin): Finding Mismatches
The Pandas isin
method allows you to match DataFrame or Series data against a set of values.
However, in some cases, you want to retrieve records that don’t match any values in a given list.
That’s where the negation of isin
comes into play.
In this tutorial, you’ll walk through various techniques to achieve this functionality.
Example of NOT IN filter
You can use the bitwise NOT operator ~
in conjunction with df['column'].isin([values])
First, let’s create a sample DataFrame:
import pandas as pd df = pd.DataFrame({ 'CustomerID': [1, 2, 3, 4, 5], 'Plan': ['Basic', 'Premium', 'Basic', 'Enterprise', 'Premium'], 'MonthlyCharge': [20, 50, 20, 100, 50] }) print(df)
Output:
CustomerID Plan MonthlyCharge 0 1 Basic 20 1 2 Premium 50 2 3 Basic 20 3 4 Enterprise 100 4 5 Premium 50
To filter out the rows where the plan is neither ‘Basic’ nor ‘Premium’, you can use the following code:
filtered_df = df[~df['Plan'].isin(['Basic', 'Premium'])] print(filtered_df)
Output:
CustomerID Plan MonthlyCharge 3 4 Enterprise 100
Notice the use of the bitwise NOT operator ~
. It inverts the boolean values returned by df['Plan'].isin(['Basic', 'Premium'])
, giving us the rows that do not match the condition.
Using NOT IN with Numerical Columns
Suppose you want to filter out the rows where MonthlyCharge
is not 20 or 50. You can accomplish this with the following code:
filtered_numerical_df = df[~df['MonthlyCharge'].isin([20, 50])] print(filtered_numerical_df)
Output:
CustomerID Plan MonthlyCharge 3 4 Enterprise 100
The bitwise NOT operator ~
functions the same way with numerical data, inverting the boolean values and thus displaying the rows that don’t match the criteria.
Using NOT IN with DateTime Columns
First, let’s add a DateTime column to our DataFrame to represent when each customer joined:
df['JoinDate'] = pd.to_datetime(['2022-01-01', '2022-02-15', '2022-01-20', '2022-03-10', '2022-02-21']) print(df)
Output:
CustomerID Plan MonthlyCharge JoinDate 0 1 Basic 20 2022-01-01 1 2 Premium 50 2022-02-15 2 3 Basic 20 2022-01-20 3 4 Enterprise 100 2022-03-10 4 5 Premium 50 2022-02-21
Now, let’s say you want to filter out customers who did not join on ‘2022-01-01’ or ‘2022-02-15’:
filtered_date_df = df[~df['JoinDate'].isin([pd.Timestamp('2022-01-01'), pd.Timestamp('2022-02-15')])] print(filtered_date_df)
Output:
CustomerID Plan MonthlyCharge JoinDate 2 3 Basic 20 2022-01-20 3 4 Enterprise 100 2022-03-10 4 5 Premium 50 2022-02-21
Here, the bitwise NOT operator ~
filters out the rows where the JoinDate
is not in the given list.
Applying NOT IN Filter on Multiple Columns
There may be instances where you’ll want to apply the negation of isin
across multiple columns.
In such cases, you can create a boolean mask by combining multiple conditions using logical operators like &
(and) or |
(or).
Let’s filter out rows where Plan
is not ‘Basic’ and MonthlyCharge
is not 50:
mask = (~df['Plan'].isin(['Basic'])) & (~df['MonthlyCharge'].isin([50])) filtered_multi_col_df = df[mask] print(filtered_multi_col_df)
Output:
CustomerID Plan MonthlyCharge JoinDate 3 4 Enterprise 100 2022-03-10
We created a boolean mask using two negated isin
conditions and used it to filter out the rows.
Combining NOT IN with Other Filters
In complex data manipulation tasks, you often need to combine the negation of isin
with other kinds of filters.
You can do this by using logical operators like &
(and) or |
(or).
Using & and | operators for Combined Filtering
Let’s start by combining multiple conditions using the &
and |
operators.
Let’s filter out rows where the plan is not ‘Basic’ and MonthlyCharge
is less than 50:
combined_mask = (~df['Plan'].isin(['Basic'])) & (df['MonthlyCharge'] < 100) filtered_combined_df = df[combined_mask] print(filtered_combined_df)
Output:
CustomerID Plan MonthlyCharge 1 2 Premium 50 4 5 Premium 50
How NOT ISIN Interacts with NaN
Let’s introduce some NaN
values into our DataFrame:
import numpy as np df['Plan'].iloc[0] = np.nan print(df)
Output:
CustomerID Plan MonthlyCharge JoinDate 0 1 NaN 20 2022-01-01 1 2 Premium 50 2022-02-15 2 3 Basic 20 2022-01-20 3 4 Enterprise 100 2022-03-10 4 5 Premium 50 2022-02-21
Now, let’s filter out rows where the plan is neither ‘Basic’ nor ‘Premium’:
filtered_nan_df = df[~df['Plan'].isin(['Basic', 'Premium'])] print(filtered_nan_df)
Output:
CustomerID Plan MonthlyCharge JoinDate 0 1 NaN 20 2022-01-01 3 4 Enterprise 100 2022-03-10
The NaN
value in the ‘Plan’ column is included in the output. This happens because NaN
is not equal to any value, including itself, thus satisfying the NOT IN condition.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.