Filtering Multi-Indexed DataFrames Using Pandas isin
One feature that proves extremely useful is the isin()
method, especially when dealing with MultiIndex DataFrames.
This tutorial will walk you through various examples to show how you can use the isin()
method with MultiIndex DataFrames.
Applying isin on MultiIndex DataFrame: Row-wise
In this section, you’ll learn how to apply the isin()
method on a MultiIndex DataFrame, focusing on row-wise filtering.
The goal is to filter rows based on multiple conditions.
First, let’s create a sample MultiIndex DataFrame:
import pandas as pd data = { 'AccountID': [101, 101, 102, 103, 103, 104], 'Timestamp': ['2021-01-01', '2021-01-02', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-01'], 'Data_Usage': [1.2, 3.1, 2.4, 2.9, 3.5, 1.8], 'Call_Duration': [5, 15, 7, 9, 12, 8] } df = pd.DataFrame(data) # Set AccountID and Timestamp as MultiIndex df.set_index(['AccountID', 'Timestamp'], inplace=True) print(df)
Output:
Data_Usage Call_Duration AccountID Timestamp 101 2021-01-01 1.2 5 2021-01-02 3.1 15 102 2021-01-01 2.4 7 103 2021-01-01 2.9 9 2021-01-02 3.5 12 104 2021-01-01 1.8 8
Now, let’s say you want to filter this DataFrame to include only the records for account IDs 101 and 103 and timestamps ‘2021-01-01’ and ‘2021-01-02’.
You can do this using the isin()
method.
filtered_df = df[df.index.isin([(101, '2021-01-01'), (103, '2021-01-02')])] print(filtered_df)
Output:
Data_Usage Call_Duration AccountID Timestamp 101 2021-01-01 1.2 5 103 2021-01-02 3.5 12
Applying isin on MultiIndex DataFrame: Column-wise
First, let’s extend the previous DataFrame to include another level of columns.
data = { ('Data_Usage', '4G'): [1.2, 3.1, 2.4, 2.9, 3.5, 1.8], ('Data_Usage', '5G'): [0.5, 0.2, 0.7, 0.4, 0.8, 0.3], ('Call_Duration', 'Local'): [5, 15, 7, 9, 12, 8], ('Call_Duration', 'International'): [0, 2, 1, 3, 4, 1] } index = pd.MultiIndex.from_tuples( [(101, '2021-01-01'), (101, '2021-01-02'), (102, '2021-01-01'), (103, '2021-01-01'), (103, '2021-01-02'), (104, '2021-01-01')], names=['AccountID', 'Timestamp'] ) df = pd.DataFrame(data, index=index) print(df)
Output:
Data_Usage Call_Duration 4G 5G Local International AccountID Timestamp 101 2021-01-01 1.2 0.5 5 0 2021-01-02 3.1 0.2 15 2 102 2021-01-01 2.4 0.7 7 1 103 2021-01-01 2.9 0.4 9 3 2021-01-02 3.5 0.8 12 4 104 2021-01-01 1.8 0.3 8 1
This MultiIndex DataFrame has two levels for columns: one for the type of data (Data_Usage
and Call_Duration
) and another for the technology used (4G
, 5G
) or call type (Local
, International
).
Using isin to filter specific columns
Suppose you’re interested in only the columns related to Data_Usage
for 4G and Call_Duration
for Local calls. Here’s how to do it:
filtered_df = df.loc[:, df.columns.isin([('Data_Usage', '4G'), ('Call_Duration', 'Local')])] print(filtered_df)
Output:
Data_Usage Call_Duration 4G Local AccountID Timestamp 101 2021-01-01 1.2 5 2021-01-02 3.1 15 102 2021-01-01 2.4 7 103 2021-01-01 2.9 9 2021-01-02 3.5 12 104 2021-01-01 1.8 8
Applying isin on MultiIndex Series
Applying the isin()
method on a MultiIndex Series is similar to applying it on a MultiIndex DataFrame.
First, we’ll extract a MultiIndex Series from our sample DataFrame:
data_usage_4g_series = df[('Data_Usage', '4G')] print(data_usage_4g_series)
Output:
AccountID Timestamp 101 2021-01-01 1.2 2021-01-02 3.1 102 2021-01-01 2.4 103 2021-01-01 2.9 2021-01-02 3.5 104 2021-01-01 1.8 Name: (Data_Usage, 4G), dtype: float64
If you want to filter this Series to include only the data usage for account IDs 101 and 103, you can use the isin()
method like this:
filtered_series = data_usage_4g_series[data_usage_4g_series.index.isin([(101, '2021-01-01'), (103, '2021-01-02')])] print(filtered_series)
Output:
AccountID Timestamp 101 2021-01-01 1.2 103 2021-01-02 3.5 Name: (Data_Usage, 4G), dtype: float64
After applying the isin()
method, the filtered Series now includes only the records for account IDs 101 and 103 for the specified timestamps.
This shows that the isin()
method can filter MultiIndex Series, just like it is for MultiIndex DataFrames.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.