Handle NaN values in Pandas query Method
If you’re dealing with a dataset that contains NaN (Not a Number) values, using .query()
will yield unexpected results.
In this tutorial, you’ll learn how to handle NaN values when using the .query()
method.
Use isna() to Filter Rows with NaN Values
In Pandas, the isna()
function is used to identify NaN values in a DataFrame.
You can use isna()
directly within the .query()
method or using variables.
First, create a sample DataFrame:
import pandas as pd data = { 'customer_id': [1, 2, 3, 4, 5], 'account_balance': [200.0, 300.0, None, 500.0, None], 'last_purchase_date': ['2021-01-01', '2021-06-15', '2021-05-20', None, '2021-10-01'] } df = pd.DataFrame(data) print(df)
Output:
customer_id account_balance last_purchase_date 0 1 200.0 2021-01-01 1 2 300.0 2021-06-15 2 3 NaN 2021-05-20 3 4 500.0 None 4 5 NaN 2021-10-01
In this dataset, the account_balance
and last_purchase_date
columns contain NaN values.
To filter out rows where account_balance
is NaN, use the following code:
isna_var = pd.isna filtered_df = df.query('@isna_var(account_balance)') print(filtered_df)
Output:
customer_id account_balance last_purchase_date 2 3 NaN 2021-05-20 4 5 NaN 2021-10-01
You will get the same output if you use isna()
directly:
filtered_df = df.query('account_balance.isna()') print(filtered_df)
Using notna() to Exclude NaN Values
You can use the notna()
function to exclude NaN values from your query results.
Continue with the previous DataFrame, and to filter rows where account_balance
is not NaN, run the following code:
notna_var = pd.notna filtered_df_notna = df.query('@notna_var(account_balance)') print(filtered_df_notna)
Output:
customer_id account_balance last_purchase_date 0 1 200.0 2021-01-01 1 2 300.0 2021-06-15 3 4 500.0 None
The external variable notna_var
allows us to incorporate pd.notna()
within the .query()
method.
Combining Multiple Conditions Involving NaN Values
You can combine multiple conditions using logical operators like &
(and), |
(or), and ~
(not).
Suppose you want to find all rows where account_balance
is NaN or last_purchase_date
is not NaN. You can do this as follows:
isna_var = pd.isna notna_var = pd.notna filtered_df_multiple = df.query('@isna_var(account_balance) | @notna_var(last_purchase_date)') print(filtered_df_multiple)
Output:
customer_id account_balance last_purchase_date 0 1 200.0 2021-01-01 1 2 300.0 2021-06-15 2 3 NaN 2021-05-20 4 5 NaN 2021-10-01
Notice the use of the |
operator to combine the two conditions. Similarly, you can use &
to require both conditions to be true, and ~
to negate a condition.
Querying NaN Values in Multi-Index DataFrames
You might encounter DataFrames with NaN values when you’re dealing with hierarchical data or after using functions like .groupby()
with multiple columns.
Let’s create a multi-index DataFrame for demonstration:
arrays = [ ['A', 'A', 'B', 'B'], [1, 2, 1, 2] ] index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers')) data = { 'account_balance': [200.0, None, 300.0, None], 'last_purchase_date': ['2021-01-01', None, '2021-06-15', '2021-05-20'] } multi_df = pd.DataFrame(data, index=index) print(multi_df)
Output:
account_balance last_purchase_date letters numbers A 1 200.0 2021-01-01 2 NaN None B 1 300.0 2021-06-15 2 NaN 2021-05-20
Now, to query this DataFrame and find rows where account_balance
is NaN, you can use:
isna_var = pd.isna filtered_multi_df = multi_df.query('@isna_var(account_balance)') print(filtered_multi_df)
Output:
account_balance last_purchase_date letters numbers A 2 NaN None B 2 NaN 2021-05-20
As you can see, the DataFrame has been filtered to only include rows where account_balance
is NaN, regardless of the multi-index setup.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.