Filter Null and not Null Values in Pandas query method
The query
method allows you filter DataFrame rows based on a query expression.
In this tutorial, you will learn how to check for null and not null values in the Pandas query
method in Python.
Check for Null Values
Firstly, let’s create a sample DataFrame with some data, including null values.
import pandas as pd data = {'ID': [1, 2, 3, 4, 5], 'Metric1': [32, 45, None, 21, 56], 'Metric2': [None, 25, 43, 54, None]} df = pd.DataFrame(data) print(df)
Output:
ID Metric1 Metric2 0 1 32.0 NaN 1 2 45.0 25.0 2 3 NaN 43.0 3 4 21.0 54.0 4 5 56.0 NaN
The DataFrame contains two columns, Metric1
and Metric2
, with some null values represented as NaN
(Not a Number).
Now, let’s count the number of null values in the Metric1
column using isnull():
# Count null values in the Metric1 column null_count = df['Metric1'].isnull().sum() print("Number of null values in Metric1:", null_count)
Output:
Number of null values in Metric1: 1
Check for Not Null Values
You can use the notnull()
function to find the number of not null values in a DataFrame column.
# Count not null values in the Metric1 column not_null_count = df['Metric1'].notnull().sum() print("Number of not null values in Metric1:", not_null_count)
Output:
Number of not null values in Metric1: 4
Filter Rows Based on Null Values
The query
method can filter out null values from specific rows:
filtered_df = df.query("Metric1.isnull()") print(filtered_df)
Output:
ID Metric1 Metric2 2 3 NaN 43.0
The filtered DataFrame displays only the row where Metric1
is null.
Filter Rows Based on Not Null Values
Just as you can filter rows based on null values, you can also do the opposite—filter rows where the values are not null.
Here’s how to do that using the query
method.
filtered_not_null_df = df.query("Metric1.notnull()") print(filtered_not_null_df)
Output:
ID Metric1 Metric2 0 1 32.0 NaN 1 2 45.0 25.0 3 4 21.0 54.0 4 5 56.0 NaN
The DataFrame now only includes rows where Metric1
is not null.
Filter Based on Null Values Across Multiple Columns
In many real-world scenarios, you may need to filter rows based on null values across multiple columns.
Let’s see how to filter rows based on null values in either the Metric1
or Metric2
columns.
# Filter rows where either Metric1 or Metric2 is null filtered_multiple_cols_df = df.query("Metric1.isnull() | Metric2.isnull()") print(filtered_multiple_cols_df)
Output:
ID Metric1 Metric2 0 1 32.0 NaN 2 3 NaN 43.0 4 5 56.0 NaN
The use of the pipe symbol (|
) serves as a logical OR, enabling the filter to catch rows where either of the columns is null.
Combining isnull() and notnull()
The query
method offers a flexible way to combine isnull()
and notnull()
conditions.
Let’s dive into an example.
# Filter rows where Metric1 is not null and Metric2 is null combined_query_df = df.query("Metric1.notnull() & Metric2.isnull()") print(combined_query_df)
Output:
ID Metric1 Metric2 0 1 32.0 NaN 4 5 56.0 NaN
The DataFrame now displays rows where Metric1
is not null and Metric2
is null.
The ampersand symbol (&
) is used for logical AND, meaning both conditions must be met for a row to be included in the output.
Filtering on Null Values with Other Conditions
Let’s illustrate this with an example where you want to filter rows where Metric1
is greater than 30 and is not null.
# Filter rows where Metric1 is greater than 30 and not null conditional_query_df = df.query("Metric1 > 30 & Metric1.notnull()") print(conditional_query_df)
Output:
ID Metric1 Metric2 0 1 32.0 NaN 1 2 45.0 25.0 4 5 56.0 NaN
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
Thanks for the tutorial.
When I try the suggested code for filtering rows which contain a null value:
df.query(“metric1.isnull()”)
I get the following error:
TypeError: ‘Series’ objects are mutable, thus they cannot be hashed
I can use bracket notation successfully:
df.loc[df[‘Metric1’].isnull()]
But I would like to understand the problem and why the .query() method isn’t working for me.
You’re welcome!
I tested both snippets and they return the same result without any issues.
In the line you wrote, you used “metric1” and it should be “Metric1” Capital “M”.
Regards,