Ignore Case Sensitivity in Pandas query Method
When working with Pandas DataFrames, you may need to filter rows based on text columns that contain the same values but with inconsistent capitalization.
Using Pandas query
method for filtering rows becomes challenging in these situations.
In this tutorial, you will learn how to perform case-insensitive queries on a Pandas DataFrame in the query
method using str.lower()
and str.upper()
.
Additional methods covered include using str.contains()
and str.match()
to search for substrings and patterns while ignoring case sensitivity.
Using the str.lower() or str.upper()
To perform a case-insensitive search using the query
method in Pandas, one approach is to use the str.lower()
or str.upper()
functions to convert all strings to a common case—either all lower case or all upper case.
Then you can compare them easily. First, let’s create a sample DataFrame.
import pandas as pd data = {'CustomerID': [1, 2, 3, 4], 'Name': ['John Doe', 'Jane Smith', 'Emily Clark', 'Mike Brown'], 'City': ['New York', 'San Francisco', 'new york', 'NEW YORK']} df = pd.DataFrame(data) print(df)
Output:
CustomerID Name City 0 1 John Doe New York 1 2 Jane Smith San Francisco 2 3 Emily Clark new york 3 4 Mike Brown NEW YORK
You can see that the ‘City’ column has the same city (‘New York’) in different cases.
Now, let’s perform a case-insensitive query using str.lower()
.
# Case-insensitive query using str.lower() result = df.query("City.str.lower() == 'new york'") print(result)
Output:
CustomerID Name City 0 1 John Doe New York 2 3 Emily Clark new york 3 4 Mike Brown NEW YORK
All the rows with the city ‘New York’, regardless of the case, are included in the result.
Using str.contains()
In some situations, you may want to filter rows based on whether a particular string exists within a column’s values, ignoring case sensitivity.
You can combine str.contains()
with the case
parameter set to False
for a case-insensitive search.
Now, let’s say you’re interested in filtering all rows where the city contains the word ‘York’, irrespective of the case.
result_str_contains = df.query("City.str.contains('york', case=False)") print(result_str_contains)
Output:
CustomerID Name City 0 1 John Doe New York 2 3 Emily Clark new york 3 4 Mike Brown NEW YORK
As you can see, the output includes all rows where the city contains the word ‘York’, regardless of how it’s capitalized.
Using str.match()
The str.match()
method can also be applied within the query
function for pattern matching, allowing for regular expression searches.
When combined with the case
parameter set to False
, it enables you to perform case-insensitive regular expression matching.
Suppose you want to find all the rows where the city starts with the word ‘New’, ignoring case.
You can do this by using the str.match()
method and specifying a regular expression pattern:
# Using str.match() for a case-insensitive search result_str_match = df.query("City.str.match('new', case=False)") print(result_str_match)
Output:
CustomerID Name City 0 1 John Doe New York 2 3 Emily Clark new york 3 4 Mike Brown NEW YORK
Here, all the rows where the city starts with the word ‘New’ are included, regardless of the case.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.