Filter Pandas DataFrames Based on String Length in query()
In this tutorial, you’ll learn how to use the query
method in Pandas to filter DataFrames based on the exact string length, how to set conditions for minimum, maximum, and a specific range of string lengths.
You’ll learn how to use string length as a criterion to filter your DataFrame.
Using Exact String Length
Let’s say you want to find all records where the identifier has an exact length of 10 characters.
You can use the .str.len()
method within the query
statement to specify the string length:
import pandas as pd data = { 'ID': ['1234567890', '987654321', '12345', 'ABCDE12345', '1234ABCD5678'], 'Plan_Type': ['Basic', 'Premium', 'Basic', 'Premium', 'Basic'], 'Monthly_Charge': [30, 50, 20, 45, 35] } df = pd.DataFrame(data) filtered_df = df.query("ID.str.len() == 10") print(filtered_df)
Output:
ID Plan_Type Monthly_Charge 0 1234567890 Basic 30 3 ABCDE12345 Premium 45
Here, we filter the DataFrame to include only those rows where the ‘ID’ field is exactly 10 characters long.
Using Minimum String Length
Suppose you’re looking to filter out identifiers that are at least 8 characters long:
# Filter DataFrame for rows where 'ID' has 8 or more characters filtered_df_min_length = df.query("ID.str.len() >= 8") print(filtered_df_min_length)
Output:
ID Plan_Type Monthly_Charge 0 1234567890 Basic 30 1 987654321 Premium 50 3 ABCDE12345 Premium 45 4 1234ABCD5678 Basic 35
With ID.str.len() >= 8
inside the query
method, the DataFrame is filtered to display only the rows where the ‘ID’ column’s strings are at least 8 characters in length.
Using Maximum String Length
In other cases, you might want to focus on strings that do not exceed a certain length.
For example, if you are interested in identifying all user IDs that are shorter than 10 characters:
# Filter DataFrame for rows where 'ID' is less than 10 characters filtered_df_max_length = df.query("ID.str.len() < 10") print(filtered_df_max_length)
Output:
ID Plan_Type Monthly_Charge 1 987654321 Premium 50 2 12345 Basic 20
In this code snippet, ID.str.len() < 10
is the condition used within the query
method to filter the DataFrame.
The resulting output includes only those records with ‘ID’ strings shorter than 10 characters.
Using String Length Range
There may be instances where you are interested in a range of string lengths.
Suppose you want to identify records where the string length of identifiers is between 8 and 10 characters.
# Filter DataFrame for rows where 'ID' length is between 8 and 10 characters filtered_df_range_length = df.query("ID.str.len() >= 8 and ID.str.len() <= 10") print(filtered_df_range_length)
Output:
ID Plan_Type Monthly_Charge 0 1234567890 Basic 30 1 987654321 Premium 50 3 ABCDE12345 Premium 45
By combining two conditions with the logical and
operator within the query
statement, you can easily filter the DataFrame for rows where the ‘ID’ column has string lengths within the specified range, in this case, 8 to 10 characters inclusive.
Using String Length using a variable
There might be times when you’d prefer not to hard-code the string length values directly into the query
expression. Instead, you can use a variable within your query.
This is especially beneficial when you have dynamic criteria that may change over time or when you are working with user input.
Let’s take a look at how to implement this approach:
desired_length = 10 # Filter DataFrame using a variable in the query method filtered_df_variable_length = df.query("ID.str.len() == @desired_length") print(filtered_df_variable_length)
Output:
ID Plan_Type Monthly_Charge 0 1234567890 Basic 30 3 ABCDE12345 Premium 45
In the code above, @desired_length
is used within the query
method to reference the desired_length
variable defined outside the query.
The DataFrame is filtered to include only those records where the ‘ID’ field matches the string length specified by desired_length
.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.