7 Ways to Filter a JSON Array in Python
In this tutorial, we’ll explore various methods to filter JSON arrays in Python.
We’ll learn how to use list comprehension, the filter() function, for loops, Pandas, NumPy, itertools, and JMESPath to filter JSON arrays.
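The examples in this tutorial work on a Python list of dictionaries. If the data starts out as JSON text, it has to be parsed first. Here is a minimal sketch using the standard-library json module; the raw_json string is made-up sample data used only for illustration.

```python
import json

# Hypothetical JSON text, e.g. read from a file or an HTTP response.
raw_json = '[{"user_id": 1, "data_usage": 300}, {"user_id": 2, "data_usage": 500}]'

# json.loads turns a JSON array into a Python list of dicts.
records = json.loads(raw_json)

# Filter as usual, then serialize the result back to JSON text.
heavy_users = [r for r in records if r["data_usage"] > 400]
heavy_json = json.dumps(heavy_users)
print(heavy_json)  # [{"user_id": 2, "data_usage": 500}]
```

Every method shown later can sit between the json.loads and json.dumps calls in the same way.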
Using List Comprehension
Imagine you want to select the users who have exceeded a certain data usage threshold in the following array:
users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800}
]
You can use list comprehension to keep only the users with data usage greater than 400:
high_usage_users = [user for user in users_data if user["data_usage"] > 400]
print(high_usage_users)
Output:
[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]
The list comprehension checks each element in users_data and includes it in high_usage_users if the data_usage value exceeds 400.
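Real JSON data often has records with missing fields, and user["data_usage"] raises a KeyError in that case. The sketch below shows one possible way to handle it with dict.get. The users list and the default of 0 are assumptions made for this example.

```python
# Sample records; the third one has no "data_usage" field (hypothetical case).
users = [
    {"user_id": 1, "data_usage": 300},
    {"user_id": 2, "data_usage": 500},
    {"user_id": 3},
]

# dict.get with a default of 0 treats a missing value as "no usage"
# instead of raising KeyError. The comprehension also keeps only the IDs.
high_usage_ids = [u["user_id"] for u in users if u.get("data_usage", 0) > 400]
print(high_usage_ids)  # [2]
```

Whether a missing value should count as zero or be skipped some other way depends on your data.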
Using the filter() Function
The filter() function in Python allows you to filter items from a list or any iterable.
Imagine you want to find all users who are on the “premium” plan.
First, define a function that checks if a user is on the premium plan:
def is_premium(user):
    return user["plan_type"] == "premium"
Now, apply this function using the filter() function:
premium_users = filter(is_premium, users_data)
print(list(premium_users))
Output:
[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]
The filter() function takes the is_premium function and applies it to each element in users_data.
It returns an iterator that yields only those elements for which the is_premium function returns True.
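Because filter() returns an iterator, its result can be consumed only once. The following sketch illustrates this, using a lambda in place of a named function. The names first_pass and second_pass are chosen for this example only.

```python
users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]

# A lambda avoids defining a separate named function.
premium_users = filter(lambda user: user["plan_type"] == "premium", users_data)

first_pass = list(premium_users)   # consumes the iterator
second_pass = list(premium_users)  # nothing is left to yield

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

If you need the result more than once, convert it to a list right away and reuse the list.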
Using a For Loop
Suppose you want to find users who are on a “basic” plan but have data usage above 200.
Here’s how you can do this using a for loop:
basic_high_usage = []
threshold = 200
for user in users_data:
    if user["plan_type"] == "basic" and user["data_usage"] > threshold:
        basic_high_usage.append(user)
print(basic_high_usage)
Output:
[{'user_id': 1, 'plan_type': 'basic', 'data_usage': 300}]
The for loop iterates through each dictionary in users_data, checks if the conditions are met, and appends the dictionary to basic_high_usage if they are.
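If the same loop is needed with different plans or thresholds, it can be wrapped in a function. The sketch below uses a generator for this. The filter_users function and its parameter names are hypothetical, introduced only for this example.

```python
def filter_users(users, plan_type, min_usage):
    """Yield users on plan_type whose data_usage is above min_usage."""
    for user in users:
        if user["plan_type"] == plan_type and user["data_usage"] > min_usage:
            yield user


users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]

# Same conditions as the loop above, now passed as arguments.
basic_high_usage = list(filter_users(users_data, "basic", 200))
print(basic_high_usage)
# [{'user_id': 1, 'plan_type': 'basic', 'data_usage': 300}]
```

The generator yields matches one at a time, so it also works when you only need the first few results.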
Using Pandas
First, you need to import Pandas and convert your JSON array into a Pandas DataFrame:
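import pandas as pd

users_df = pd.DataFrame(users_data)
Now, let’s filter users who are on the “basic” plan and have data usage less than 300:
# Wrap each condition in parentheses and combine them with &
mask = (users_df['plan_type'] == 'basic') & (users_df['data_usage'] < 300)
# orient="records" leaves out the index, so no index argument is needed
# (older Pandas versions reject index=False with this orient)
filtered_users = users_df[mask].to_json(orient="records")
print(filtered_users)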
Output:
[{"user_id":3,"plan_type":"basic","data_usage":100}]
After filtering, we used the DataFrame’s to_json() method to convert the result back to a JSON string.
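As an alternative to boolean indexing, Pandas also accepts the condition as a string through DataFrame.query(). The sketch below applies the same filter this way and returns Python dictionaries instead of JSON text. The variable names are chosen for this example.

```python
import pandas as pd

users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]
users_df = pd.DataFrame(users_data)

# query() takes the condition as a string; column names are used directly.
filtered_df = users_df.query("plan_type == 'basic' and data_usage < 300")

# to_dict(orient="records") returns a list of dicts rather than a JSON string.
filtered_records = filtered_df.to_dict(orient="records")
print(filtered_records)
```

Use to_dict when you want to keep working with the data in Python, and to_json when you need the serialized text.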
Using NumPy
First, import NumPy and convert your JSON array into a NumPy array:
import numpy as np

# Convert the list of dictionaries to a structured NumPy array
dtype = [('user_id', 'i4'), ('plan_type', 'U10'), ('data_usage', 'i4')]
# Read each field by name so the tuple order always matches dtype
np_users_data = np.array(
    [tuple(user[name] for name, _ in dtype) for user in users_data],
    dtype=dtype,
)
Now, let’s select the users whose data usage is more than 250:
high_usage_np_users = np_users_data[np_users_data['data_usage'] > 250]
print(high_usage_np_users)
Output:
[(1, 'basic', 300) (2, 'premium', 500) (4, 'premium', 800)]
This output lists the users with data usage above 250.
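The filtered result is still a structured NumPy array. The sketch below shows one way to combine two conditions in a single mask and turn the matching rows back into dictionaries. The threshold of 600 and the variable names are assumptions for this example.

```python
import numpy as np

users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]
dtype = [("user_id", "i4"), ("plan_type", "U10"), ("data_usage", "i4")]
np_users = np.array(
    [(u["user_id"], u["plan_type"], u["data_usage"]) for u in users_data],
    dtype=dtype,
)

# Combine boolean masks with &; each comparison needs its own parentheses.
mask = (np_users["plan_type"] == "premium") & (np_users["data_usage"] > 600)
selected = np_users[mask]

# tolist() yields tuples of plain Python values; zip them with the field names.
selected_dicts = [dict(zip(np_users.dtype.names, row)) for row in selected.tolist()]
print(selected_dicts)
# [{'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]
```

Converting back to dictionaries makes the result easy to pass to json.dumps.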
Using itertools
Let’s say you want to keep only the entries on the premium plan:
from itertools import filterfalse

json_array = [
    {"customerId": 101, "planType": "basic", "dataUsage": 500},
    {"customerId": 102, "planType": "premium", "dataUsage": 1500},
    {"customerId": 103, "planType": "basic", "dataUsage": 300},
    {"customerId": 104, "planType": "premium", "dataUsage": 2000}
]
# filterfalse keeps the entries for which the lambda returns False
filtered_data = list(filterfalse(lambda entry: entry['planType'] != 'premium', json_array))
print(filtered_data)
Output:
[{'customerId': 102, 'planType': 'premium', 'dataUsage': 1500}, {'customerId': 104, 'planType': 'premium', 'dataUsage': 2000}]
The filterfalse function returns an iterator over the elements of json_array for which the given function returns False. Here the lambda tests planType != 'premium', so the elements that remain are the premium ones.
To get a list of the filtered elements, the list() function is used to convert the iterator into a list.
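Since filterfalse is the mirror image of the built-in filter(), the two can be used together to split an array into matching and non-matching entries. The following sketch shows this with a single predicate. The is_premium helper is defined here for the example.

```python
from itertools import filterfalse

json_array = [
    {"customerId": 101, "planType": "basic", "dataUsage": 500},
    {"customerId": 102, "planType": "premium", "dataUsage": 1500},
    {"customerId": 103, "planType": "basic", "dataUsage": 300},
    {"customerId": 104, "planType": "premium", "dataUsage": 2000},
]


def is_premium(entry):
    return entry["planType"] == "premium"


premium = list(filter(is_premium, json_array))           # predicate is True
non_premium = list(filterfalse(is_premium, json_array))  # predicate is False

print([e["customerId"] for e in premium])      # [102, 104]
print([e["customerId"] for e in non_premium])  # [101, 103]
```

Writing the predicate in its positive form and choosing between filter() and filterfalse() is often easier to read than negating the condition inside a lambda.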
Using JMESPath
JMESPath (JSON Matching Expressions Path) is a query language for JSON. It allows you to specify how to extract elements from a JSON structure.
Suppose you want to extract specific information, like all users with a certain plan type or those who exceed a specific data usage threshold.
First, you would need to install the JMESPath library if you haven’t already:
pip install jmespath
Now, let’s use JMESPath to select the users with the ‘premium’ plan type:
import jmespath

query = "[?plan_type == 'premium']"
premium_users = jmespath.search(query, users_data)
print(premium_users)
Output:
[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]
The expression "[?plan_type == 'premium']" is used to specify the filtering condition within the JSON data.
Filter Nested JSON Arrays
Consider the following nested JSON data:
nested_users_data = [
    {"user_id": 1, "plan_type": "basic", "devices": [{"device_id": 101, "data_usage": 150}, {"device_id": 102, "data_usage": 200}]},
    {"user_id": 2, "plan_type": "premium", "devices": [{"device_id": 201, "data_usage": 500}]},
    {"user_id": 3, "plan_type": "basic", "devices": [{"device_id": 301, "data_usage": 100}, {"device_id": 302, "data_usage": 50}]},
    {"user_id": 4, "plan_type": "premium", "devices": [{"device_id": 401, "data_usage": 300}, {"device_id": 402, "data_usage": 500}]}
]
Suppose you want to find users who have at least one device with data usage over 250:
filtered_users = []
for user in nested_users_data:
    if any(device['data_usage'] > 250 for device in user['devices']):
        filtered_users.append(user)
print(filtered_users)
Output:
[{'user_id': 2, 'plan_type': 'premium', 'devices': [{'device_id': 201, 'data_usage': 500}]}, {'user_id': 4, 'plan_type': 'premium', 'devices': [{'device_id': 401, 'data_usage': 300}, {'device_id': 402, 'data_usage': 500}]}]
The any() function is used to check if any device in the user’s ‘devices’ list meets the specified condition.
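The any() check keeps a matching user with all of their devices. If you also want to drop the devices that do not match, one possible approach is to filter the inner list as well. The sketch below does this with a threshold of 400, a value picked for this example so that the effect is visible.

```python
nested_users_data = [
    {"user_id": 1, "plan_type": "basic", "devices": [
        {"device_id": 101, "data_usage": 150}, {"device_id": 102, "data_usage": 200}]},
    {"user_id": 2, "plan_type": "premium", "devices": [
        {"device_id": 201, "data_usage": 500}]},
    {"user_id": 3, "plan_type": "basic", "devices": [
        {"device_id": 301, "data_usage": 100}, {"device_id": 302, "data_usage": 50}]},
    {"user_id": 4, "plan_type": "premium", "devices": [
        {"device_id": 401, "data_usage": 300}, {"device_id": 402, "data_usage": 500}]},
]

threshold = 400
trimmed_users = []
for user in nested_users_data:
    # Keep only the devices above the threshold.
    heavy_devices = [d for d in user["devices"] if d["data_usage"] > threshold]
    if heavy_devices:
        # Build a new dict so the original data is left unchanged.
        trimmed_users.append({**user, "devices": heavy_devices})

print(trimmed_users)
```

Here user 4 is kept with device 402 only, while the original nested_users_data still lists both of that user’s devices.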
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development.