7 Ways to Filter a JSON Array in Python

In this tutorial, we’ll explore various methods to filter JSON arrays in Python.

We’ll learn how to use list comprehension, the filter() function, for loops, Pandas, NumPy, itertools, and JMESPath to filter JSON arrays.

 

 

Using List Comprehension

Imagine you want to select the users whose data usage exceeds a certain threshold in the following array:

users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800}
]

You can use a list comprehension to keep only the users with data usage greater than 400:

high_usage_users = [user for user in users_data if user["data_usage"] > 400]
print(high_usage_users)

Output:

[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]

The list comprehension checks each element in users_data and includes it in high_usage_users if the data_usage value exceeds 400.
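In practice, a JSON array often arrives as a string rather than as a Python list. The sketch below assumes a hypothetical raw_json payload in which one record lacks the data_usage key, and shows one way to parse it, filter it with a list comprehension, and serialize the result again:

```python
import json

# Hypothetical raw payload; in practice it might come from a file or an API response.
raw_json = """
[
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic"},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800}
]
"""

# json.loads turns the JSON array into a list of dictionaries.
users = json.loads(raw_json)

# .get() with a default of 0 avoids a KeyError for records without "data_usage".
high_usage = [user for user in users if user.get("data_usage", 0) > 400]

# json.dumps serializes the filtered list back to a JSON string.
high_usage_json = json.dumps(high_usage)
print(high_usage_json)
```

Treating a missing value as 0 is a choice made for this example; depending on your data, you may prefer to skip such records explicitly or raise an error.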

 

Using the filter() Function

The filter() function in Python allows you to filter items from a list or any iterable.

Imagine you want to find all users who are on the “premium” plan.

First, define a function that checks if a user is on the premium plan:

def is_premium(user):
    return user["plan_type"] == "premium"

Now, apply this function using the filter() function:

premium_users = filter(is_premium, users_data)
print(list(premium_users))

Output:

[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]

The filter() function takes the is_premium function and applies it to each element in users_data.

It returns an iterator that yields only those elements for which the is_premium function returns True.
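For a short condition, a lambda can replace the named function. The sketch below repeats the sample data so it runs on its own, and also illustrates that the iterator returned by filter() can be consumed only once (the variable names are chosen for this example):

```python
users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]

# A lambda works in place of a named function for simple conditions.
premium_iter = filter(lambda user: user["plan_type"] == "premium", users_data)

first_pass = list(premium_iter)   # consumes the iterator
second_pass = list(premium_iter)  # the iterator is exhausted, so this list is empty

print(len(first_pass), len(second_pass))
```

If you need the filtered results more than once, store the list rather than the iterator.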

 

Using a For Loop

Suppose you want to find users who are on a “basic” plan but have data usage above 200.

Here’s how you can do this using a for loop:

basic_high_usage = []
threshold = 200
for user in users_data:
    if user["plan_type"] == "basic" and user["data_usage"] > threshold:
        basic_high_usage.append(user)
print(basic_high_usage)

Output:

[{'user_id': 1, 'plan_type': 'basic', 'data_usage': 300}]

The for loop iterates through each dictionary in users_data, checks if the conditions are met, and appends the dictionary to basic_high_usage if they are.
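If you need the same loop with different conditions, one option is to wrap it in a function. The following is a minimal sketch; the filter_users name and its parameters are hypothetical and chosen for this example:

```python
users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]


def filter_users(users, plan_type, min_usage):
    """Return users on plan_type whose data_usage is strictly above min_usage."""
    matches = []
    for user in users:
        if user["plan_type"] == plan_type and user["data_usage"] > min_usage:
            matches.append(user)
    return matches


basic_high_usage = filter_users(users_data, "basic", 200)
premium_heavy = filter_users(users_data, "premium", 600)
print(basic_high_usage)
print(premium_heavy)
```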

 

Using Pandas

First, you need to import Pandas and convert your JSON array into a Pandas DataFrame:

import pandas as pd
users_df = pd.DataFrame(users_data)

Now, let’s filter users who are on the “basic” plan and have data usage less than 300:

filtered_users = users_df[(users_df['plan_type'] == 'basic') & (users_df['data_usage'] < 300)].to_json(orient="records")
print(filtered_users)

Output:

[{"user_id":3,"plan_type":"basic","data_usage":100}]

After filtering, we used the DataFrame’s to_json() method with orient="records" to convert the result back to a JSON array string.
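The same filter can also be written with DataFrame.query(), which some readers find easier to scan than chained boolean masks. The sketch below assumes you want Python dictionaries back rather than a JSON string, so it uses to_dict(orient="records"); the max_usage variable is introduced for this example:

```python
import pandas as pd

users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]
users_df = pd.DataFrame(users_data)

max_usage = 300

# Inside query(), @max_usage refers to the Python variable defined above.
filtered_df = users_df.query("plan_type == 'basic' and data_usage < @max_usage")

# Convert the filtered rows back to a list of dictionaries.
filtered_records = filtered_df.to_dict(orient="records")
print(filtered_records)
```

With the sample data, only the user with user_id 3 should remain.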

 

Using NumPy

First, import NumPy and convert your JSON array into a structured NumPy array:

import numpy as np

# Convert the list of dictionaries to a structured NumPy array.
# Fields are read by name so the result does not depend on key order in each dictionary.
dtype = [('user_id', 'i4'), ('plan_type', 'U10'), ('data_usage', 'i4')]
np_users_data = np.array([tuple(user[name] for name, _ in dtype) for user in users_data], dtype=dtype)

Now, let’s select the users whose data usage is more than 250:

high_usage_np_users = np_users_data[np_users_data['data_usage'] > 250]
print(high_usage_np_users)

Output:

[(1, 'basic', 300) (2, 'premium', 500) (4, 'premium', 800)]

This output lists the users with data usage above 250.
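Boolean masks on a structured array can be combined with &, and the selected records can be turned back into dictionaries when you need JSON-friendly data again. The following is a minimal sketch under those assumptions; the variable names are chosen for this example:

```python
import numpy as np

users_data = [
    {"user_id": 1, "plan_type": "basic", "data_usage": 300},
    {"user_id": 2, "plan_type": "premium", "data_usage": 500},
    {"user_id": 3, "plan_type": "basic", "data_usage": 100},
    {"user_id": 4, "plan_type": "premium", "data_usage": 800},
]

dtype = [("user_id", "i4"), ("plan_type", "U10"), ("data_usage", "i4")]
np_users = np.array(
    [tuple(user[name] for name, _ in dtype) for user in users_data],
    dtype=dtype,
)

# Combine two conditions; each comparison needs its own parentheses.
mask = (np_users["plan_type"] == "premium") & (np_users["data_usage"] > 250)
selected = np_users[mask]

# tolist() yields tuples of plain Python values, which zip() pairs with the field names.
selected_dicts = [dict(zip(selected.dtype.names, row)) for row in selected.tolist()]
print(selected_dicts)
```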

 

Using itertools

Let’s say you want to keep only the customers on the premium plan. This example uses a separate array with camelCase keys:

from itertools import filterfalse
json_array = [
    {"customerId": 101, "planType": "basic", "dataUsage": 500},
    {"customerId": 102, "planType": "premium", "dataUsage": 1500},
    {"customerId": 103, "planType": "basic", "dataUsage": 300},
    {"customerId": 104, "planType": "premium", "dataUsage": 2000}
]
filtered_data = list(filterfalse(lambda entry: entry['planType'] != 'premium', json_array))
print(filtered_data)

Output:

[{'customerId': 102, 'planType': 'premium', 'dataUsage': 1500},
 {'customerId': 104, 'planType': 'premium', 'dataUsage': 2000}]

The filterfalse function returns an iterator over the elements of json_array for which the given function returns False. Because the lambda tests planType != 'premium', the elements that remain are the premium entries.

To get a list of the filtered elements, the list() function is used to convert the iterator into a list.
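Since filterfalse() is the complement of the built-in filter(), the two can be paired to split an array into matching and non-matching groups with a single predicate. A minimal sketch, where is_premium is a helper defined for this example:

```python
from itertools import filterfalse

json_array = [
    {"customerId": 101, "planType": "basic", "dataUsage": 500},
    {"customerId": 102, "planType": "premium", "dataUsage": 1500},
    {"customerId": 103, "planType": "basic", "dataUsage": 300},
    {"customerId": 104, "planType": "premium", "dataUsage": 2000},
]


def is_premium(entry):
    """Return True for entries on the premium plan."""
    return entry["planType"] == "premium"


# filter() keeps entries where the predicate is True,
# filterfalse() keeps entries where it is False.
premium = list(filter(is_premium, json_array))
non_premium = list(filterfalse(is_premium, json_array))

print([entry["customerId"] for entry in premium])
print([entry["customerId"] for entry in non_premium])
```

Writing the predicate in its positive form (is_premium) and choosing between filter() and filterfalse() tends to read more clearly than negating the condition inside a lambda.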

 

Using JMESPath

JMESPath (JSON Matching Expressions Path) is a query language for JSON. It allows you to specify how to extract elements from a JSON structure.

Suppose you want to extract specific information, like all users with a certain plan type or those who exceed a specific data usage threshold.

First, you would need to install the JMESPath library if you haven’t already:

pip install jmespath

Now, let’s use JMESPath to select the users with the ‘premium’ plan type:

import jmespath
query = "[?plan_type == 'premium']"
premium_users = jmespath.search(query, users_data)
print(premium_users)

Output:

[{'user_id': 2, 'plan_type': 'premium', 'data_usage': 500}, {'user_id': 4, 'plan_type': 'premium', 'data_usage': 800}]

The expression "[?plan_type == 'premium']" is a JMESPath filter expression: it keeps each element of the array whose plan_type equals the string 'premium'.

 

Filter Nested JSON Arrays

Consider the following nested JSON data:

nested_users_data = [
    {"user_id": 1, "plan_type": "basic", "devices": [{"device_id": 101, "data_usage": 150}, {"device_id": 102, "data_usage": 200}]},
    {"user_id": 2, "plan_type": "premium", "devices": [{"device_id": 201, "data_usage": 500}]},
    {"user_id": 3, "plan_type": "basic", "devices": [{"device_id": 301, "data_usage": 100}, {"device_id": 302, "data_usage": 50}]},
    {"user_id": 4, "plan_type": "premium", "devices": [{"device_id": 401, "data_usage": 300}, {"device_id": 402, "data_usage": 500}]}
]

Suppose you want to find users who have at least one device with data usage over 250:

filtered_users = []
for user in nested_users_data:
    if any(device['data_usage'] > 250 for device in user['devices']):
        filtered_users.append(user)
print(filtered_users)

Output:

[{'user_id': 2, 'plan_type': 'premium', 'devices': [{'device_id': 201, 'data_usage': 500}]},
 {'user_id': 4, 'plan_type': 'premium', 'devices': [{'device_id': 401, 'data_usage': 300}, {'device_id': 402, 'data_usage': 500}]}]

The any() function is used to check if any device in the user’s ‘devices’ list meets the specified condition.
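The loop above keeps every device of a matching user. If you also want to drop the devices that fall below the threshold, one possible approach is to filter the inner list first and rebuild each user. This sketch uses a threshold of 400, chosen for illustration so that the difference is visible:

```python
nested_users_data = [
    {"user_id": 1, "plan_type": "basic", "devices": [{"device_id": 101, "data_usage": 150}, {"device_id": 102, "data_usage": 200}]},
    {"user_id": 2, "plan_type": "premium", "devices": [{"device_id": 201, "data_usage": 500}]},
    {"user_id": 3, "plan_type": "basic", "devices": [{"device_id": 301, "data_usage": 100}, {"device_id": 302, "data_usage": 50}]},
    {"user_id": 4, "plan_type": "premium", "devices": [{"device_id": 401, "data_usage": 300}, {"device_id": 402, "data_usage": 500}]},
]

threshold = 400  # illustrative value, not taken from the example above

trimmed_users = []
for user in nested_users_data:
    # Keep only the devices above the threshold.
    heavy_devices = [d for d in user["devices"] if d["data_usage"] > threshold]
    if heavy_devices:
        # Copy the user and replace its device list, leaving the original data untouched.
        trimmed_users.append({**user, "devices": heavy_devices})

print(trimmed_users)
```

Here user 4 is still included, but only device 402 remains in its devices list.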
