How to Group JSON Data by Key in Python

In this tutorial, you’ll learn different methods to group JSON data by key in Python.

The methods range from basic Python constructs such as dictionary comprehensions to the itertools and collections modules and the Pandas library, and each has its own advantages and use cases.

Using Dictionary Comprehension

You can use a dictionary comprehension to group JSON data by key:

import json
json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
data = json.loads(json_data)
grouped_data = {location: [user for user in data if user['location'] == location] 
                for location in set(user['location'] for user in data)}
print(grouped_data)

Output (reformatted here for readability; print shows the dictionary on a single line):

{
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}],
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}]
}

This code first converts the JSON data into a Python list of dictionaries.

Then, it uses a dictionary comprehension to create a new dictionary where each key is a unique location, and the value is a list of users in that location. Because the list is scanned once per location, this approach suits small datasets best.

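Note that the order of the keys can differ from run to run. The locations are collected into a set, and the iteration order of a set of strings is not fixed because Python randomizes string hashes for each process by default.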
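If you want the same key order on every run, one option is to collect the unique locations with dict.fromkeys instead of set. The following is a minimal sketch using the same sample data; the variable name locations is only illustrative.

```python
import json

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
data = json.loads(json_data)

# dict.fromkeys removes duplicates but keeps first-seen order
# (dictionaries preserve insertion order in Python 3.7+)
locations = dict.fromkeys(user['location'] for user in data)

grouped_data = {
    location: [user for user in data if user['location'] == location]
    for location in locations
}
print(list(grouped_data))  # ['CityA', 'CityB', 'CityC']
```

The grouping logic is unchanged; only the way the unique keys are gathered differs.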

Using groupby from itertools

The groupby function from Python’s itertools module is another tool for grouping data. It groups only consecutive items that share the same key, so it works best on data that is already sorted by that key.

Here’s how to group user records by location using groupby:

import json
from itertools import groupby
from operator import itemgetter
json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
data = json.loads(json_data)
data.sort(key=itemgetter('location'))
grouped_data = {k: list(g) for k, g in groupby(data, key=itemgetter('location'))}
print(grouped_data)

Output (reformatted here for readability; print shows the dictionary on a single line):

{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}

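This code sorts the data by location first, because groupby only merges adjacent items with equal keys. Without sorting, a location that appears in non-adjacent records would be split into several groups.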

Then, it uses groupby along with a dictionary comprehension to create a dictionary where each key is a location and the value is a list of users from that location.
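To see why the sort matters, here is a small sketch that runs groupby on unsorted records. The shortened records list is made up for illustration and is not part of the original dataset.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative records: CityA appears twice, but not next to each other
records = [
    {"user_id": "001", "location": "CityA"},
    {"user_id": "002", "location": "CityB"},
    {"user_id": "003", "location": "CityA"},
]
by_location = itemgetter("location")

# Without sorting, groupby reports CityA as two separate runs
unsorted_keys = [key for key, _ in groupby(records, key=by_location)]
print(unsorted_keys)  # ['CityA', 'CityB', 'CityA']

# In a dictionary comprehension the second CityA run overwrites the first
lossy = {key: list(group) for key, group in groupby(records, key=by_location)}
print([user["user_id"] for user in lossy["CityA"]])  # ['003']

# Sorting first keeps every CityA record in one group
ordered = sorted(records, key=by_location)
complete = {key: list(group) for key, group in groupby(ordered, key=by_location)}
print([user["user_id"] for user in complete["CityA"]])  # ['001', '003']
```

The unsorted version raises no error, it just silently drops records, which makes this mistake easy to miss.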

Using Pandas groupby

Here’s an example of how to use Pandas groupby to group JSON data:

import pandas as pd
from io import StringIO
json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
# Wrap the string in StringIO: recent pandas versions deprecate passing a literal JSON string to read_json
data = pd.read_json(StringIO(json_data))
# Convert each location group into a list of records, then serialize the result as JSON
grouped_data = data.groupby('location').apply(lambda x: x.to_dict('records')).to_json(orient="index", indent=1)
print(grouped_data)

Output:

{
 "CityA":[
  {
   "user_id":1,
   "data_used":5.2,
   "location":"CityA"
  },
  {
   "user_id":3,
   "data_used":3.5,
   "location":"CityA"
  }
 ],
 "CityB":[
  {
   "user_id":2,
   "data_used":7.8,
   "location":"CityB"
  }
 ],
 "CityC":[
  {
   "user_id":4,
   "data_used":2.4,
   "location":"CityC"
  }
 ]
}

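In this snippet, the JSON data is first converted into a Pandas DataFrame. During this step, read_json infers column types, so user_id values such as "001" become integers (1) and lose their leading zeros, as the output shows.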

Then, the groupby method groups the data by the ‘location’ column. The lambda function within apply converts each group into a list of dictionaries.

Finally, the to_json() method converts the data back to JSON.
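Two things are worth knowing when you take this route. read_json accepts a dtype argument that can keep user_id as a string, and once the data is in a DataFrame you can aggregate each group instead of only listing its records. The sketch below assumes pandas is installed; the names df and totals are illustrative.

```python
from io import StringIO

import pandas as pd

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

# dtype={"user_id": str} stops read_json from converting "001" to the integer 1
df = pd.read_json(StringIO(json_data), dtype={"user_id": str})
print(df["user_id"].tolist())  # ['001', '002', '003', '004']

# Sum data_used per location, rounded to one decimal place
totals = df.groupby("location")["data_used"].sum().round(1).to_dict()
print(totals)
```

Aggregations like sum, mean, or count are where Pandas tends to pay off compared with the pure Python approaches.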

Using Custom Functions

Writing a custom function is useful when you have unique criteria for grouping or need to perform additional processing on the data.

Let’s create a custom function to group our dataset:

import json
json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
data = json.loads(json_data)
def group_data(data, key):
    grouped_data = {}
    for item in data:
        group_key = item[key]
        if group_key not in grouped_data:
            grouped_data[group_key] = []
        grouped_data[group_key].append(item)
    return grouped_data
grouped_data = group_data(data, 'location')
print(grouped_data)

Output (reformatted here for readability; print shows the dictionary on a single line):

{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}

This function, group_data, takes the dataset and a key to group by as arguments.

It iterates over the dataset, grouping items into a dictionary based on the specified key.
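When the grouping criterion is not a plain field, the same idea can take a function instead of a key name. The sketch below is one possible variation, not part of the original function: group_by, usage_tier, and the 5.0 threshold are all hypothetical names and values chosen for illustration.

```python
def group_by(items, key_func, value_func=lambda item: item):
    """Group items by key_func(item), storing value_func(item) in each group."""
    grouped = {}
    for item in items:
        # setdefault creates the list the first time a key is seen
        grouped.setdefault(key_func(item), []).append(value_func(item))
    return grouped


def usage_tier(user):
    # Hypothetical rule: 5.0 or more counts as heavy usage
    return "heavy" if user["data_used"] >= 5.0 else "light"


data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]

# Group by a computed tier and keep only the user IDs in each group
tiers = group_by(data, usage_tier, value_func=lambda user: user["user_id"])
print(tiers)  # {'heavy': ['001', '002'], 'light': ['003', '004']}
```

Passing functions keeps the grouping loop generic while the criteria and the stored values stay in the caller's hands.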

Using Collections Module (defaultdict)

The collections module in Python offers defaultdict, a dict subclass that creates a default value for a missing key the first time that key is accessed.

This feature is useful for grouping data, as it simplifies the code by eliminating the need to check if a key already exists in the dictionary.

Here’s how you can use defaultdict to group our data:

import json
from collections import defaultdict
json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''
data = json.loads(json_data)
grouped_data = defaultdict(list)
for item in data:
    grouped_data[item['location']].append(item)
print(dict(grouped_data))

Output (reformatted here for readability; print shows the dictionary on a single line):

{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}

In this example, defaultdict automatically creates a new list for each new key, and dict() converts the result back to a regular dictionary before printing.
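The default factory does not have to be list. As a rough sketch, defaultdict(int) and defaultdict(float) can count records and total usage per location in a single pass; the names counts and totals are illustrative.

```python
from collections import defaultdict

data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]

counts = defaultdict(int)    # missing keys start at 0
totals = defaultdict(float)  # missing keys start at 0.0
for item in data:
    counts[item["location"]] += 1
    totals[item["location"]] += item["data_used"]

print(dict(counts))  # {'CityA': 2, 'CityB': 1, 'CityC': 1}

# Caveat: merely reading a missing key inserts it with the default value
_ = totals["CityD"]
print("CityD" in totals)  # True
```

Because lookups can add keys as a side effect, converting to a regular dict once grouping is finished is a reasonable habit.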
