How to Group JSON Data by Key in Python
In this tutorial, you’ll learn different methods to group JSON data by key in Python.
From using basic Python constructs like dictionary comprehension to employing powerful libraries like Pandas and itertools, each method has its unique advantages and use cases.
Using Dictionary Comprehension
You can use dictionary comprehension to group JSON data by key:
import json

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

# Parse the JSON text into a list of dictionaries
data = json.loads(json_data)

# For each unique location, collect every user record with that location
grouped_data = {
    location: [user for user in data if user['location'] == location]
    for location in set(user['location'] for user in data)
}
print(grouped_data)
Output:
{
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}],
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}]
}
This code first converts the JSON data into a Python list of dictionaries.
Then, it uses dictionary comprehension to create a new dictionary where each key is a unique location, and the value is a list of users in that location.
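Note that the order of the keys can differ from one run to the next, because the unique locations are collected in a set and sets of strings have no guaranteed iteration order.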
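If a stable key order matters, one option is to iterate over the locations in sorted order instead of over a bare set. The following is a minimal sketch of that idea; the `data` list is inlined here so the snippet runs on its own, and the `locations` name is introduced only for illustration.

```python
# Sample records mirroring the tutorial's data (inlined so the sketch is self-contained)
data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]

# sorted() turns the set of locations into an alphabetically ordered list,
# so the resulting dictionary has the same key order on every run
locations = sorted({user["location"] for user in data})

grouped_data = {
    location: [user for user in data if user["location"] == location]
    for location in locations
}

print(list(grouped_data))  # ['CityA', 'CityB', 'CityC']
```

This version still scans the whole list once per location, so it is best suited to small datasets.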
Using groupby from itertools
The groupby function from Python’s itertools module is another tool for grouping data. It only groups consecutive items that share the same key, so it works best on data that is already sorted by that key.
Here’s how to group the user records by location using groupby:
import json
from itertools import groupby
from operator import itemgetter

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

data = json.loads(json_data)

# Sort in place so records with the same location sit next to each other
data.sort(key=itemgetter('location'))

# Each group g is an iterator, so materialize it with list()
grouped_data = {k: list(g) for k, g in groupby(data, key=itemgetter('location'))}
print(grouped_data)
Output:
{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}
This code sorts the data by location first, because groupby only merges adjacent items with the same key and would otherwise split a location into several groups.
Then, it uses groupby along with a dictionary comprehension to create a dictionary where each key is a location and the value is a list of users from that location.
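To see why the sort matters, the sketch below runs groupby on the records in their original, unsorted order and then again on a sorted copy. The `data` list is inlined so the snippet is self-contained, and the variable names `unsorted_groups`, `lossy`, and `sorted_groups` are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Records in their original order: CityA, CityB, CityA, CityC
data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]
by_location = itemgetter("location")

# Without sorting, groupby starts a new group every time the key changes,
# so CityA shows up as two separate groups
unsorted_groups = [
    (key, [user["user_id"] for user in group])
    for key, group in groupby(data, key=by_location)
]
print(unsorted_groups)
# [('CityA', ['001']), ('CityB', ['002']), ('CityA', ['003']), ('CityC', ['004'])]

# In a dict comprehension the second CityA group silently overwrites the first
lossy = {key: list(group) for key, group in groupby(data, key=by_location)}
print(len(lossy["CityA"]))  # 1

# sorted() returns a new list, leaving the original order of data untouched
sorted_groups = {
    key: list(group)
    for key, group in groupby(sorted(data, key=by_location), key=by_location)
}
print(len(sorted_groups["CityA"]))  # 2
```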
Using Pandas groupby
Here’s an example of how to use Pandas groupby to group JSON data:
from io import StringIO

import pandas as pd

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

# Wrap the string in StringIO: recent pandas versions deprecate
# passing literal JSON text directly to read_json
data = pd.read_json(StringIO(json_data))

# Turn each location's rows into a list of dictionaries, then serialize to JSON
grouped_data = (
    data.groupby('location')
    .apply(lambda x: x.to_dict('records'))
    .to_json(orient="index", indent=4)
)
print(grouped_data)
Output:
{
    "CityA":[
        {
            "user_id":1,
            "data_used":5.2,
            "location":"CityA"
        },
        {
            "user_id":3,
            "data_used":3.5,
            "location":"CityA"
        }
    ],
    "CityB":[
        {
            "user_id":2,
            "data_used":7.8,
            "location":"CityB"
        }
    ],
    "CityC":[
        {
            "user_id":4,
            "data_used":2.4,
            "location":"CityC"
        }
    ]
}
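In this snippet, the JSON data is first converted into a Pandas DataFrame. Pandas infers the column types while reading, which is why the user_id values appear as integers such as 1 and 3 in the output instead of the original strings.
Then, the groupby method groups the data by the ‘location’ column. The lambda function within apply converts each group into a list of dictionaries.
Finally, the to_json() method converts the data back to JSON.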
Using Custom Functions
Writing a custom function is useful when you have unique criteria for grouping or need to perform additional processing on the data.
Let’s create a custom function to group our dataset:
import json

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

data = json.loads(json_data)

def group_data(data, key):
    grouped_data = {}
    for item in data:
        group_key = item[key]
        # Start an empty list the first time a group key is seen
        if group_key not in grouped_data:
            grouped_data[group_key] = []
        grouped_data[group_key].append(item)
    return grouped_data

grouped_data = group_data(data, 'location')
print(grouped_data)
Output:
{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}
This function, group_data, takes the dataset and a key to group by as arguments.
It iterates over the dataset, grouping items into a dictionary based on the specified key.
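When the grouping criterion is not a plain field lookup, one possible variation is to pass a function that computes the group key from each record. The sketch below is only an illustration: `group_by`, `usage_tier`, and the 5.0 threshold are hypothetical names and values, not part of the original example.

```python
# Sample records mirroring the tutorial's data (inlined so the sketch is self-contained)
data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]

def group_by(records, key_func):
    """Group records by the value key_func returns for each record."""
    grouped = {}
    for record in records:
        # setdefault creates the list the first time a group key is seen
        grouped.setdefault(key_func(record), []).append(record)
    return grouped

def usage_tier(record):
    # Hypothetical rule: usage of 5.0 or more counts as "high"
    return "high" if record["data_used"] >= 5.0 else "low"

by_tier = group_by(data, usage_tier)
print({tier: [r["user_id"] for r in rows] for tier, rows in by_tier.items()})
# {'high': ['001', '002'], 'low': ['003', '004']}
```

Grouping by a plain field still works with the same helper, for example `group_by(data, lambda r: r["location"])`.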
Using Collections Module (defaultdict)
The collections module in Python offers defaultdict, a dictionary subclass that supplies a default value for any key that does not exist yet.
This feature is useful for grouping data, as it simplifies the code by eliminating the need to check if a key already exists in the dictionary.
Here’s how you can use defaultdict to group our data:
import json
from collections import defaultdict

json_data = '''
[
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"}
]
'''

data = json.loads(json_data)

# Missing keys start out as an empty list, so no existence check is needed
grouped_data = defaultdict(list)
for item in data:
    grouped_data[item['location']].append(item)

# Convert to a plain dict for a cleaner printed representation
print(dict(grouped_data))
Output:
{
    "CityA": [
        {"user_id": "001", "data_used": 5.2, "location": "CityA"},
        {"user_id": "003", "data_used": 3.5, "location": "CityA"}
    ],
    "CityB": [{"user_id": "002", "data_used": 7.8, "location": "CityB"}],
    "CityC": [{"user_id": "004", "data_used": 2.4, "location": "CityC"}]
}
In this example, defaultdict automatically creates a new list the first time each key is accessed, so every record can be appended without checking whether its location already exists.
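The same default-value mechanism can also accumulate a figure per group rather than collect whole records. As a rough sketch, assuming you want the total data_used per location, a defaultdict(float) starts every new key at 0.0. The `total_usage` name is introduced here for illustration and the `data` list is inlined so the snippet runs on its own.

```python
from collections import defaultdict

# Sample records mirroring the tutorial's data (inlined so the sketch is self-contained)
data = [
    {"user_id": "001", "data_used": 5.2, "location": "CityA"},
    {"user_id": "002", "data_used": 7.8, "location": "CityB"},
    {"user_id": "003", "data_used": 3.5, "location": "CityA"},
    {"user_id": "004", "data_used": 2.4, "location": "CityC"},
]

# float() returns 0.0, so each location's running total starts at zero
total_usage = defaultdict(float)
for item in data:
    total_usage[item["location"]] += item["data_used"]

# Round for display to avoid floating-point noise in the printed totals
print({location: round(total, 1) for location, total in total_usage.items()})
# {'CityA': 8.7, 'CityB': 7.8, 'CityC': 2.4}
```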
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.