Convert Pandas DataFrame to Nested (Hierarchical) JSON
In this tutorial, you’ll learn how to convert Pandas DataFrame to a nested JSON format.
The examples in this tutorial demonstrate various techniques to convert Pandas DataFrames into different nested JSON structures.
- 1 Simple Nesting with to_json
- 2 Grouping and Nesting
- 3 Complex Nesting with Multi-Level Index
- 4 Nested JSON with Custom Keys
- 5 Multi-Level Grouping
- 6 Complex Nesting with Custom Aggregation
- 7 Hierarchical Nesting with Multiple Levels
- 8 Conditional Nesting Based on a Column
- 9 Nested JSON with Combined Fields
Simple Nesting with to_json
Suppose we have a DataFrame like this:
import pandas as pd data = { 'CustomerID': [1, 2, 3], 'Plan': ['Basic', 'Premium', 'Standard'], 'DataUsage': [2.5, 5.0, 3.5], 'MinutesUsage': [300, 500, 400] } df = pd.DataFrame(data)
Here, we’ll nest the usage details under a single key using to_json()
function.
nested_json = df.to_json(orient='records') print(nested_json)
Output:
[ {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5, "MinutesUsage": 300}, {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0, "MinutesUsage": 500}, {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5, "MinutesUsage": 400} ]
The JSON consists of a single array of objects and has 1 nesting level.
Each object represents a customer and contains keys such as “CustomerID,” “Plan,” “DataUsage,” and “MinutesUsage”.
Grouping and Nesting
Imagine a DataFrame that includes regions for each customer.
data = { 'CustomerID': [1, 2, 3, 4], 'Region': ['East', 'West', 'East', 'West'], 'Plan': ['Basic', 'Premium', 'Standard', 'Basic'], 'DataUsage': [2.5, 5.0, 3.5, 2.0] } df = pd.DataFrame(data)
We can group by ‘Region’ and then create a nested structure.
nested_json = df.groupby('Region').apply(lambda x: x.drop('Region', axis=1).to_dict(orient='records')).to_json() print(nested_json)
Output:
{ "East": [ {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5}, {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5} ], "West": [ {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0}, {"CustomerID": 4, "Plan": "Basic", "DataUsage": 2.0} ] }
The JSON consists of an object with two keys, “East” and “West”. Each key corresponds to an array of objects representing customers in the respective region.
Each customer object contains keys like “CustomerID,” “Plan,” and “DataUsage”.
Complex Nesting with Multi-Level Index
Suppose you have a DataFrame with multi-level indices like this:
data = { 'CustomerID': [1, 1, 2, 2], 'Month': ['Jan', 'Feb', 'Jan', 'Feb'], 'DataUsage': [2.5, 3.0, 5.0, 4.5], 'MinutesUsage': [300, 350, 500, 450] } df = pd.DataFrame(data).set_index(['CustomerID', 'Month'])
Creating a nested JSON with multi-level indices.
nested_json = df.to_json(orient='index') print(nested_json)
Output:
{ "[1, "Jan"]": {"DataUsage": 2.5, "MinutesUsage": 300}, "[1, "Feb"]": {"DataUsage": 3.0, "MinutesUsage": 350}, "[2, "Jan"]": {"DataUsage": 5.0, "MinutesUsage": 500}, "[2, "Feb"]": {"DataUsage": 4.5, "MinutesUsage": 450} }
The JSON consists of an object with keys that are string representations of arrays. The keys represent a combination of customer IDs and months.
The corresponding values are objects containing keys such as “DataUsage” and “MinutesUsage”.
Nested JSON with Custom Keys
This DataFrame includes customer information with multiple attributes.
data = { 'CustomerID': [1, 2], 'Name': ['John Doe', 'Jane Smith'], 'Plan': ['Basic', 'Premium'], 'DataUsage': [2.5, 5.0], 'MinutesUsage': [300, 500] } df = pd.DataFrame(data)
We create a nested structure with ‘Usage’ as a sub-dictionary.
df['Usage'] = df[['DataUsage', 'MinutesUsage']].to_dict(orient='records') nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records') print(nested_json)
Output:
[ {"CustomerID": 1, "Name": "John Doe", "Plan": "Basic", "Usage": {"DataUsage": 2.5, "MinutesUsage": 300}}, {"CustomerID": 2, "Name": "Jane Smith", "Plan": "Premium", "Usage": {"DataUsage": 5.0, "MinutesUsage": 500}} ]
The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “Name,” “Plan,” and “Usage”.
The “Usage” key’s value is an object with keys such as “DataUsage” and “MinutesUsage”.
Multi-Level Grouping
Given the following DataFrame:
data = { 'CustomerID': [1, 1, 2, 2], 'Month': ['Jan', 'Feb', 'Jan', 'Feb'], 'DataUsage': [2.5, 3.0, 5.0, 4.5], 'MinutesUsage': [300, 350, 500, 450] } df = pd.DataFrame(data)
Grouping by ‘CustomerID’ and then by ‘Month’ to create a nested JSON.
nested_json = df.groupby('CustomerID').apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Month'], axis=1).to_dict(orient='records'))).to_json() print(nested_json)
Output:
{ "Feb": { "1": [ { "DataUsage": 3, "MinutesUsage": 350 } ], "2": [ { "DataUsage": 4.5, "MinutesUsage": 450 } ] }, "Jan": { "1": [ { "DataUsage": 2.5, "MinutesUsage": 300 } ], "2": [ { "DataUsage": 5, "MinutesUsage": 500 } ] } }
The JSON consists of an object with keys representing months (“Jan” and “Feb”).
Each month key contains another object with keys representing days (“1” and “2”).
Each day key contains an array of objects representing customer usage for that specific month and day.
The usage objects have keys like “DataUsage” and “MinutesUsage”.
Complex Nesting with Custom Aggregation
Suppose you have the following DataFrame:
data = { 'CustomerID': [1, 2, 1, 2], 'Plan': ['Basic', 'Premium', 'Basic', 'Premium'], 'DataUsage': [2.5, 5.0, 3.0, 4.0], 'MinutesUsage': [300, 500, 350, 450] } df = pd.DataFrame(data)
Custom aggregation to nest data under each plan.
nested_json = df.groupby(['CustomerID', 'Plan']).agg(list).reset_index().groupby('CustomerID').apply(lambda x: x[['Plan', 'DataUsage', 'MinutesUsage']].to_dict(orient='records')).to_json() print(nested_json)
Output:
{ "1": [ {"Plan": "Basic", "DataUsage": [2.5, 3.0], "MinutesUsage": [300, 350]} ], "2": [ {"Plan": "Premium", "DataUsage": [5.0, 4.0], "MinutesUsage": [500, 450]} ] }
The JSON consists of an object with keys representing customer IDs (“1” and “2”).
Each customer ID key contains an array of objects representing different plans (“Basic” and “Premium”).
The plan objects have keys like “DataUsage” and “MinutesUsage”.
Hierarchical Nesting with Multiple Levels
Check the following DataFrame:
data = { 'CustomerID': [1, 1, 2, 2], 'Plan': ['Basic', 'Basic', 'Premium', 'Premium'], 'Month': ['Jan', 'Feb', 'Jan', 'Feb'], 'DataUsage': [2.5, 3.0, 5.0, 4.5], 'MinutesUsage': [300, 350, 500, 450] } df = pd.DataFrame(data)
We’ll create a multi-level nested structure where each customer has plans and each plan has monthly records.
nested_json = df.groupby(['CustomerID', 'Plan']).apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Plan', 'Month'], axis=1).to_dict(orient='records'))).to_json() print(nested_json)
Output:
{ "Feb": { "(1, 'Basic')": [ { "DataUsage": 3, "MinutesUsage": 350 } ], "(2, 'Premium')": [ { "DataUsage": 4.5, "MinutesUsage": 450 } ] }, "Jan": { "(1, 'Basic')": [ { "DataUsage": 2.5, "MinutesUsage": 300 } ], "(2, 'Premium')": [ { "DataUsage": 5, "MinutesUsage": 500 } ] } }
The JSON consists of an object with keys representing months (“Jan” and “Feb”).
Each month key contains another object with keys representing a combination of customer IDs and plan types. The customer ID and plan type keys are strings.
Each customer ID and plan type key contains an array of objects representing customer usage. The usage objects have keys like “DataUsage” and “MinutesUsage”.
Conditional Nesting Based on a Column
Given the following DataFrame:
data = { 'CustomerID': [1, 2, 1, 2], 'ServiceType': ['Data', 'Voice', 'Voice', 'Data'], 'Usage': [100, 200, 150, 250], 'Month': ['Jan', 'Jan', 'Feb', 'Feb'] } df = pd.DataFrame(data)
Nesting data differently for ‘Data’ and ‘Voice’ services.
def custom_nesting(group): if group.name == 'Data': return group.drop('ServiceType', axis=1).to_dict(orient='records') else: return group.groupby('Month').apply(lambda x: x.drop(['ServiceType', 'Month'], axis=1).to_dict(orient='records')).to_dict() nested_json = df.groupby('ServiceType').apply(custom_nesting).to_json() print(nested_json)
Output:
{ "Data": [ { "CustomerID": 1, "Usage": 100, "Month": "Jan" }, { "CustomerID": 2, "Usage": 250, "Month": "Feb" } ], "Voice": { "Feb": [ { "CustomerID": 1, "Usage": 150 } ], "Jan": [ { "CustomerID": 2, "Usage": 200 } ] } }
The JSON consists of an object with keys “Data” and “Voice”. The “Data” key contains an array of objects representing customer data usage.
Each data usage object has keys like “CustomerID,” “Usage,” and “Month”.
The “Voice” key contains an object with keys representing months (“Jan” and “Feb”). Each month key contains an array of objects representing customer voice usage.
The voice usage objects have keys like “CustomerID” and “Usage”.
Nested JSON with Combined Fields
Check the following DataFrame:
data = { 'CustomerID': [1, 2], 'PlanType': ['Basic', 'Premium'], 'DataUsage': [2.5, 5.0], 'MinutesUsage': [300, 500], 'Region': ['East', 'West'] } df = pd.DataFrame(data)
Combining ‘DataUsage’ and ‘MinutesUsage’ into a single ‘Usage’ field.
df['Usage'] = df.apply(lambda row: {'Data': row['DataUsage'], 'Minutes': row['MinutesUsage']}, axis=1) nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records') print(nested_json)
Output:
[ { "CustomerID": 1, "PlanType": "Basic", "Region": "East", "Usage": { "Data": 2.5, "Minutes": 300 } }, { "CustomerID": 2, "PlanType": "Premium", "Region": "West", "Usage": { "Data": 5, "Minutes": 500 } } ]
The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “PlanType,” “Region,” and “Usage”.
The “Usage” key’s value is an object with keys like “Data” and “Minutes”.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.