Convert Pandas DataFrame to Nested (Hierarchical) JSON

In this tutorial, you’ll learn how to convert Pandas DataFrame to a nested JSON format.

The examples in this tutorial demonstrate various techniques to convert Pandas DataFrames into different nested JSON structures.

Table of Contents hide

1 Simple Nesting with to_json
2 Grouping and Nesting
3 Complex Nesting with Multi-Level Index
4 Nested JSON with Custom Keys
5 Multi-Level Grouping
6 Complex Nesting with Custom Aggregation
7 Hierarchical Nesting with Multiple Levels
8 Conditional Nesting Based on a Column
9 Nested JSON with Combined Fields

Simple Nesting with to_json

Suppose we have a DataFrame like this:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3],
    'Plan': ['Basic', 'Premium', 'Standard'],
    'DataUsage': [2.5, 5.0, 3.5],
    'MinutesUsage': [300, 500, 400]
}
df = pd.DataFrame(data)

Here, we’ll nest the usage details under a single key using to_json() function.

nested_json = df.to_json(orient='records')
print(nested_json)

Output:

[
    {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5, "MinutesUsage": 300},
    {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0, "MinutesUsage": 500},
    {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5, "MinutesUsage": 400}
]

The JSON consists of a single array of objects and has 1 nesting level.

Each object represents a customer and contains keys such as “CustomerID,” “Plan,” “DataUsage,” and “MinutesUsage”.

Grouping and Nesting

Imagine a DataFrame that includes regions for each customer.

data = {
    'CustomerID': [1, 2, 3, 4],
    'Region': ['East', 'West', 'East', 'West'],
    'Plan': ['Basic', 'Premium', 'Standard', 'Basic'],
    'DataUsage': [2.5, 5.0, 3.5, 2.0]
}
df = pd.DataFrame(data)

We can group by ‘Region’ and then create a nested structure.

nested_json = df.groupby('Region').apply(lambda x: x.drop('Region', axis=1).to_dict(orient='records')).to_json()
print(nested_json)

Output:

{
    "East": [
        {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5},
        {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5}
    ],
    "West": [
        {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0},
        {"CustomerID": 4, "Plan": "Basic", "DataUsage": 2.0}
    ]
}

The JSON consists of an object with two keys, “East” and “West”. Each key corresponds to an array of objects representing customers in the respective region.

Each customer object contains keys like “CustomerID,” “Plan,” and “DataUsage”.

Complex Nesting with Multi-Level Index

Suppose you have a DataFrame with multi-level indices like this:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data).set_index(['CustomerID', 'Month'])

Creating a nested JSON with multi-level indices.

nested_json = df.to_json(orient='index')
print(nested_json)

Output:

{
    "[1, "Jan"]": {"DataUsage": 2.5, "MinutesUsage": 300},
    "[1, "Feb"]": {"DataUsage": 3.0, "MinutesUsage": 350},
    "[2, "Jan"]": {"DataUsage": 5.0, "MinutesUsage": 500},
    "[2, "Feb"]": {"DataUsage": 4.5, "MinutesUsage": 450}
}

The JSON consists of an object with keys that are string representations of arrays. The keys represent a combination of customer IDs and months.

The corresponding values are objects containing keys such as “DataUsage” and “MinutesUsage”.

Nested JSON with Custom Keys

This DataFrame includes customer information with multiple attributes.

data = {
    'CustomerID': [1, 2],
    'Name': ['John Doe', 'Jane Smith'],
    'Plan': ['Basic', 'Premium'],
    'DataUsage': [2.5, 5.0],
    'MinutesUsage': [300, 500]
}
df = pd.DataFrame(data)

We create a nested structure with ‘Usage’ as a sub-dictionary.

df['Usage'] = df[['DataUsage', 'MinutesUsage']].to_dict(orient='records')
nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records')
print(nested_json)

Output:

[
    {"CustomerID": 1, "Name": "John Doe", "Plan": "Basic", "Usage": {"DataUsage": 2.5, "MinutesUsage": 300}},
    {"CustomerID": 2, "Name": "Jane Smith", "Plan": "Premium", "Usage": {"DataUsage": 5.0, "MinutesUsage": 500}}
]

The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “Name,” “Plan,” and “Usage”.

The “Usage” key’s value is an object with keys such as “DataUsage” and “MinutesUsage”.

Multi-Level Grouping

Given the following DataFrame:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data)

Grouping by ‘CustomerID’ and then by ‘Month’ to create a nested JSON.

nested_json = df.groupby('CustomerID').apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Month'], axis=1).to_dict(orient='records'))).to_json()
print(nested_json)

Output:

{
  "Feb": {
    "1": [
      {
        "DataUsage": 3,
        "MinutesUsage": 350
      }
    ],
    "2": [
      {
        "DataUsage": 4.5,
        "MinutesUsage": 450
      }
    ]
  },
  "Jan": {
    "1": [
      {
        "DataUsage": 2.5,
        "MinutesUsage": 300
      }
    ],
    "2": [
      {
        "DataUsage": 5,
        "MinutesUsage": 500
      }
    ]
  }
}

The JSON consists of an object with keys representing months (“Jan” and “Feb”).

Each month key contains another object with keys representing days (“1” and “2”).

Each day key contains an array of objects representing customer usage for that specific month and day.

The usage objects have keys like “DataUsage” and “MinutesUsage”.

Complex Nesting with Custom Aggregation

Suppose you have the following DataFrame:

data = {
    'CustomerID': [1, 2, 1, 2],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'DataUsage': [2.5, 5.0, 3.0, 4.0],
    'MinutesUsage': [300, 500, 350, 450]
}
df = pd.DataFrame(data)

Custom aggregation to nest data under each plan.

nested_json = df.groupby(['CustomerID', 'Plan']).agg(list).reset_index().groupby('CustomerID').apply(lambda x: x[['Plan', 'DataUsage', 'MinutesUsage']].to_dict(orient='records')).to_json()
print(nested_json)

Output:

{
    "1": [
        {"Plan": "Basic", "DataUsage": [2.5, 3.0], "MinutesUsage": [300, 350]}
    ],
    "2": [
        {"Plan": "Premium", "DataUsage": [5.0, 4.0], "MinutesUsage": [500, 450]}
    ]
}

The JSON consists of an object with keys representing customer IDs (“1” and “2”).

Each customer ID key contains an array of objects representing different plans (“Basic” and “Premium”).

The plan objects have keys like “DataUsage” and “MinutesUsage”.

Hierarchical Nesting with Multiple Levels

Check the following DataFrame:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Plan': ['Basic', 'Basic', 'Premium', 'Premium'],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data)

We’ll create a multi-level nested structure where each customer has plans and each plan has monthly records.

nested_json = df.groupby(['CustomerID', 'Plan']).apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Plan', 'Month'], axis=1).to_dict(orient='records'))).to_json()
print(nested_json)

Output:

{
  "Feb": {
    "(1, 'Basic')": [
      {
        "DataUsage": 3,
        "MinutesUsage": 350
      }
    ],
    "(2, 'Premium')": [
      {
        "DataUsage": 4.5,
        "MinutesUsage": 450
      }
    ]
  },
  "Jan": {
    "(1, 'Basic')": [
      {
        "DataUsage": 2.5,
        "MinutesUsage": 300
      }
    ],
    "(2, 'Premium')": [
      {
        "DataUsage": 5,
        "MinutesUsage": 500
      }
    ]
  }
}

The JSON consists of an object with keys representing months (“Jan” and “Feb”).

Each month key contains another object with keys representing a combination of customer IDs and plan types. The customer ID and plan type keys are strings.

Each customer ID and plan type key contains an array of objects representing customer usage. The usage objects have keys like “DataUsage” and “MinutesUsage”.

Conditional Nesting Based on a Column

Given the following DataFrame:

data = {
    'CustomerID': [1, 2, 1, 2],
    'ServiceType': ['Data', 'Voice', 'Voice', 'Data'],
    'Usage': [100, 200, 150, 250],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb']
}
df = pd.DataFrame(data)

Nesting data differently for ‘Data’ and ‘Voice’ services.

def custom_nesting(group):
    if group.name == 'Data':
        return group.drop('ServiceType', axis=1).to_dict(orient='records')
    else:
        return group.groupby('Month').apply(lambda x: x.drop(['ServiceType', 'Month'], axis=1).to_dict(orient='records')).to_dict()
nested_json = df.groupby('ServiceType').apply(custom_nesting).to_json()
print(nested_json)

Output:

{
  "Data": [
    {
      "CustomerID": 1,
      "Usage": 100,
      "Month": "Jan"
    },
    {
      "CustomerID": 2,
      "Usage": 250,
      "Month": "Feb"
    }
  ],
  "Voice": {
    "Feb": [
      {
        "CustomerID": 1,
        "Usage": 150
      }
    ],
    "Jan": [
      {
        "CustomerID": 2,
        "Usage": 200
      }
    ]
  }
}

The JSON consists of an object with keys “Data” and “Voice”. The “Data” key contains an array of objects representing customer data usage.

Each data usage object has keys like “CustomerID,” “Usage,” and “Month”.

The “Voice” key contains an object with keys representing months (“Jan” and “Feb”). Each month key contains an array of objects representing customer voice usage.

The voice usage objects have keys like “CustomerID” and “Usage”.

Nested JSON with Combined Fields

Check the following DataFrame:

data = {
    'CustomerID': [1, 2],
    'PlanType': ['Basic', 'Premium'],
    'DataUsage': [2.5, 5.0],
    'MinutesUsage': [300, 500],
    'Region': ['East', 'West']
}
df = pd.DataFrame(data)

Combining ‘DataUsage’ and ‘MinutesUsage’ into a single ‘Usage’ field.

df['Usage'] = df.apply(lambda row: {'Data': row['DataUsage'], 'Minutes': row['MinutesUsage']}, axis=1)
nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records')
print(nested_json)

Output:

[
  {
    "CustomerID": 1,
    "PlanType": "Basic",
    "Region": "East",
    "Usage": {
      "Data": 2.5,
      "Minutes": 300
    }
  },
  {
    "CustomerID": 2,
    "PlanType": "Premium",
    "Region": "West",
    "Usage": {
      "Data": 5,
      "Minutes": 500
    }
  }
]

The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “PlanType,” “Region,” and “Usage”.

The “Usage” key’s value is an object with keys like “Data” and “Minutes”.

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Simple Nesting with to_json

Grouping and Nesting

Complex Nesting with Multi-Level Index

Nested JSON with Custom Keys

Multi-Level Grouping

Complex Nesting with Custom Aggregation

Hierarchical Nesting with Multiple Levels

Conditional Nesting Based on a Column

Nested JSON with Combined Fields

Related posts

Leave a Reply Cancel reply