Convert Pandas DataFrame to Nested (Hierarchical) JSON

In this tutorial, you’ll learn how to convert Pandas DataFrame to a nested JSON format.

The examples in this tutorial demonstrate various techniques to convert Pandas DataFrames into different nested JSON structures.

 

 

Simple Nesting with to_json

Suppose we have a DataFrame like this:

import pandas as pd
data = {
    'CustomerID': [1, 2, 3],
    'Plan': ['Basic', 'Premium', 'Standard'],
    'DataUsage': [2.5, 5.0, 3.5],
    'MinutesUsage': [300, 500, 400]
}
df = pd.DataFrame(data)

Here, we’ll nest the usage details under a single key using to_json() function.

nested_json = df.to_json(orient='records')
print(nested_json)

Output:

[
    {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5, "MinutesUsage": 300},
    {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0, "MinutesUsage": 500},
    {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5, "MinutesUsage": 400}
]

The JSON consists of a single array of objects and has 1 nesting level.

Each object represents a customer and contains keys such as “CustomerID,” “Plan,” “DataUsage,” and “MinutesUsage”.

 

Grouping and Nesting

Imagine a DataFrame that includes regions for each customer.

data = {
    'CustomerID': [1, 2, 3, 4],
    'Region': ['East', 'West', 'East', 'West'],
    'Plan': ['Basic', 'Premium', 'Standard', 'Basic'],
    'DataUsage': [2.5, 5.0, 3.5, 2.0]
}
df = pd.DataFrame(data)

We can group by ‘Region’ and then create a nested structure.

nested_json = df.groupby('Region').apply(lambda x: x.drop('Region', axis=1).to_dict(orient='records')).to_json()
print(nested_json)

Output:

{
    "East": [
        {"CustomerID": 1, "Plan": "Basic", "DataUsage": 2.5},
        {"CustomerID": 3, "Plan": "Standard", "DataUsage": 3.5}
    ],
    "West": [
        {"CustomerID": 2, "Plan": "Premium", "DataUsage": 5.0},
        {"CustomerID": 4, "Plan": "Basic", "DataUsage": 2.0}
    ]
}

The JSON consists of an object with two keys, “East” and “West”. Each key corresponds to an array of objects representing customers in the respective region.

Each customer object contains keys like “CustomerID,” “Plan,” and “DataUsage”.

 

Complex Nesting with Multi-Level Index

Suppose you have a DataFrame with multi-level indices like this:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data).set_index(['CustomerID', 'Month'])

Creating a nested JSON with multi-level indices.

nested_json = df.to_json(orient='index')
print(nested_json)

Output:

{
    "[1, "Jan"]": {"DataUsage": 2.5, "MinutesUsage": 300},
    "[1, "Feb"]": {"DataUsage": 3.0, "MinutesUsage": 350},
    "[2, "Jan"]": {"DataUsage": 5.0, "MinutesUsage": 500},
    "[2, "Feb"]": {"DataUsage": 4.5, "MinutesUsage": 450}
}

The JSON consists of an object with keys that are string representations of arrays. The keys represent a combination of customer IDs and months.

The corresponding values are objects containing keys such as “DataUsage” and “MinutesUsage”.

 

Nested JSON with Custom Keys

This DataFrame includes customer information with multiple attributes.

data = {
    'CustomerID': [1, 2],
    'Name': ['John Doe', 'Jane Smith'],
    'Plan': ['Basic', 'Premium'],
    'DataUsage': [2.5, 5.0],
    'MinutesUsage': [300, 500]
}
df = pd.DataFrame(data)

We create a nested structure with ‘Usage’ as a sub-dictionary.

df['Usage'] = df[['DataUsage', 'MinutesUsage']].to_dict(orient='records')
nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records')
print(nested_json)

Output:

[
    {"CustomerID": 1, "Name": "John Doe", "Plan": "Basic", "Usage": {"DataUsage": 2.5, "MinutesUsage": 300}},
    {"CustomerID": 2, "Name": "Jane Smith", "Plan": "Premium", "Usage": {"DataUsage": 5.0, "MinutesUsage": 500}}
]

The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “Name,” “Plan,” and “Usage”.

The “Usage” key’s value is an object with keys such as “DataUsage” and “MinutesUsage”.

 

Multi-Level Grouping

Given the following DataFrame:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data)

Grouping by ‘CustomerID’ and then by ‘Month’ to create a nested JSON.

nested_json = df.groupby('CustomerID').apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Month'], axis=1).to_dict(orient='records'))).to_json()
print(nested_json)

Output:

{
  "Feb": {
    "1": [
      {
        "DataUsage": 3,
        "MinutesUsage": 350
      }
    ],
    "2": [
      {
        "DataUsage": 4.5,
        "MinutesUsage": 450
      }
    ]
  },
  "Jan": {
    "1": [
      {
        "DataUsage": 2.5,
        "MinutesUsage": 300
      }
    ],
    "2": [
      {
        "DataUsage": 5,
        "MinutesUsage": 500
      }
    ]
  }
}

The JSON consists of an object with keys representing months (“Jan” and “Feb”).

Each month key contains another object with keys representing days (“1” and “2”).

Each day key contains an array of objects representing customer usage for that specific month and day.

The usage objects have keys like “DataUsage” and “MinutesUsage”.

 

Complex Nesting with Custom Aggregation

Suppose you have the following DataFrame:

data = {
    'CustomerID': [1, 2, 1, 2],
    'Plan': ['Basic', 'Premium', 'Basic', 'Premium'],
    'DataUsage': [2.5, 5.0, 3.0, 4.0],
    'MinutesUsage': [300, 500, 350, 450]
}
df = pd.DataFrame(data)

Custom aggregation to nest data under each plan.

nested_json = df.groupby(['CustomerID', 'Plan']).agg(list).reset_index().groupby('CustomerID').apply(lambda x: x[['Plan', 'DataUsage', 'MinutesUsage']].to_dict(orient='records')).to_json()
print(nested_json)

Output:

{
    "1": [
        {"Plan": "Basic", "DataUsage": [2.5, 3.0], "MinutesUsage": [300, 350]}
    ],
    "2": [
        {"Plan": "Premium", "DataUsage": [5.0, 4.0], "MinutesUsage": [500, 450]}
    ]
}

The JSON consists of an object with keys representing customer IDs (“1” and “2”).

Each customer ID key contains an array of objects representing different plans (“Basic” and “Premium”).

The plan objects have keys like “DataUsage” and “MinutesUsage”.

 

Hierarchical Nesting with Multiple Levels

Check the following DataFrame:

data = {
    'CustomerID': [1, 1, 2, 2],
    'Plan': ['Basic', 'Basic', 'Premium', 'Premium'],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'DataUsage': [2.5, 3.0, 5.0, 4.5],
    'MinutesUsage': [300, 350, 500, 450]
}
df = pd.DataFrame(data)

We’ll create a multi-level nested structure where each customer has plans and each plan has monthly records.

nested_json = df.groupby(['CustomerID', 'Plan']).apply(lambda x: x.groupby('Month').apply(lambda y: y.drop(['CustomerID', 'Plan', 'Month'], axis=1).to_dict(orient='records'))).to_json()
print(nested_json)

Output:

{
  "Feb": {
    "(1, 'Basic')": [
      {
        "DataUsage": 3,
        "MinutesUsage": 350
      }
    ],
    "(2, 'Premium')": [
      {
        "DataUsage": 4.5,
        "MinutesUsage": 450
      }
    ]
  },
  "Jan": {
    "(1, 'Basic')": [
      {
        "DataUsage": 2.5,
        "MinutesUsage": 300
      }
    ],
    "(2, 'Premium')": [
      {
        "DataUsage": 5,
        "MinutesUsage": 500
      }
    ]
  }
}

The JSON consists of an object with keys representing months (“Jan” and “Feb”).

Each month key contains another object with keys representing a combination of customer IDs and plan types. The customer ID and plan type keys are strings.

Each customer ID and plan type key contains an array of objects representing customer usage. The usage objects have keys like “DataUsage” and “MinutesUsage”.

 

Conditional Nesting Based on a Column

Given the following DataFrame:

data = {
    'CustomerID': [1, 2, 1, 2],
    'ServiceType': ['Data', 'Voice', 'Voice', 'Data'],
    'Usage': [100, 200, 150, 250],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb']
}
df = pd.DataFrame(data)

Nesting data differently for ‘Data’ and ‘Voice’ services.

def custom_nesting(group):
    if group.name == 'Data':
        return group.drop('ServiceType', axis=1).to_dict(orient='records')
    else:
        return group.groupby('Month').apply(lambda x: x.drop(['ServiceType', 'Month'], axis=1).to_dict(orient='records')).to_dict()
nested_json = df.groupby('ServiceType').apply(custom_nesting).to_json()
print(nested_json)

Output:

{
  "Data": [
    {
      "CustomerID": 1,
      "Usage": 100,
      "Month": "Jan"
    },
    {
      "CustomerID": 2,
      "Usage": 250,
      "Month": "Feb"
    }
  ],
  "Voice": {
    "Feb": [
      {
        "CustomerID": 1,
        "Usage": 150
      }
    ],
    "Jan": [
      {
        "CustomerID": 2,
        "Usage": 200
      }
    ]
  }
}

The JSON consists of an object with keys “Data” and “Voice”. The “Data” key contains an array of objects representing customer data usage.

Each data usage object has keys like “CustomerID,” “Usage,” and “Month”.

The “Voice” key contains an object with keys representing months (“Jan” and “Feb”). Each month key contains an array of objects representing customer voice usage.

The voice usage objects have keys like “CustomerID” and “Usage”.

 

Nested JSON with Combined Fields

Check the following DataFrame:

data = {
    'CustomerID': [1, 2],
    'PlanType': ['Basic', 'Premium'],
    'DataUsage': [2.5, 5.0],
    'MinutesUsage': [300, 500],
    'Region': ['East', 'West']
}
df = pd.DataFrame(data)

Combining ‘DataUsage’ and ‘MinutesUsage’ into a single ‘Usage’ field.

df['Usage'] = df.apply(lambda row: {'Data': row['DataUsage'], 'Minutes': row['MinutesUsage']}, axis=1)
nested_json = df.drop(['DataUsage', 'MinutesUsage'], axis=1).to_json(orient='records')
print(nested_json)

Output:

[
  {
    "CustomerID": 1,
    "PlanType": "Basic",
    "Region": "East",
    "Usage": {
      "Data": 2.5,
      "Minutes": 300
    }
  },
  {
    "CustomerID": 2,
    "PlanType": "Premium",
    "Region": "West",
    "Usage": {
      "Data": 5,
      "Minutes": 500
    }
  }
]

The JSON consists of an array of objects. Each object represents a customer and contains keys like “CustomerID,” “PlanType,” “Region,” and “Usage”.

The “Usage” key’s value is an object with keys like “Data” and “Minutes”.

Leave a Reply

Your email address will not be published. Required fields are marked *