Convert CSV to Nested JSON using Python Pandas

In this tutorial, we’ll learn how to convert a CSV file to nested JSON format using Pandas in Python.

We’ll cover different cases, from basic flat structure conversion to more advanced techniques including multi-level nesting, conditional nesting, and creating nested JSON with aggregated data.

 

 

Basic Conversion

You can convert a flat CSV structure into a basic JSON format using Pandas to_json:

Code:

import pandas as pd
data = {'ID': [1, 2], 'Name': ['John', 'Jane'], 'Age': [28, 34]}
df = pd.DataFrame(data)
nested_json = df.to_json(orient='records')
print(nested_json)

Output:

[
    {"ID": 1, "Name": "John", "Age": 28},
    {"ID": 2, "Name": "Jane", "Age": 34}
]

 

Grouping by a Column

You can group data by a specific column and convert it into a nested JSON structure:

import pandas as pd
data = {'ID': [1, 1, 2, 2], 'Product': ['Phone', 'Laptop', 'Phone', 'Tablet']}
df = pd.DataFrame(data)

# Group by 'ID' and form nested JSON
grouped_json = df.groupby('ID').apply(lambda x: x.to_dict(orient='records')).to_json()
print(grouped_json)

Output:

{
    "1": [{"ID": 1, "Product": "Phone"}, {"ID": 1, "Product": "Laptop"}],
    "2": [{"ID": 2, "Product": "Phone"}, {"ID": 2, "Product": "Tablet"}]
}

 

Custom Nested Structure

You can create a more complex nested JSON structure with custom formatting:

import pandas as pd
data = {'ID': [1, 2], 'Name': ['John', 'Jane'], 'Age': [28, 34], 'Country': ['USA', 'Canada']}
df = pd.DataFrame(data)

# Define a custom function for nesting
def custom_nesting(row):
    return {'PersonalInfo': {'Name': row['Name'], 'Age': row['Age']},
            'Location': {'Country': row['Country']}}
nested_json = df.apply(custom_nesting, axis=1).to_json(orient='records')
print(nested_json)

Output:

[
    {"PersonalInfo": {"Name": "John", "Age": 28}, "Location": {"Country": "USA"}},
    {"PersonalInfo": {"Name": "Jane", "Age": 34}, "Location": {"Country": "Canada"}}
]

 

Nested Lists within JSON

You can create a JSON structure where certain columns in CSV are converted into nested lists within each JSON object.

Code:

import pandas as pd
data = {'ID': [1, 1, 2, 2], 'Feature': ['Color', 'Size', 'Color', 'Size'], 'Value': ['Red', 'Large', 'Blue', 'Medium']}
df = pd.DataFrame(data)

# Reshape data and create nested lists
df_grouped = df.groupby('ID').apply(lambda x: x[['Feature', 'Value']].to_dict(orient='records')).reset_index(name='Attributes')
nested_json = df_grouped.to_json(orient='records')
print(nested_json)

Output:

[
    {"ID": 1, "Attributes": [{"Feature": "Color", "Value": "Red"}, {"Feature": "Size", "Value": "Large"}]},
    {"ID": 2, "Attributes": [{"Feature": "Color", "Value": "Blue"}, {"Feature": "Size", "Value": "Medium"}]}
]

 

Hierarchical Nesting

You can convert CSV data into a hierarchical nested JSON, useful for representing tree-like structures:

import pandas as pd
data = {'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'],
        'Subcategory': ['Phone', 'Laptop', 'Shirt', 'Pants'],
        'Product': ['iPhone', 'Dell Laptop', 'Cotton Shirt', 'Jeans']}
df = pd.DataFrame(data)

# Create hierarchical nested JSON
df_grouped = df.groupby(['Category', 'Subcategory']).apply(lambda x: x['Product'].tolist()).reset_index(name='Products')
nested_json = df_grouped.to_json(orient='records')
print(nested_json)

Output:

[
    {"Category": "Electronics", "Subcategory": "Laptop", "Products": ["Dell Laptop"]},
    {"Category": "Electronics", "Subcategory": "Phone", "Products": ["iPhone"]},
    {"Category": "Clothing", "Subcategory": "Pants", "Products": ["Jeans"]},
    {"Category": "Clothing", "Subcategory": "Shirt", "Products": ["Cotton Shirt"]}
]

 

Merging Columns into a Nested JSON Object

You can merge multiple columns into a single nested JSON object within each record:

import pandas as pd
data = {'ID': [1, 2], 'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith'], 'Age': [28, 34]}
df = pd.DataFrame(data)

# Merge columns into a nested JSON object
df['Name'] = df.apply(lambda x: {'First': x['FirstName'], 'Last': x['LastName']}, axis=1)
nested_json = df[['ID', 'Name', 'Age']].to_json(orient='records')
print(nested_json)

Output:

[
    {"ID": 1, "Name": {"First": "John", "Last": "Doe"}, "Age": 28},
    {"ID": 2, "Name": {"First": "Jane", "Last": "Smith"}, "Age": 34}
]

 

Adding a Nested Key-Value Pair Based on Condition

You can add a nested key-value pair in the JSON output based on a condition applied to one of the columns:

import pandas as pd
data = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 92, 78]}
df = pd.DataFrame(data)

# Add a nested 'Status' field based on 'Score'
df['Status'] = df['Score'].apply(lambda x: {'Pass': True, 'Grade': 'A' if x >= 90 else 'B'} if x >= 80 else {'Pass': False})
nested_json = df.to_json(orient='records')
print(nested_json)

Output:

[
    {"ID": 1, "Name": "Alice", "Score": 85, "Status": {"Pass": true, "Grade": "B"}},
    {"ID": 2, "Name": "Bob", "Score": 92, "Status": {"Pass": true, "Grade": "A"}},
    {"ID": 3, "Name": "Charlie", "Score": 78, "Status": {"Pass": false}}
]

 

Nested JSON with Aggregated Data

You can create a nested JSON structure that includes aggregated data from the CSV:

import pandas as pd
data = {'Department': ['HR', 'HR', 'IT', 'IT'], 'Employee': ['Alice', 'Bob', 'Charlie', 'Dave'], 'Salary': [70000, 80000, 90000, 85000]}
df = pd.DataFrame(data)
agg_data = df.groupby('Department').agg({'Employee': lambda x: list(x), 'Salary': 'mean'}).reset_index()
nested_json = agg_data.to_json(orient='records')
print(nested_json)

Output:

[
    {"Department": "HR", "Employee": ["Alice", "Bob"], "Salary": 75000.0},
    {"Department": "IT", "Employee": ["Charlie", "Dave"], "Salary": 87500.0}
]

 

Conditional Nesting Based on Multiple Columns

You can apply conditional logic to nest data differently based on multiple column values:

import pandas as pd
data = {'ID': [1, 2, 3, 4], 'Type': ['A', 'B', 'A', 'B'], 'Value1': [100, 200, 150, 250], 'Value2': [10, 20, 15, 25]}
df = pd.DataFrame(data)
def conditional_nesting(row):
    if row['Type'] == 'A':
        return {'TypeA': {'Value1': row['Value1'], 'Value2': row['Value2']}}
    else:
        return {'TypeB': {'Total': row['Value1'] + row['Value2']}}
nested_json = df.apply(conditional_nesting, axis=1).to_json(orient='records')
print(nested_json)

Output:

[
    {"TypeA": {"Value1": 100, "Value2": 10}},
    {"TypeB": {"Total": 220}},
    {"TypeA": {"Value1": 150, "Value2": 15}},
    {"TypeB": {"Total": 275}}
]
Leave a Reply

Your email address will not be published. Required fields are marked *