Convert Nested JSON to CSV using Python Pandas

In this tutorial, I’ll cover several examples that illustrate how to convert nested JSON to CSV using Pandas in Python.

These examples will cover a range of scenarios from basic to complex nested structures, dealing with arrays, handling missing data, merging data, and more.

 

 

Basic Nested JSON

In this example, a simple nested JSON structure is converted into a CSV format.

Pandas json_normalize function is used to flatten the nested JSON and then it’s converted to CSV.

Code:

import pandas as pd
from io import StringIO
data = [
    {"id": 1, "name": "Customer A", "contact": {"email": "a@example.com", "phone": "12345"}},
    {"id": 2, "name": "Customer B", "contact": {"email": "b@example.com", "phone": "67890"}}
]
df = pd.json_normalize(data)
csv_data = df.to_csv(index=False)
print(csv_data)

Output:

id,name,contact.email,contact.phone
1,Customer A,a@example.com,12345
2,Customer B,b@example.com,67890

 

Reading Nested JSON from a File

In this example, we’ll read nested JSON data from a file using Pandas read_json method and then convert it to CSV format.

Sample JSON File Content (data.json):

[
    {
        "id": 1,
        "name": "Customer A",
        "orders": [
            {"order_id": 101, "amount": 150},
            {"order_id": 102, "amount": 200}
        ]
    },
    {
        "id": 2,
        "name": "Customer B",
        "orders": [
            {"order_id": 201, "amount": 250}
        ]
    }
]

Code:

import pandas as pd
file_path = 'data.json'
df = pd.read_json(file_path)
normalized_df = pd.json_normalize(df.to_dict(orient='records'), record_path='orders', meta=['id', 'name'])
csv_data = normalized_df.to_csv(index=False)
print(csv_data)

Output:

order_id,amount,id,name
101,150,1,Customer A
102,200,1,Customer A
201,250,2,Customer B

 

Deeply Nested JSON

You can work with more deeply nested JSON data using record_path and meta parameters in json_normalize.

They are used to specify the path to the nested elements and additional metadata to include.

Code:

data = [
    {"id": 1, "name": "Customer A", "history": {"orders": [{"order_id": 101, "amount": 150}, {"order_id": 102, "amount": 200}]}},
    {"id": 2, "name": "Customer B", "history": {"orders": [{"order_id": 201, "amount": 250}]}}
]
df = pd.json_normalize(data, record_path=["history", "orders"], meta=["id", "name"])
csv_data = df.to_csv(index=False)
print(csv_data)

Output:

order_id,amount,id,name
101,150,1,Customer A
102,200,1,Customer A
201,250,2,Customer B

 

Flattening JSON with Nested Objects and Lists

This example shows how to handle a JSON structure with both nested objects and lists, where the goal is to create a CSV that includes details from both the nested objects and the list items.

Code:

import pandas as pd
data = [
    {
        "id": 1, 
        "name": "Customer A", 
        "contact": {"email": "a@example.com", "phone": "12345"},
        "products": [{"name": "Product 1", "quantity": 2}, {"name": "Product 2", "quantity": 3}]
    },
    {
        "id": 2, 
        "name": "Customer B", 
        "contact": {"email": "b@example.com", "phone": "67890"},
        "products": [{"name": "Product 3", "quantity": 1}]
    }
]
df = pd.json_normalize(
    data,
    'products',
    ['id', 'name', ['contact', 'email'], ['contact', 'phone']],
    meta_prefix='customer_'
)
csv_data = df.to_csv(index=False)
print(csv_data)

Output:

name,quantity,customer_id,customer_name,customer_contact.email,customer_contact.phone
Product 1,2,1,Customer A,a@example.com,12345
Product 2,3,1,Customer A,a@example.com,12345
Product 3,1,2,Customer B,b@example.com,67890

Here we used meta_prefix is used to ensure that the resulting column names are unique otherwise, it will return conflicting metadata for the column ‘name’.

 

Nested JSON with Multiple Lists

This example involves a JSON structure containing multiple lists at different levels.

The challenge is to flatten these lists into a coherent CSV structure, maintaining associations between different levels of data.

Code:

data = [
    {
        "id": 1, 
        "name": "Customer A", 
        "orders": [
            {"order_id": 101, "products": [{"name": "Product 1", "price": 20}, {"name": "Product 2", "price": 30}]},
            {"order_id": 102, "products": [{"name": "Product 3", "price": 15}]}
        ]
    },
    {
        "id": 2, 
        "name": "Customer B", 
        "orders": [
            {"order_id": 201, "products": [{"name": "Product 4", "price": 25}]}
        ]
    }
]
df = pd.json_normalize(data, record_path=['orders', 'products'], 
                       meta=['id', 'name', ['orders', 'order_id']], meta_prefix='customer_')
csv_data = df.to_csv(index=False)
print(csv_data)

Output:

name,price,customer_id,customer_name,customer_orders.order_id
Product 1,20,1,Customer A,101
Product 2,30,1,Customer A,101
Product 3,15,1,Customer A,102
Product 4,25,2,Customer B,201

 

Nested JSON with Conditional Processing

You can extract data based on certain conditions or criteria and export it to CSV.

After flattening the JSON using json_normalize, you can filter the DataFrame:

Code:

data = [
    {
        "id": 1, 
        "name": "Customer A", 
        "details": {"age": 30, "membership": "Premium", "rewards": [{"type": "coupon", "value": 10}, {"type": "points", "value": 100}]}
    },
    {
        "id": 2, 
        "name": "Customer B", 
        "details": {"age": 25, "membership": "Standard", "rewards": [{"type": "points", "value": 50}]}
    }
]
df = pd.json_normalize(data, record_path=['details', 'rewards'], 
                       meta=['id', 'name', ['details', 'age'], ['details', 'membership']])

# Filtering data based on condition
df_filtered = df[df['type'] == 'points']
csv_data = df_filtered.to_csv(index=False)
print(csv_data)

Output:

type,value,id,name,details.age,details.membership
points,100,1,Customer A,30,Premium
points,50,2,Customer B,25,Standard
Leave a Reply

Your email address will not be published. Required fields are marked *