Convert Nested JSON to CSV using Python Pandas
In this tutorial, I’ll cover several examples that illustrate how to convert nested JSON to CSV using Pandas in Python.
These examples will cover a range of scenarios from basic to complex nested structures, dealing with arrays, handling missing data, merging data, and more.
Basic Nested JSON
In this example, a simple nested JSON structure is converted into a CSV format.
Pandas json_normalize
function is used to flatten the nested JSON and then it’s converted to CSV.
Code:
import pandas as pd from io import StringIO data = [ {"id": 1, "name": "Customer A", "contact": {"email": "a@example.com", "phone": "12345"}}, {"id": 2, "name": "Customer B", "contact": {"email": "b@example.com", "phone": "67890"}} ] df = pd.json_normalize(data) csv_data = df.to_csv(index=False) print(csv_data)
Output:
id,name,contact.email,contact.phone 1,Customer A,a@example.com,12345 2,Customer B,b@example.com,67890
Reading Nested JSON from a File
In this example, we’ll read nested JSON data from a file using Pandas read_json
method and then convert it to CSV format.
Sample JSON File Content (data.json
):
[ { "id": 1, "name": "Customer A", "orders": [ {"order_id": 101, "amount": 150}, {"order_id": 102, "amount": 200} ] }, { "id": 2, "name": "Customer B", "orders": [ {"order_id": 201, "amount": 250} ] } ]
Code:
import pandas as pd file_path = 'data.json' df = pd.read_json(file_path) normalized_df = pd.json_normalize(df.to_dict(orient='records'), record_path='orders', meta=['id', 'name']) csv_data = normalized_df.to_csv(index=False) print(csv_data)
Output:
order_id,amount,id,name 101,150,1,Customer A 102,200,1,Customer A 201,250,2,Customer B
Deeply Nested JSON
You can work with more deeply nested JSON data using record_path
and meta
parameters in json_normalize
.
They are used to specify the path to the nested elements and additional metadata to include.
Code:
data = [ {"id": 1, "name": "Customer A", "history": {"orders": [{"order_id": 101, "amount": 150}, {"order_id": 102, "amount": 200}]}}, {"id": 2, "name": "Customer B", "history": {"orders": [{"order_id": 201, "amount": 250}]}} ] df = pd.json_normalize(data, record_path=["history", "orders"], meta=["id", "name"]) csv_data = df.to_csv(index=False) print(csv_data)
Output:
order_id,amount,id,name 101,150,1,Customer A 102,200,1,Customer A 201,250,2,Customer B
Flattening JSON with Nested Objects and Lists
This example shows how to handle a JSON structure with both nested objects and lists, where the goal is to create a CSV that includes details from both the nested objects and the list items.
Code:
import pandas as pd data = [ { "id": 1, "name": "Customer A", "contact": {"email": "a@example.com", "phone": "12345"}, "products": [{"name": "Product 1", "quantity": 2}, {"name": "Product 2", "quantity": 3}] }, { "id": 2, "name": "Customer B", "contact": {"email": "b@example.com", "phone": "67890"}, "products": [{"name": "Product 3", "quantity": 1}] } ] df = pd.json_normalize( data, 'products', ['id', 'name', ['contact', 'email'], ['contact', 'phone']], meta_prefix='customer_' ) csv_data = df.to_csv(index=False) print(csv_data)
Output:
name,quantity,customer_id,customer_name,customer_contact.email,customer_contact.phone Product 1,2,1,Customer A,a@example.com,12345 Product 2,3,1,Customer A,a@example.com,12345 Product 3,1,2,Customer B,b@example.com,67890
Here we used meta_prefix
is used to ensure that the resulting column names are unique otherwise, it will return conflicting metadata for the column ‘name’.
Nested JSON with Multiple Lists
This example involves a JSON structure containing multiple lists at different levels.
The challenge is to flatten these lists into a coherent CSV structure, maintaining associations between different levels of data.
Code:
data = [ { "id": 1, "name": "Customer A", "orders": [ {"order_id": 101, "products": [{"name": "Product 1", "price": 20}, {"name": "Product 2", "price": 30}]}, {"order_id": 102, "products": [{"name": "Product 3", "price": 15}]} ] }, { "id": 2, "name": "Customer B", "orders": [ {"order_id": 201, "products": [{"name": "Product 4", "price": 25}]} ] } ] df = pd.json_normalize(data, record_path=['orders', 'products'], meta=['id', 'name', ['orders', 'order_id']], meta_prefix='customer_') csv_data = df.to_csv(index=False) print(csv_data)
Output:
name,price,customer_id,customer_name,customer_orders.order_id Product 1,20,1,Customer A,101 Product 2,30,1,Customer A,101 Product 3,15,1,Customer A,102 Product 4,25,2,Customer B,201
Nested JSON with Conditional Processing
You can extract data based on certain conditions or criteria and export it to CSV.
After flattening the JSON using json_normalize
, you can filter the DataFrame:
Code:
data = [ { "id": 1, "name": "Customer A", "details": {"age": 30, "membership": "Premium", "rewards": [{"type": "coupon", "value": 10}, {"type": "points", "value": 100}]} }, { "id": 2, "name": "Customer B", "details": {"age": 25, "membership": "Standard", "rewards": [{"type": "points", "value": 50}]} } ] df = pd.json_normalize(data, record_path=['details', 'rewards'], meta=['id', 'name', ['details', 'age'], ['details', 'membership']]) # Filtering data based on condition df_filtered = df[df['type'] == 'points'] csv_data = df_filtered.to_csv(index=False) print(csv_data)
Output:
type,value,id,name,details.age,details.membership points,100,1,Customer A,30,Premium points,50,2,Customer B,25,Standard
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.