Convert CSV to Nested JSON using Python Pandas
In this tutorial, we’ll learn how to convert a CSV file to nested JSON format using Pandas in Python.
We’ll cover different cases, from basic flat structure conversion to more advanced techniques including multi-level nesting, conditional nesting, and creating nested JSON with aggregated data.
Basic Conversion
You can convert a flat CSV structure into a basic JSON format using Pandas to_json
:
Code:
import pandas as pd data = {'ID': [1, 2], 'Name': ['John', 'Jane'], 'Age': [28, 34]} df = pd.DataFrame(data) nested_json = df.to_json(orient='records') print(nested_json)
Output:
[ {"ID": 1, "Name": "John", "Age": 28}, {"ID": 2, "Name": "Jane", "Age": 34} ]
Grouping by a Column
You can group data by a specific column and convert it into a nested JSON structure:
import pandas as pd data = {'ID': [1, 1, 2, 2], 'Product': ['Phone', 'Laptop', 'Phone', 'Tablet']} df = pd.DataFrame(data) # Group by 'ID' and form nested JSON grouped_json = df.groupby('ID').apply(lambda x: x.to_dict(orient='records')).to_json() print(grouped_json)
Output:
{ "1": [{"ID": 1, "Product": "Phone"}, {"ID": 1, "Product": "Laptop"}], "2": [{"ID": 2, "Product": "Phone"}, {"ID": 2, "Product": "Tablet"}] }
Custom Nested Structure
You can create a more complex nested JSON structure with custom formatting:
import pandas as pd data = {'ID': [1, 2], 'Name': ['John', 'Jane'], 'Age': [28, 34], 'Country': ['USA', 'Canada']} df = pd.DataFrame(data) # Define a custom function for nesting def custom_nesting(row): return {'PersonalInfo': {'Name': row['Name'], 'Age': row['Age']}, 'Location': {'Country': row['Country']}} nested_json = df.apply(custom_nesting, axis=1).to_json(orient='records') print(nested_json)
Output:
[ {"PersonalInfo": {"Name": "John", "Age": 28}, "Location": {"Country": "USA"}}, {"PersonalInfo": {"Name": "Jane", "Age": 34}, "Location": {"Country": "Canada"}} ]
Nested Lists within JSON
You can create a JSON structure where certain columns in CSV are converted into nested lists within each JSON object.
Code:
import pandas as pd data = {'ID': [1, 1, 2, 2], 'Feature': ['Color', 'Size', 'Color', 'Size'], 'Value': ['Red', 'Large', 'Blue', 'Medium']} df = pd.DataFrame(data) # Reshape data and create nested lists df_grouped = df.groupby('ID').apply(lambda x: x[['Feature', 'Value']].to_dict(orient='records')).reset_index(name='Attributes') nested_json = df_grouped.to_json(orient='records') print(nested_json)
Output:
[ {"ID": 1, "Attributes": [{"Feature": "Color", "Value": "Red"}, {"Feature": "Size", "Value": "Large"}]}, {"ID": 2, "Attributes": [{"Feature": "Color", "Value": "Blue"}, {"Feature": "Size", "Value": "Medium"}]} ]
Hierarchical Nesting
You can convert CSV data into a hierarchical nested JSON, useful for representing tree-like structures:
import pandas as pd data = {'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing'], 'Subcategory': ['Phone', 'Laptop', 'Shirt', 'Pants'], 'Product': ['iPhone', 'Dell Laptop', 'Cotton Shirt', 'Jeans']} df = pd.DataFrame(data) # Create hierarchical nested JSON df_grouped = df.groupby(['Category', 'Subcategory']).apply(lambda x: x['Product'].tolist()).reset_index(name='Products') nested_json = df_grouped.to_json(orient='records') print(nested_json)
Output:
[ {"Category": "Electronics", "Subcategory": "Laptop", "Products": ["Dell Laptop"]}, {"Category": "Electronics", "Subcategory": "Phone", "Products": ["iPhone"]}, {"Category": "Clothing", "Subcategory": "Pants", "Products": ["Jeans"]}, {"Category": "Clothing", "Subcategory": "Shirt", "Products": ["Cotton Shirt"]} ]
Merging Columns into a Nested JSON Object
You can merge multiple columns into a single nested JSON object within each record:
import pandas as pd data = {'ID': [1, 2], 'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith'], 'Age': [28, 34]} df = pd.DataFrame(data) # Merge columns into a nested JSON object df['Name'] = df.apply(lambda x: {'First': x['FirstName'], 'Last': x['LastName']}, axis=1) nested_json = df[['ID', 'Name', 'Age']].to_json(orient='records') print(nested_json)
Output:
[ {"ID": 1, "Name": {"First": "John", "Last": "Doe"}, "Age": 28}, {"ID": 2, "Name": {"First": "Jane", "Last": "Smith"}, "Age": 34} ]
Adding a Nested Key-Value Pair Based on Condition
You can add a nested key-value pair in the JSON output based on a condition applied to one of the columns:
import pandas as pd data = {'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 92, 78]} df = pd.DataFrame(data) # Add a nested 'Status' field based on 'Score' df['Status'] = df['Score'].apply(lambda x: {'Pass': True, 'Grade': 'A' if x >= 90 else 'B'} if x >= 80 else {'Pass': False}) nested_json = df.to_json(orient='records') print(nested_json)
Output:
[ {"ID": 1, "Name": "Alice", "Score": 85, "Status": {"Pass": true, "Grade": "B"}}, {"ID": 2, "Name": "Bob", "Score": 92, "Status": {"Pass": true, "Grade": "A"}}, {"ID": 3, "Name": "Charlie", "Score": 78, "Status": {"Pass": false}} ]
Nested JSON with Aggregated Data
You can create a nested JSON structure that includes aggregated data from the CSV:
import pandas as pd data = {'Department': ['HR', 'HR', 'IT', 'IT'], 'Employee': ['Alice', 'Bob', 'Charlie', 'Dave'], 'Salary': [70000, 80000, 90000, 85000]} df = pd.DataFrame(data) agg_data = df.groupby('Department').agg({'Employee': lambda x: list(x), 'Salary': 'mean'}).reset_index() nested_json = agg_data.to_json(orient='records') print(nested_json)
Output:
[ {"Department": "HR", "Employee": ["Alice", "Bob"], "Salary": 75000.0}, {"Department": "IT", "Employee": ["Charlie", "Dave"], "Salary": 87500.0} ]
Conditional Nesting Based on Multiple Columns
You can apply conditional logic to nest data differently based on multiple column values:
import pandas as pd data = {'ID': [1, 2, 3, 4], 'Type': ['A', 'B', 'A', 'B'], 'Value1': [100, 200, 150, 250], 'Value2': [10, 20, 15, 25]} df = pd.DataFrame(data) def conditional_nesting(row): if row['Type'] == 'A': return {'TypeA': {'Value1': row['Value1'], 'Value2': row['Value2']}} else: return {'TypeB': {'Total': row['Value1'] + row['Value2']}} nested_json = df.apply(conditional_nesting, axis=1).to_json(orient='records') print(nested_json)
Output:
[ {"TypeA": {"Value1": 100, "Value2": 10}}, {"TypeB": {"Total": 220}}, {"TypeA": {"Value1": 150, "Value2": 15}}, {"TypeB": {"Total": 275}} ]
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.