Convert CSV to JSON using Python Pandas (Easy Tutorial)
In this tutorial, We’ll learn how to convert CSV files to JSON, including reading CSV files, converting the entire DataFrame as well as specific columns or rows to JSON.
We’ll use Pandas read_csv()
and Pandas to_json()
to do this.
Reading CSV File
To start converting a CSV file to JSON, the first step is to read the CSV file using Pandas.
Pandas read_csv()
function loads data from a CSV file into a DataFrame.
Here’s how you do it:
import pandas as pd csv_file = 'telecom_data.csv' df = pd.read_csv(csv_file) print(df.head())
Output:
User_ID Call_Duration Data_Usage_GB Plan_Type 0 10001 300 2.5 Basic 1 10002 450 4.0 Standard 2 10003 200 1.0 Basic 3 10004 350 3.0 Standard 4 10005 500 5.0 Premium
Here we read the CSV file into a DataFrame and display the first five rows using the head()
method.
Converting DataFrame to JSON
Once the CSV file is loaded into a DataFrame, you can convert it to JSON using Pandas to_json()
method.
You can also customize the format of the JSON output based on your requirements.
Here’s how to convert the DataFrame to JSON:
json_data = df.to_json(orient='records', lines=True) print(json_data)
Output:
{"User_ID":10001,"Call_Duration":300,"Data_Usage_GB":2.5,"Plan_Type":"Basic"} {"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0,"Plan_Type":"Standard"} {"User_ID":10003,"Call_Duration":200,"Data_Usage_GB":1.0,"Plan_Type":"Basic"} ...
The orient='records'
parameter structures the JSON data as a list of records, and lines=True
formats the output with each record on a separate line.
Converting Specific Columns of the CSV Data
You can select specific columns and convert them to JSON by specifying the columns:
selected_columns = df[['User_ID', 'Plan_Type']] selected_json = selected_columns.to_json(orient='records', lines=True) print(selected_json)
Output:
{"User_ID":10001,"Plan_Type":"Basic"} {"User_ID":10002,"Plan_Type":"Standard"} {"User_ID":10003,"Plan_Type":"Basic"} ...
Convert Specific Rows
You can convert only specific rows from your CSV data to JSON by filtering the rows based on the conditions you want:
# Selecting specific rows based on a condition specific_rows = df[df['Call_Duration'] > 400] specific_rows_json = specific_rows.to_json(orient='records', lines=True) print(specific_rows_json)
Output:
{"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0,"Plan_Type":"Standard"} {"User_ID":10005,"Call_Duration":500,"Data_Usage_GB":5.0,"Plan_Type":"Premium"} ...
In this example, the DataFrame df
is filtered to include only rows where the ‘Call_Duration’ is greater than 400 minutes.
The filtered DataFrame specific_rows
is then converted to a JSON string.
Convert CSV to Nested JSON
Converting CSV data to a nested JSON format is useful when dealing with complex data structures or when you need to maintain hierarchical relationships in your data.
Nested JSON structures are common in various applications, especially those involving web APIs or data storage solutions. Pandas can be used to create these nested JSON structures from a CSV file.
Here’s how you can do it:
# Grouping data by 'Plan_Type' and converting each group to JSON nested_json = df.groupby('Plan_Type').apply(lambda x: x.to_dict(orient='records')).to_json() print(nested_json)
Output:
{ "Basic": [{"User_ID":10001,"Call_Duration":300,"Data_Usage_GB":2.5}, {"User_ID":10003,"Call_Duration":200,"Data_Usage_GB":1.0}], "Standard": [{"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0}, {"User_ID":10004,"Call_Duration":350,"Data_Usage_GB":3.0}], "Premium": [{"User_ID":10005,"Call_Duration":500,"Data_Usage_GB":5.0}] }
In this example, the DataFrame df
is grouped by the ‘Plan_Type’ column. For each group, the to_dict(orient='records')
method converts the group into a list of dictionaries.
Finally, the whole structure is converted into a JSON string, resulting in a nested JSON format where each plan type is a key associated with a list of records.
Formatting and Indenting JSON
Python’s JSON module allows you to format and indent your JSON data.
Here’s how to format and indent JSON data:
import json unformatted_json = df.to_json(orient='records') # Load the JSON string into a Python object json_object = json.loads(unformatted_json) # Format and indent the JSON data formatted_json = json.dumps(json_object, indent=4) print(formatted_json)
Output:
[ { "User_ID": 10001, "Call_Duration": 300, "Data_Usage_GB": 2.5, "Plan_Type": "Basic" }, { "User_ID": 10002, "Call_Duration": 450, "Data_Usage_GB": 4.0, "Plan_Type": "Standard" }, ... ]
In this code, the DataFrame df
is first converted to a JSON string. This string is then loaded into a Python object using json.loads()
.
Finally, json.dumps()
is used to convert the Python object back into a formatted JSON string, with an indentation of 4 spaces, making it more readable and structured.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.