Convert CSV to JSON using Python Pandas (Easy Tutorial)

In this tutorial, We’ll learn how to convert CSV files to JSON, including reading CSV files, converting the entire DataFrame as well as specific columns or rows to JSON.

We’ll use Pandas read_csv() and Pandas to_json() to do this.

 

 

Reading CSV File

To start converting a CSV file to JSON, the first step is to read the CSV file using Pandas.

Pandas read_csv() function loads data from a CSV file into a DataFrame.

Here’s how you do it:

import pandas as pd
csv_file = 'telecom_data.csv'
df = pd.read_csv(csv_file)
print(df.head())

Output:

   User_ID  Call_Duration  Data_Usage_GB Plan_Type
0    10001            300           2.5     Basic
1    10002            450           4.0  Standard
2    10003            200           1.0     Basic
3    10004            350           3.0  Standard
4    10005            500           5.0   Premium

Here we read the CSV file into a DataFrame and display the first five rows using the head() method.

 

Converting DataFrame to JSON

Once the CSV file is loaded into a DataFrame, you can convert it to JSON using Pandas to_json() method.

You can also customize the format of the JSON output based on your requirements.

Here’s how to convert the DataFrame to JSON:

json_data = df.to_json(orient='records', lines=True)
print(json_data)

Output:

{"User_ID":10001,"Call_Duration":300,"Data_Usage_GB":2.5,"Plan_Type":"Basic"}
{"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0,"Plan_Type":"Standard"}
{"User_ID":10003,"Call_Duration":200,"Data_Usage_GB":1.0,"Plan_Type":"Basic"}
...

The orient='records' parameter structures the JSON data as a list of records, and lines=True formats the output with each record on a separate line.

 

Converting Specific Columns of the CSV Data

You can select specific columns and convert them to JSON by specifying the columns:

selected_columns = df[['User_ID', 'Plan_Type']]
selected_json = selected_columns.to_json(orient='records', lines=True)
print(selected_json)

Output:

{"User_ID":10001,"Plan_Type":"Basic"}
{"User_ID":10002,"Plan_Type":"Standard"}
{"User_ID":10003,"Plan_Type":"Basic"}
...

 

Convert Specific Rows

You can convert only specific rows from your CSV data to JSON by filtering the rows based on the conditions you want:

# Selecting specific rows based on a condition
specific_rows = df[df['Call_Duration'] > 400]
specific_rows_json = specific_rows.to_json(orient='records', lines=True)
print(specific_rows_json)

Output:

{"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0,"Plan_Type":"Standard"}
{"User_ID":10005,"Call_Duration":500,"Data_Usage_GB":5.0,"Plan_Type":"Premium"}
...

In this example, the DataFrame df is filtered to include only rows where the ‘Call_Duration’ is greater than 400 minutes.

The filtered DataFrame specific_rows is then converted to a JSON string.

 

Convert CSV to Nested JSON

Converting CSV data to a nested JSON format is useful when dealing with complex data structures or when you need to maintain hierarchical relationships in your data.

Nested JSON structures are common in various applications, especially those involving web APIs or data storage solutions. Pandas can be used to create these nested JSON structures from a CSV file.

Here’s how you can do it:

# Grouping data by 'Plan_Type' and converting each group to JSON
nested_json = df.groupby('Plan_Type').apply(lambda x: x.to_dict(orient='records')).to_json()
print(nested_json)

Output:

{
    "Basic": [{"User_ID":10001,"Call_Duration":300,"Data_Usage_GB":2.5},
              {"User_ID":10003,"Call_Duration":200,"Data_Usage_GB":1.0}],
    "Standard": [{"User_ID":10002,"Call_Duration":450,"Data_Usage_GB":4.0},
                 {"User_ID":10004,"Call_Duration":350,"Data_Usage_GB":3.0}],
    "Premium": [{"User_ID":10005,"Call_Duration":500,"Data_Usage_GB":5.0}]
}

In this example, the DataFrame df is grouped by the ‘Plan_Type’ column. For each group, the to_dict(orient='records') method converts the group into a list of dictionaries.

Finally, the whole structure is converted into a JSON string, resulting in a nested JSON format where each plan type is a key associated with a list of records.

 

Formatting and Indenting JSON

Python’s JSON module allows you to format and indent your JSON data.

Here’s how to format and indent JSON data:

import json
unformatted_json = df.to_json(orient='records')

# Load the JSON string into a Python object
json_object = json.loads(unformatted_json)

# Format and indent the JSON data
formatted_json = json.dumps(json_object, indent=4)
print(formatted_json)

Output:

[
    {
        "User_ID": 10001,
        "Call_Duration": 300,
        "Data_Usage_GB": 2.5,
        "Plan_Type": "Basic"
    },
    {
        "User_ID": 10002,
        "Call_Duration": 450,
        "Data_Usage_GB": 4.0,
        "Plan_Type": "Standard"
    },
    ...
]

In this code, the DataFrame df is first converted to a JSON string. This string is then loaded into a Python object using json.loads().

Finally, json.dumps() is used to convert the Python object back into a formatted JSON string, with an indentation of 4 spaces, making it more readable and structured.

Leave a Reply

Your email address will not be published. Required fields are marked *