Export XML to JSON using Python Pandas

In this tutorial, you’ll learn how to use Pandas to export XML to JSON in Python.

From simple to complex, including nested elements, attributes, and more.

 

 

Convert Basic XML to JSON

Here’s the content of our sample XML file:

<customers>
    <customer>
        <id>1</id>
        <name>John Doe</name>
    </customer>
    <customer>
        <id>2</id>
        <name>Jane Smith</name>
    </customer>
</customers>

We use Pandas read_xml() to read this XML and convert it to a DataFrame.

Then, we export this DataFrame to JSON using Pandas to_json().

import pandas as pd
df = pd.read_xml('xmlfile.xml', xpath='//customer')
json_output = df.to_json(orient='records', indent=4)
print(json_output)

Output:

[
    {
        "id": 1,
        "name": "John Doe"
    },
    {
        "id": 2,
        "name": "Jane Smith"
    }
]

 

Handling Nested XML Elements

Consider the following nested XML content:

<customers>
    <customer>
        <id>1</id>
        <name>John Doe</name>
        <address>
            <street>Main St</street>
            <city>Springfield</city>
        </address>
    </customer>
    <customer>
        <id>2</id>
        <name>Jane Smith</name>
        <address>
            <street>Second St</street>
            <city>Shelbyville</city>
        </address>
    </customer>
</customers>

We’ll read this XML, handle the nested elements, and convert it to JSON.

import pandas as pd
import xmltodict
import json
with open("data.xml") as xml_file:
   data_dict = xmltodict.parse(xml_file.read())
df = pd.json_normalize(data_dict['customers']['customer'])
json_output = df.to_json(orient='records', indent=4)
print(json_output)

Output:

[
    {
        "id":"1",
        "name":"John Doe",
        "address.street":"Main St",
        "address.city":"Springfield"
    },
    {
        "id":"2",
        "name":"Jane Smith",
        "address.street":"Second St",
        "address.city":"Shelbyville"
    }
]

 

Transforming XML Attributes to JSON

Now, let’s consider an XML file where some data is stored as attributes. For example, the id of a customer is an attribute of the <customer> element.

Sample XML content:

<customers>
    <customer id="1">
        <name>John Doe</name>
        <plan>Basic</plan>
    </customer>
    <customer id="2">
        <name>Jane Smith</name>
        <plan>Premium</plan>
    </customer>
</customers>

In this case, we need to extract these attributes and include them in our DataFrame before converting to JSON.

import pandas as pd
df = pd.read_xml('data.xml', xpath='//customer')
json_output = df.to_json(orient='records', indent=4)
print(json_output)

Output:

[
    {
        "id": 1,
        "name": "John Doe",
        "plan": "Basic"
    },
    {
        "id": 2,
        "name": "Jane Smith",
        "plan": "Premium"
    }
]

 

XML with Multiple Child Elements

Here, the XML file contains multiple <phone> elements for each customer.

The challenge is to read these multiple child elements and incorporate them into our JSON structure.

Sample XML content:

<customers>
    <customer>
        <id>1</id>
        <name>John Doe</name>
        <phones>
            <phone>1234567890</phone>
            <phone>2345678901</phone>
        </phones>
    </customer>
    <customer>
        <id>2</id>
        <name>Jane Smith</name>
        <phones>
            <phone>3456789012</phone>
            <phone>4567890123</phone>
        </phones>
    </customer>
</customers>

We need to handle these multiple <phone> elements for each customer.

import pandas as pd
import xmltodict
import json
with open("data.xml") as xml_file:
   data_dict = xmltodict.parse(xml_file.read())
df = pd.json_normalize(data_dict['customers']['customer'])
json_output = df.to_json(orient='records', indent=4)
print(json_output)

Output:

[
    {
        "id":"1",
        "name":"John Doe",
        "phones.phone":[
            "1234567890",
            "2345678901"
        ]
    },
    {
        "id":"2",
        "name":"Jane Smith",
        "phones.phone":[
            "3456789012",
            "4567890123"
        ]
    }
]
Leave a Reply

Your email address will not be published. Required fields are marked *