Export XML to JSON using Python Pandas
In this tutorial, you’ll learn how to use Pandas to export XML to JSON in Python.
From simple to complex, including nested elements, attributes, and more.
Convert Basic XML to JSON
Here’s the content of our sample XML file:
<customers> <customer> <id>1</id> <name>John Doe</name> </customer> <customer> <id>2</id> <name>Jane Smith</name> </customer> </customers>
We use Pandas read_xml()
to read this XML and convert it to a DataFrame.
Then, we export this DataFrame to JSON using Pandas to_json()
.
import pandas as pd df = pd.read_xml('xmlfile.xml', xpath='//customer') json_output = df.to_json(orient='records', indent=4) print(json_output)
Output:
[ { "id": 1, "name": "John Doe" }, { "id": 2, "name": "Jane Smith" } ]
Handling Nested XML Elements
Consider the following nested XML content:
<customers> <customer> <id>1</id> <name>John Doe</name> <address> <street>Main St</street> <city>Springfield</city> </address> </customer> <customer> <id>2</id> <name>Jane Smith</name> <address> <street>Second St</street> <city>Shelbyville</city> </address> </customer> </customers>
We’ll read this XML, handle the nested elements, and convert it to JSON.
import pandas as pd import xmltodict import json with open("data.xml") as xml_file: data_dict = xmltodict.parse(xml_file.read()) df = pd.json_normalize(data_dict['customers']['customer']) json_output = df.to_json(orient='records', indent=4) print(json_output)
Output:
[ { "id":"1", "name":"John Doe", "address.street":"Main St", "address.city":"Springfield" }, { "id":"2", "name":"Jane Smith", "address.street":"Second St", "address.city":"Shelbyville" } ]
Transforming XML Attributes to JSON
Now, let’s consider an XML file where some data is stored as attributes. For example, the id
of a customer is an attribute of the <customer>
element.
Sample XML content:
<customers> <customer id="1"> <name>John Doe</name> <plan>Basic</plan> </customer> <customer id="2"> <name>Jane Smith</name> <plan>Premium</plan> </customer> </customers>
In this case, we need to extract these attributes and include them in our DataFrame before converting to JSON.
import pandas as pd df = pd.read_xml('data.xml', xpath='//customer') json_output = df.to_json(orient='records', indent=4) print(json_output)
Output:
[ { "id": 1, "name": "John Doe", "plan": "Basic" }, { "id": 2, "name": "Jane Smith", "plan": "Premium" } ]
XML with Multiple Child Elements
Here, the XML file contains multiple <phone>
elements for each customer.
The challenge is to read these multiple child elements and incorporate them into our JSON structure.
Sample XML content:
<customers> <customer> <id>1</id> <name>John Doe</name> <phones> <phone>1234567890</phone> <phone>2345678901</phone> </phones> </customer> <customer> <id>2</id> <name>Jane Smith</name> <phones> <phone>3456789012</phone> <phone>4567890123</phone> </phones> </customer> </customers>
We need to handle these multiple <phone>
elements for each customer.
import pandas as pd import xmltodict import json with open("data.xml") as xml_file: data_dict = xmltodict.parse(xml_file.read()) df = pd.json_normalize(data_dict['customers']['customer']) json_output = df.to_json(orient='records', indent=4) print(json_output)
Output:
[ { "id":"1", "name":"John Doe", "phones.phone":[ "1234567890", "2345678901" ] }, { "id":"2", "name":"Jane Smith", "phones.phone":[ "3456789012", "4567890123" ] } ]
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.