Export XML to CSV using Python Pandas
In this tutorial, you’ll learn how to export XML to CSV using Pandas in Python.
The following examples will provide different scenarios where XML data varying in complexity from basic structures to deeply nested and attribute-rich formats, are transformed into CSV files using Pandas.
We’ll use Pandas read_xml()
to read the XML content and Pandas to_csv()
to export the data to CSV format.
Basic XML to CSV
This example deals with a straightforward XML file where data is organized in a simple hierarchical structure.
Sample XML Content:
<records> <record> <id>1</id> <name>John Doe</name> <email>john@example.com</email> </record> <record> <id>2</id> <name>Jane Doe</name> <email>jane@example.com</email> </record> </records>
Code:
import pandas as pd df = pd.read_xml('sample_data.xml') df.to_csv('output.csv', index=False)
Output:
id,name,email 1,John Doe,john@example.com 2,Jane Doe,jane@example.com
Nested XML Elements
You can use the xml.etree.ElementTree
module to parse the XML file.
It iterates over each node, extracts the relevant data, and stores it in a dictionary. Finally, the dictionary is converted into a DataFrame and exported to a CSV file.
Sample XML Content:
<employees> <employee> <id>101</id> <name>Emily</name> <contact> <email>emily@example.com</email> <phone>1234567890</phone> </contact> </employee> <employee> <id>102</id> <name>James</name> <contact> <email>james@example.com</email> <phone>0987654321</phone> </contact> </employee> </employees>
Code:
import pandas as pd import xml.etree.ElementTree as ET tree = ET.parse('data.xml') root = tree.getroot() # Extract the data from XML and flatten it into a dictionary data = [] for employee in root.findall('employee'): employee_data = { 'id': employee.find('id').text, 'name': employee.find('name').text, 'email': employee.find('contact/email').text, 'phone': employee.find('contact/phone').text } data.append(employee_data) df = pd.DataFrame.from_records(data) df.to_csv('nested_output.csv', index=False)
Output:
id,name,email,phone 101,Emily,emily@example.com,1234567890 102,James,james@example.com,0987654321
XML with Attributes
This example focuses on XML files where elements have attributes.
Sample XML Content:
<products> <product id="p001"> <name>Widget</name> <price>19.99</price> </product> <product id="p002"> <name>Gadget</name> <price>29.99</price> </product> </products>
Code:
import pandas as pd df = pd.read_xml('input.xml', xpath='/products/product') df.to_csv('output.csv', index=False)
Output:
id,name,price p001,Widget,19.99 p002,Gadget,29.99
XML with Mixed Content
This example covers an XML file with a mix of attributes and text content within elements.
Sample XML Content:
<inventory> <item type="electronic"> <name>Laptop</name> <quantity>10</quantity> <price>1200</price> </item> <item type="furniture"> <name>Chair</name> <quantity>50</quantity> <price>100</price> </item> </inventory>
Code:
import pandas as pd df = pd.read_xml('data.xml', xpath='//item') df.to_csv('output.csv', index=False)
Output:
type,name,quantity,price electronic,Laptop,10,1200 furniture,Chair,50,100
XML with Complex Hierarchies
This example demonstrates handling an XML file with complex hierarchies and multiple levels of nested elements.
Sample XML Content:
<hospital> <ward name="Cardiology"> <patient> <name>Edward</name> <diagnosis>Arrhythmia</diagnosis> <treatment>Medication</treatment> </patient> <patient> <name>Nancy</name> <diagnosis>Myocarditis</diagnosis> <treatment>Rest</treatment> </patient> </ward> <ward name="Neurology"> <patient> <name>Oliver</name> <diagnosis>Migraine</diagnosis> <treatment>Medication</treatment> </patient> </ward> </hospital>
Code:
import pandas as pd import xml.etree.ElementTree as ET tree = ET.parse('data.xml') root = tree.getroot() ward_names = [] patient_names = [] diagnoses = [] treatments = [] for ward in root.findall('ward'): ward_name = ward.get('name') for patient in ward.findall('patient'): ward_names.append(ward_name) patient_names.append(patient.find('name').text) diagnoses.append(patient.find('diagnosis').text) treatments.append(patient.find('treatment').text) data = {'ward_name': ward_names, 'patient_name': patient_names, 'diagnosis': diagnoses, 'treatment': treatments} df = pd.DataFrame(data) df.to_csv('output.csv', index=False)
Output:
ward_name,patient_name,diagnosis,treatment Cardiology,Edward,Arrhythmia,Medication Cardiology,Nancy,Myocarditis,Rest Neurology,Oliver,Migraine,Medication
Exporting XML with Namespace to CSV
Handling XML files with namespaces can be a bit tricky, as they add another layer of complexity.
This example demonstrates how to process an XML file with namespaces and export it to a CSV format using Pandas.
Sample XML Content:
<ns:company xmlns:ns="http://www.example.com/ns"> <ns:employee> <ns:id>001</ns:id> <ns:name>Lisa</ns:name> <ns:role>Developer</ns:role> </ns:employee> <ns:employee> <ns:id>002</ns:id> <ns:name>Tom</ns:name> <ns:role>Designer</ns:role> </ns:employee> </ns:company>
Code:
import pandas as pd namespaces = { 'ns': 'http://www.example.com/ns' } df = pd.read_xml('namespace_data.xml', xpath='//ns:employee', namespaces=namespaces) df.to_csv('namespace_output.csv', index=False)
Output:
id,name,role 1,Lisa,Developer 2,Tom,Designer
Exporting Specific XML Elements to CSV
Sometimes, you may only need to export specific elements from an XML file to a CSV, instead of the entire dataset.
Sample XML Content:
<catalog> <product> <name>Widget A</name> <category>Tools</category> <price>19.99</price> </product> <product> <name>Gadget B</name> <category>Electronics</category> <price>29.99</price> </product> <product> <name>Device C</name> <category>Gadgets</category> <price>39.99</price> </product> </catalog>
Code:
import pandas as pd df = pd.read_xml('catalog_data.xml', xpath='//product') selected_columns = df[['name', 'price']] selected_columns.to_csv('selected_elements_output.csv', index=False)
Here we specified the columns we want to export “name” and “price”.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.