Export XML to CSV using Python Pandas

In this tutorial, you’ll learn how to export XML to CSV using Pandas in Python.

The following examples will provide different scenarios where XML data varying in complexity from basic structures to deeply nested and attribute-rich formats, are transformed into CSV files using Pandas.

We’ll use Pandas read_xml() to read the XML content and Pandas to_csv() to export the data to CSV format.

Table of Contents hide

1 Basic XML to CSV
2 Nested XML Elements
3 XML with Attributes
4 XML with Mixed Content
5 XML with Complex Hierarchies
6 Exporting XML with Namespace to CSV
7 Exporting Specific XML Elements to CSV

Basic XML to CSV

This example deals with a straightforward XML file where data is organized in a simple hierarchical structure.

Sample XML Content:

<records>
  <record>
    <id>1</id>
    <name>John Doe</name>
    <email>john@example.com</email>
  </record>
  <record>
    <id>2</id>
    <name>Jane Doe</name>
    <email>jane@example.com</email>
  </record>
</records>

Code:

import pandas as pd
df = pd.read_xml('sample_data.xml')
df.to_csv('output.csv', index=False)

Output:

id,name,email
1,John Doe,john@example.com
2,Jane Doe,jane@example.com

Nested XML Elements

You can use the xml.etree.ElementTree module to parse the XML file.

It iterates over each node, extracts the relevant data, and stores it in a dictionary. Finally, the dictionary is converted into a DataFrame and exported to a CSV file.

Sample XML Content:

<employees>
  <employee>
    <id>101</id>
    <name>Emily</name>
    <contact>
      <email>emily@example.com</email>
      <phone>1234567890</phone>
    </contact>
  </employee>
  <employee>
    <id>102</id>
    <name>James</name>
    <contact>
      <email>james@example.com</email>
      <phone>0987654321</phone>
    </contact>
  </employee>
</employees>

Code:

import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()

# Extract the data from XML and flatten it into a dictionary
data = []
for employee in root.findall('employee'):
    employee_data = {
        'id': employee.find('id').text,
        'name': employee.find('name').text,
        'email': employee.find('contact/email').text,
        'phone': employee.find('contact/phone').text
    }
    data.append(employee_data)
df = pd.DataFrame.from_records(data)
df.to_csv('nested_output.csv', index=False)

Output:

id,name,email,phone
101,Emily,emily@example.com,1234567890
102,James,james@example.com,0987654321

XML with Attributes

This example focuses on XML files where elements have attributes.

Sample XML Content:

<products>
  <product id="p001">
    <name>Widget</name>
    <price>19.99</price>
  </product>
  <product id="p002">
    <name>Gadget</name>
    <price>29.99</price>
  </product>
</products>

Code:

import pandas as pd
df = pd.read_xml('input.xml', xpath='/products/product')
df.to_csv('output.csv', index=False)

Output:

id,name,price
p001,Widget,19.99
p002,Gadget,29.99

XML with Mixed Content

This example covers an XML file with a mix of attributes and text content within elements.

Sample XML Content:

<inventory>
  <item type="electronic">
    <name>Laptop</name>
    <quantity>10</quantity>
    <price>1200</price>
  </item>
  <item type="furniture">
    <name>Chair</name>
    <quantity>50</quantity>
    <price>100</price>
  </item>
</inventory>

Code:

import pandas as pd
df = pd.read_xml('data.xml', xpath='//item')
df.to_csv('output.csv', index=False)

Output:

type,name,quantity,price
electronic,Laptop,10,1200
furniture,Chair,50,100

XML with Complex Hierarchies

This example demonstrates handling an XML file with complex hierarchies and multiple levels of nested elements.

Sample XML Content:

<hospital>
  <ward name="Cardiology">
    <patient>
      <name>Edward</name>
      <diagnosis>Arrhythmia</diagnosis>
      <treatment>Medication</treatment>
    </patient>
    <patient>
      <name>Nancy</name>
      <diagnosis>Myocarditis</diagnosis>
      <treatment>Rest</treatment>
    </patient>
  </ward>
  <ward name="Neurology">
    <patient>
      <name>Oliver</name>
      <diagnosis>Migraine</diagnosis>
      <treatment>Medication</treatment>
    </patient>
  </ward>
</hospital>

Code:

import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
ward_names = []
patient_names = []
diagnoses = []
treatments = []
for ward in root.findall('ward'):
    ward_name = ward.get('name')
    for patient in ward.findall('patient'):
        ward_names.append(ward_name)
        patient_names.append(patient.find('name').text)
        diagnoses.append(patient.find('diagnosis').text)
        treatments.append(patient.find('treatment').text)
data = {'ward_name': ward_names, 'patient_name': patient_names, 'diagnosis': diagnoses, 'treatment': treatments}
df = pd.DataFrame(data)
df.to_csv('output.csv', index=False)

Output:

ward_name,patient_name,diagnosis,treatment
Cardiology,Edward,Arrhythmia,Medication
Cardiology,Nancy,Myocarditis,Rest
Neurology,Oliver,Migraine,Medication

Exporting XML with Namespace to CSV

Handling XML files with namespaces can be a bit tricky, as they add another layer of complexity.

This example demonstrates how to process an XML file with namespaces and export it to a CSV format using Pandas.

Sample XML Content:

<ns:company xmlns:ns="http://www.example.com/ns">
  <ns:employee>
    <ns:id>001</ns:id>
    <ns:name>Lisa</ns:name>
    <ns:role>Developer</ns:role>
  </ns:employee>
  <ns:employee>
    <ns:id>002</ns:id>
    <ns:name>Tom</ns:name>
    <ns:role>Designer</ns:role>
  </ns:employee>
</ns:company>

Code:

import pandas as pd
namespaces = {
    'ns': 'http://www.example.com/ns'
}
df = pd.read_xml('namespace_data.xml', xpath='//ns:employee', namespaces=namespaces)
df.to_csv('namespace_output.csv', index=False)

Output:

id,name,role
1,Lisa,Developer
2,Tom,Designer

Exporting Specific XML Elements to CSV

Sometimes, you may only need to export specific elements from an XML file to a CSV, instead of the entire dataset.

Sample XML Content:

<catalog>
  <product>
    <name>Widget A</name>
    <category>Tools</category>
    <price>19.99</price>
  </product>
  <product>
    <name>Gadget B</name>
    <category>Electronics</category>
    <price>29.99</price>
  </product>
  <product>
    <name>Device C</name>
    <category>Gadgets</category>
    <price>39.99</price>
  </product>
</catalog>

Code:

import pandas as pd
df = pd.read_xml('catalog_data.xml', xpath='//product')
selected_columns = df[['name', 'price']]
selected_columns.to_csv('selected_elements_output.csv', index=False)

Here we specified the columns we want to export “name” and “price”.

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Basic XML to CSV

Nested XML Elements

XML with Attributes

XML with Mixed Content

XML with Complex Hierarchies

Exporting XML with Namespace to CSV

Exporting Specific XML Elements to CSV

Related posts

Leave a Reply Cancel reply