Export XML to Excel using Python Pandas

In this tutorial, you’ll learn how to convert XML files to Excel format using Pandas in Python.

You’ll learn how to parse basic XML into Excel, deal with nested elements, extract attributes, filter specific data, flatten complex hierarchies into tabular data, and more.

 

 

Basic Conversion from XML to Excel

You can export XML files to Excel by reading the XML using Pandas read_xml() and then exporting the result DataFrame to Excel using Pandas to_excel().

XML Sample Content:

<Customers>
  <Customer>
    <ID>1</ID>
    <Name>Customer A</Name>
    <Contact>1234567890</Contact>
  </Customer>
  <Customer>
    <ID>2</ID>
    <Name>Customer B</Name>
    <Contact>2345678901</Contact>
  </Customer>
</Customers>

Code:

import pandas as pd
data = pd.read_xml('customers.xml')
data.to_excel('customers.xlsx', index=False)

 

XML with Nested Elements to Excel

Here, the XML file contains nested elements:

XML Sample Content:

<Customers>
  <Customer>
    <ID>1</ID>
    <Name>Customer A</Name>
    <Subscriptions>
      <Subscription>Plan A</Subscription>
      <Subscription>Plan B</Subscription>
    </Subscriptions>
  </Customer>
  <Customer>
    <ID>2</ID>
    <Name>Customer B</Name>
    <Subscriptions>
      <Subscription>Plan C</Subscription>
    </Subscriptions>
  </Customer>
</Customers>

Code:

import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
data = []
for customer in root:
    id = customer.find('ID').text
    name = customer.find('Name').text
    subscriptions = [s.text for s in customer.findall('Subscriptions/Subscription')]
    data.append({
        'ID': id, 
        'Name': name,
        'Subscriptions': subscriptions
    })
df = pd.DataFrame(data)
df.to_excel('customers.xlsx', index=False)

Here we used ET.parse() to load the XML file into an ElementTree then we get the root Element from the ElementTree using getroot().

After that, we loop through each <customer> element in the root, and inside the loop, we extract data from each <customer> element and append extracted data for each customer to the ‘data’ list as a dict.

 

XML with Attributes to Excel

In this example, the XML file contains customer data where some information is stored as attributes.

XML Sample Content:

<Customers>
  <Customer ID="1" Contact="1234567890">
    <Name>Customer A</Name>
  </Customer>
  <Customer ID="2" Contact="2345678901">
    <Name>Customer B</Name>
  </Customer>
</Customers>

Code:

import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('data.xml')
root = tree.getroot()
ids = []
contacts = []
names = []
for customer in root:
    ids.append(customer.attrib['ID']) 
    contacts.append(customer.attrib['Contact'])
    names.append(customer.find('Name').text)
df = pd.DataFrame({'ID': ids, 'Contact': contacts, 'Name': names})  
df.to_excel('customers.xlsx', index=False)

Here we parse the XML, extract the attribute and text values into separate lists, create a Pandas DataFrame from those lists, and finally write the DataFrame out to an Excel file.

 

Filtering Specific Data from XML

Suppose we only want to export customers who have specific criteria before exporting it to Excel.

XML Sample Content:

<Customers>
  <Customer>
    <ID>1</ID>
    <Name>Customer A</Name>
    <Plan>Gold</Plan>
  </Customer>
  <Customer>
    <ID>2</ID>
    <Name>Customer B</Name>
    <Plan>Silver</Plan>
  </Customer>
  <Customer>
    <ID>3</ID>
    <Name>Customer C</Name>
    <Plan>Gold</Plan>
  </Customer>
</Customers>

Code:

import pandas as pd
data = pd.read_xml('customers_plans.xml')
filtered_data = data[data['Plan'] == 'Gold']
filtered_data.to_excel('gold_plan_customers.xlsx', index=False)
Leave a Reply

Your email address will not be published. Required fields are marked *