Export XML to Excel using Python Pandas
In this tutorial, you’ll learn how to convert XML files to Excel format using Pandas in Python.
You’ll learn how to parse basic XML into Excel, deal with nested elements, extract attributes, filter specific data, flatten complex hierarchies into tabular data, and more.
Basic Conversion from XML to Excel
You can export XML files to Excel by reading the XML using Pandas read_xml()
and then exporting the result DataFrame to Excel using Pandas to_excel()
.
XML Sample Content:
<Customers> <Customer> <ID>1</ID> <Name>Customer A</Name> <Contact>1234567890</Contact> </Customer> <Customer> <ID>2</ID> <Name>Customer B</Name> <Contact>2345678901</Contact> </Customer> </Customers>
Code:
import pandas as pd data = pd.read_xml('customers.xml') data.to_excel('customers.xlsx', index=False)
XML with Nested Elements to Excel
Here, the XML file contains nested elements:
XML Sample Content:
<Customers> <Customer> <ID>1</ID> <Name>Customer A</Name> <Subscriptions> <Subscription>Plan A</Subscription> <Subscription>Plan B</Subscription> </Subscriptions> </Customer> <Customer> <ID>2</ID> <Name>Customer B</Name> <Subscriptions> <Subscription>Plan C</Subscription> </Subscriptions> </Customer> </Customers>
Code:
import pandas as pd import xml.etree.ElementTree as ET tree = ET.parse('data.xml') root = tree.getroot() data = [] for customer in root: id = customer.find('ID').text name = customer.find('Name').text subscriptions = [s.text for s in customer.findall('Subscriptions/Subscription')] data.append({ 'ID': id, 'Name': name, 'Subscriptions': subscriptions }) df = pd.DataFrame(data) df.to_excel('customers.xlsx', index=False)
Here we used ET.parse()
to load the XML file into an ElementTree then we get the root Element from the ElementTree using getroot()
.
After that, we loop through each <customer> element in the root, and inside the loop, we extract data from each <customer> element and append extracted data for each customer to the ‘data’ list as a dict.
XML with Attributes to Excel
In this example, the XML file contains customer data where some information is stored as attributes.
XML Sample Content:
<Customers> <Customer ID="1" Contact="1234567890"> <Name>Customer A</Name> </Customer> <Customer ID="2" Contact="2345678901"> <Name>Customer B</Name> </Customer> </Customers>
Code:
import xml.etree.ElementTree as ET import pandas as pd tree = ET.parse('data.xml') root = tree.getroot() ids = [] contacts = [] names = [] for customer in root: ids.append(customer.attrib['ID']) contacts.append(customer.attrib['Contact']) names.append(customer.find('Name').text) df = pd.DataFrame({'ID': ids, 'Contact': contacts, 'Name': names}) df.to_excel('customers.xlsx', index=False)
Here we parse the XML, extract the attribute and text values into separate lists, create a Pandas DataFrame from those lists, and finally write the DataFrame out to an Excel file.
Filtering Specific Data from XML
Suppose we only want to export customers who have specific criteria before exporting it to Excel.
XML Sample Content:
<Customers> <Customer> <ID>1</ID> <Name>Customer A</Name> <Plan>Gold</Plan> </Customer> <Customer> <ID>2</ID> <Name>Customer B</Name> <Plan>Silver</Plan> </Customer> <Customer> <ID>3</ID> <Name>Customer C</Name> <Plan>Gold</Plan> </Customer> </Customers>
Code:
import pandas as pd data = pd.read_xml('customers_plans.xml') filtered_data = data[data['Plan'] == 'Gold'] filtered_data.to_excel('gold_plan_customers.xlsx', index=False)
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.