How to Convert XML to YAML using Python
In this tutorial, you’ll learn how to convert XML to YAML using Python.
You’ll explore several methods for transforming XML to YAML, handling different XML structures, and customizing the output.
Using xmltodict and PyYAML
You can use the xmltodict library to parse XML into a Python dictionary and PyYAML to serialize that dictionary as YAML.
import xmltodict
import yaml

xml_data = '''
<employees>
    <employee id="1">
        <name>Amina</name>
        <role>Developer</role>
    </employee>
    <employee id="2">
        <name>Omar</name>
        <role>Designer</role>
    </employee>
</employees>
'''

# Convert XML to dictionary
data_dict = xmltodict.parse(xml_data)

# Convert dictionary to YAML
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)
Output:
employees:
  employee:
  - '@id': '1'
    name: Amina
    role: Developer
  - '@id': '2'
    name: Omar
    role: Designer
The XML structure is parsed into a Python dictionary and then serialized into YAML format.
Convert XML with Attributes
xmltodict keeps XML attributes in the parsed dictionary, so they carry over into the YAML output without extra code.
import xmltodict
import yaml

xml_data = '''
<library>
    <book id="101">
        <title>Python Programming</title>
        <author>Hassan</author>
    </book>
    <book id="102">
        <title>Data Science Essentials</title>
        <author>Laila</author>
    </book>
</library>
'''

data_dict = xmltodict.parse(xml_data)
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)
Output:
library:
  book:
  - '@id': '101'
    title: Python Programming
    author: Hassan
  - '@id': '102'
    title: Data Science Essentials
    author: Laila
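Attributes like id are prefixed with @ in the YAML output to differentiate them from child elements. The keys are quoted because @ is a reserved character in YAML, and the values are quoted because xmltodict parses attribute values as strings.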
Custom Parsing
Using ElementTree
You can use the standard library's xml.etree.ElementTree module to extract each element manually, which gives you full control over how the XML is mapped to Python data:
import xml.etree.ElementTree as ET
import yaml

xml_data = '''
<products>
    <product>
        <name>Smartphone</name>
        <price>699</price>
    </product>
    <product>
        <name>Laptop</name>
        <price>999</price>
    </product>
</products>
'''

root = ET.fromstring(xml_data)
products = []
for product in root.findall('product'):
    # Build one dictionary per <product>, converting the price to an integer
    prod = {
        'name': product.find('name').text,
        'price': int(product.find('price').text)
    }
    products.append(prod)

yaml_data = yaml.dump({'products': products}, sort_keys=False)
print(yaml_data)
Output:
products:
- name: Smartphone
  price: 699
- name: Laptop
  price: 999
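The loop above is tied to one specific layout. For XML whose structure is not known in advance, a recursive helper can do the walking instead. The following is a minimal sketch: element_to_dict is a hypothetical helper name, the sku attribute is added for illustration, and the '@' and '#text' key conventions are borrowed from xmltodict as an assumption rather than anything ElementTree provides.

```python
import xml.etree.ElementTree as ET
import yaml


def element_to_dict(elem):
    """Recursively convert an Element into dicts, lists, and strings."""
    children = list(elem)
    text = (elem.text or '').strip()
    if not children and not elem.attrib:
        # Leaf element: return its text, or None when it is empty
        return text or None

    result = {}
    # Attributes are stored under '@'-prefixed keys (assumed convention)
    for name, value in elem.attrib.items():
        result['@' + name] = value
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    if text:
        # Text next to attributes or children goes under '#text'
        result['#text'] = text
    return result


xml_data = '''
<products>
    <product sku="A1"><name>Smartphone</name><price>699</price></product>
    <product sku="B2"><name>Laptop</name><price>999</price></product>
</products>
'''

root = ET.fromstring(xml_data)
data = {root.tag: element_to_dict(root)}
yaml_data = yaml.safe_dump(data, sort_keys=False)
print(yaml_data)
```

All values come back as strings here, so numeric fields such as price would still need an explicit int() conversion if typed output matters.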
Using lxml
You can use the lxml library for advanced XML parsing capabilities.
from lxml import etree
import yaml

xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''

root = etree.fromstring(xml_data)
employees = []
for emp in root.findall('employee'):
    # findtext returns the text of the first matching child element
    employee = {
        'name': emp.findtext('name'),
        'department': emp.findtext('department')
    }
    employees.append(employee)

yaml_data = yaml.dump({'employees': employees}, sort_keys=False)
print(yaml_data)
Output:
employees:
- name: Yasmine
  department: HR
- name: Karim
  department: Engineering
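One of the capabilities lxml adds over the standard library is full XPath support through the xpath() method. As a rough sketch, the snippet below selects only the employees in one department before converting them; the choice of the Engineering department and the variable name engineers are illustrative assumptions.

```python
from lxml import etree
import yaml

xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''

root = etree.fromstring(xml_data)

# The XPath predicate keeps only <employee> elements whose
# <department> child has the text 'Engineering'
engineers = [
    {'name': emp.findtext('name'), 'department': emp.findtext('department')}
    for emp in root.xpath("employee[department='Engineering']")
]

yaml_data = yaml.dump({'engineers': engineers}, sort_keys=False)
print(yaml_data)
```

Filtering in the XPath expression keeps the Python side free of conditional logic, which can help when the selection rules grow more complex.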
Custom Key Naming
You can rename keys by iterating over the parsed items and building new dictionaries with the key names you want:
import xmltodict
import yaml

xml_data = '''
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
    <item id="202">
        <productName>Headphones</productName>
        <quantity>150</quantity>
    </item>
</inventory>
'''

data_dict = xmltodict.parse(xml_data)

# Rename keys
items = []
for item in data_dict['inventory']['item']:
    items.append({
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity'])
    })

yaml_data = yaml.dump({'inventory': items}, sort_keys=False)
print(yaml_data)
Output:
inventory:
- product_id: '201'
  name: Tablet
  stock: 50
- product_id: '202'
  name: Headphones
  stock: 150
Convert Specific Elements
You can convert only specific elements by picking the fields you need while iterating over the parsed data:
import xmltodict
import yaml

xml_data = '''
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
    <student>
        <name>Tarek</name>
        <major>Mathematics</major>
        <gpa>3.9</gpa>
    </student>
</university>
'''

data_dict = xmltodict.parse(xml_data)

# Extract only names and majors
students = []
for student in data_dict['university']['student']:
    students.append({
        'name': student['name'],
        'major': student['major']
    })

yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)
Output:
students:
- name: Salma
  major: Biology
- name: Tarek
  major: Mathematics
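The loop in this section relies on the document containing more than one student element. With a single student, xmltodict returns a dictionary instead of a list, and iterating over it yields key names rather than records. A minimal sketch of one way to guard against this uses the force_list argument of xmltodict.parse; the one-student document below is made up for illustration.

```python
import xmltodict
import yaml

xml_data = '''
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
</university>
'''

# force_list makes 'student' a list even when only one element is present
data_dict = xmltodict.parse(xml_data, force_list=('student',))

students = [
    {'name': student['name'], 'major': student['major']}
    for student in data_dict['university']['student']
]

yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)
```

Listing every repeatable tag in force_list keeps the shape of the YAML output stable regardless of how many elements a given file contains.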
Mokhtar is the founder of LikeGeeks.com.