How to Convert XML to YAML using Python

In this tutorial, you’ll learn how to convert XML to YAML using Python.

You’ll explore several methods: a direct conversion with xmltodict and PyYAML, manual parsing with ElementTree and lxml, and ways to rename keys or keep only selected elements in the output.

Using xmltodict and PyYAML

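You can use the xmltodict library to parse XML into a Python dictionary and PyYAML to serialize that dictionary as YAML. Both are third-party packages (pip install xmltodict pyyaml).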

import xmltodict
import yaml
xml_data = '''
<employees>
    <employee id="1">
        <name>Amina</name>
        <role>Developer</role>
    </employee>
    <employee id="2">
        <name>Omar</name>
        <role>Designer</role>
    </employee>
</employees>
'''

# Convert XML to dictionary
data_dict = xmltodict.parse(xml_data)

# Convert dictionary to YAML
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)

Output:

employees:
  employee:
  - '@id': '1'
    name: Amina
    role: Developer
  - '@id': '2'
    name: Omar
    role: Designer

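xmltodict parses the XML into a Python dictionary, turning the repeated <employee> elements into a list, and yaml.dump serializes that dictionary as YAML. Passing sort_keys=False keeps the keys in document order instead of sorting them alphabetically.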

Convert XML with Attributes

xmltodict keeps XML attributes in the resulting dictionary, so they carry over to the YAML output without any extra code.

import xmltodict
import yaml
xml_data = '''
<library>
    <book id="101">
        <title>Python Programming</title>
        <author>Hassan</author>
    </book>
    <book id="102">
        <title>Data Science Essentials</title>
        <author>Laila</author>
    </book>
</library>
'''
data_dict = xmltodict.parse(xml_data)
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)

Output:

library:
  book:
  - '@id': '101'
    title: Python Programming
    author: Hassan
  - '@id': '102'
    title: Data Science Essentials
    author: Laila

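xmltodict prefixes attribute names such as id with @ to distinguish them from child elements. PyYAML quotes '@id' because @ is a reserved character in YAML, and quotes '101' so that the value stays a string instead of being read back as an integer.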

Custom Parsing

Using ElementTree

For full control over the result, you can parse the XML with the standard library’s ElementTree module and build the dictionary yourself, converting types as you go:

import xml.etree.ElementTree as ET
import yaml
xml_data = '''
<products>
    <product>
        <name>Smartphone</name>
        <price>699</price>
    </product>
    <product>
        <name>Laptop</name>
        <price>999</price>
    </product>
</products>
'''
root = ET.fromstring(xml_data)
products = []
for product in root.findall('product'):
    prod = {
        'name': product.find('name').text,
        'price': int(product.find('price').text)
    }
    products.append(prod)
yaml_data = yaml.dump({'products': products}, sort_keys=False)
print(yaml_data)

Output:

products:
- name: Smartphone
  price: 699
- name: Laptop
  price: 999
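
When the element names are not known in advance, the same idea can be generalized with a small recursive function. This is a rough sketch, not a full converter: element_to_data is a hypothetical helper that ignores attributes and mixed content, leaves every leaf value as a string, and collects repeated tags into a list.

```python
import xml.etree.ElementTree as ET
import yaml


def element_to_data(elem):
    """Return the text of a leaf element, or a dict of its children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_data(child)
        if child.tag in result:
            # Repeated tag: turn the existing value into a list and append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_data = """
<products>
    <product><name>Smartphone</name><price>699</price></product>
    <product><name>Laptop</name><price>999</price></product>
</products>
"""
root = ET.fromstring(xml_data)
data = {root.tag: element_to_data(root)}
print(yaml.dump(data, sort_keys=False))
```

Unlike the hand-written loop, this version keeps the intermediate product key and does not convert price to an integer, so it trades precision for generality.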

Using lxml

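The third-party lxml library offers an ElementTree-compatible API along with extras such as full XPath support. Basic extraction looks almost the same as with ElementTree: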

from lxml import etree
import yaml
xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''
root = etree.fromstring(xml_data)
employees = []
for emp in root.findall('employee'):
    employee = {
        'name': emp.findtext('name'),
        'department': emp.findtext('department')
    }
    employees.append(employee)
yaml_data = yaml.dump({'employees': employees}, sort_keys=False)
print(yaml_data)

Output:

employees:
- name: Yasmine
  department: HR
- name: Karim
  department: Engineering
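
One place where lxml goes beyond ElementTree is its xpath() method. As an illustrative sketch, the query below selects only the names of employees in the Engineering department. The output key engineering is made up for this example, and the results are passed through str() because lxml returns a str subclass that yaml.dump would otherwise tag as a Python object.

```python
from lxml import etree
import yaml

xml_data = """
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
"""
root = etree.fromstring(xml_data)

# Select the <name> text of every <employee> whose <department> is Engineering.
matches = root.xpath('employee[department="Engineering"]/name/text()')
names = [str(name) for name in matches]

yaml_data = yaml.dump({'engineering': names}, sort_keys=False)
print(yaml_data)
```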

Custom Key Naming

You can rename keys by iterating over the parsed items and building new dictionaries with the names you want:

import xmltodict
import yaml
xml_data = '''
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
    <item id="202">
        <productName>Headphones</productName>
        <quantity>150</quantity>
    </item>
</inventory>
'''
data_dict = xmltodict.parse(xml_data)

# Rename keys
items = []
for item in data_dict['inventory']['item']:
    items.append({
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity'])
    })
yaml_data = yaml.dump({'inventory': items}, sort_keys=False)
print(yaml_data)

Output:

inventory:
- product_id: '201'
  name: Tablet
  stock: 50
- product_id: '202'
  name: Headphones
  stock: 150
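
One caveat with this loop: when the XML contains a single <item>, xmltodict returns a dictionary for it rather than a one-element list, and iterating over that dictionary yields its keys instead of items. The force_list argument of xmltodict.parse addresses this. Here is a small sketch with a one-item inventory:

```python
import xmltodict

xml_data = """
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
</inventory>
"""

# force_list makes "item" a list even when only one element is present.
data_dict = xmltodict.parse(xml_data, force_list=("item",))

items = []
for item in data_dict['inventory']['item']:
    items.append({
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity'])
    })
print(items)
```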

Convert Specific Elements

To convert only some elements, pick the fields you need while iterating and leave out the rest:

import xmltodict
import yaml
xml_data = '''
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
    <student>
        <name>Tarek</name>
        <major>Mathematics</major>
        <gpa>3.9</gpa>
    </student>
</university>
'''
data_dict = xmltodict.parse(xml_data)

# Extract only names and majors
students = []
for student in data_dict['university']['student']:
    students.append({
        'name': student['name'],
        'major': student['major']
    })
yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)

Output:

students:
- name: Salma
  major: Biology
- name: Tarek
  major: Mathematics
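
The same loop can also filter whole records instead of fields. As a sketch, the example below keeps only students whose GPA reaches a threshold. The 3.85 cutoff and the top_students key are arbitrary choices for illustration, and the GPA is converted with float() because xmltodict returns every value as a string.

```python
import xmltodict
import yaml

xml_data = """
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
    <student>
        <name>Tarek</name>
        <major>Mathematics</major>
        <gpa>3.9</gpa>
    </student>
</university>
"""
data_dict = xmltodict.parse(xml_data)

MIN_GPA = 3.85  # arbitrary threshold for this example

# Keep only the students at or above the threshold.
top_students = [
    {'name': student['name'], 'gpa': float(student['gpa'])}
    for student in data_dict['university']['student']
    if float(student['gpa']) >= MIN_GPA
]
yaml_data = yaml.dump({'top_students': top_students}, sort_keys=False)
print(yaml_data)
```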