Python YAML Libraries (Best YAML Parsers)

YAML is a human-readable data serialization format commonly used for configuration files and data exchange between languages.

In Python, several libraries allow you to parse and work with YAML files.

This tutorial explores some of the best YAML parsers available for Python developers.

 

 

PyYAML

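PyYAML is the most widely used YAML library for Python. It implements YAML 1.1 and is easy to integrate into your projects.

To load a simple YAML document from a string using PyYAML, call yaml.safe_load, which only builds plain Python objects such as dictionaries, lists, strings, and numbers: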

import yaml
yaml_data = """
person:
  name: "Ahmed"
  age: 30
  city: "Cairo"
"""
data = yaml.safe_load(yaml_data)
print(data)

Output:

{'person': {'name': 'Ahmed', 'age': 30, 'city': 'Cairo'}}

To dump a Python dictionary back into a YAML-formatted string:

import yaml
data = {
    'person': {
        'name': 'Sara',
        'age': 25,
        'city': 'Alexandria'
    }
}
yaml_string = yaml.dump(data)
print(yaml_string)

Output:

person:
  age: 25
  city: Alexandria
  name: Sara
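
The keys come back in alphabetical order because yaml.dump sorts them by default. The following is a minimal sketch of keeping the insertion order instead, assuming PyYAML 5.1 or later (where the sort_keys argument is available):

```python
import yaml

data = {
    'person': {
        'name': 'Sara',
        'age': 25,
        'city': 'Alexandria'
    }
}

# sort_keys=False keeps the dictionary's insertion order (PyYAML 5.1+)
ordered_yaml = yaml.safe_dump(data, sort_keys=False)
print(ordered_yaml)
```

With this setting, name, age, and city should appear in the same order as in the dictionary.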

 

ruamel.yaml

ruamel.yaml is a YAML 1.2 parser that supports round-trip preservation of comments, formatting, and ordering.

It’s useful when you need to read a YAML file, make changes, and write it back without losing the original structure.

To preserve the structure and comments of a YAML file using ruamel.yaml:

from io import StringIO
from ruamel.yaml import YAML

yaml_str = """
# User information
user:
  name: "Fatima"  # User's name
  age: 28
  city: "Giza"
"""
yaml = YAML()
data = yaml.load(yaml_str)
data['user']['age'] = 29

# Dump to an in-memory stream so the result can be printed
stream = StringIO()
yaml.dump(data, stream)
print(stream.getvalue())

Output:

# User information
user:
  name: Fatima        # User's name
  age: 29
  city: Giza

This code loads a YAML string with comments, updates the user’s age, and writes it back while preserving the comments and key order. The double quotes around the string values are not kept by default.

 

yamlreader

yamlreader allows you to load multiple YAML files and combine them into a single dictionary.

This is handy for managing configurations split across several files.

To combine multiple YAML configurations:

from yamlreader import yaml_load

yaml_data_1 = """
database:
  host: "localhost"
  port: 5432
"""

yaml_data_2 = """
database:
  user: "dbuser"
  password: "secret"
"""

# Write the two partial configurations to the current working directory
with open('config1.yaml', 'w') as file:
    file.write(yaml_data_1)

with open('config2.yaml', 'w') as file:
    file.write(yaml_data_2)

# Pass the file paths directly; the files are merged in list order.
# The second parameter of yaml_load is default data, not a directory.
yaml_files = ['config1.yaml', 'config2.yaml']
merged_yaml = yaml_load(yaml_files)
print(merged_yaml)

Output:

{'database': {'host': 'localhost', 'port': 5432, 'user': 'dbuser', 'password': 'secret'}}

This code writes two separate YAML configurations to files and then loads and merges them into a single Python dictionary.
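
To illustrate what a merge of this kind involves, here is a minimal sketch that uses only PyYAML. The deep_merge helper is hypothetical and is not yamlreader's implementation; yamlreader's own rules (for example, how it treats lists) may differ.

```python
import yaml


def deep_merge(base, override):
    """Return a new dict in which nested dicts from override are merged into base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are mappings: merge them key by key
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the later value replaces the earlier one
            merged[key] = value
    return merged


config_1 = yaml.safe_load("""
database:
  host: "localhost"
  port: 5432
""")

config_2 = yaml.safe_load("""
database:
  user: "dbuser"
  password: "secret"
""")

merged = deep_merge(config_1, config_2)
print(merged)
```

A plain dict.update() would not be enough here, because the second database mapping would replace the first one instead of being combined with it.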

 

oyaml

oyaml is a drop-in replacement for PyYAML that preserves the order of dictionary keys.

This is useful when the order of elements matters in your YAML files.

To load and dump YAML while preserving key order:

import oyaml as yaml

yaml_data = """
person:
  name: "Mohamed"
  age: 40
  occupation: "Engineer"
"""

data = yaml.safe_load(yaml_data)
print(data)

yaml_string = yaml.dump(data)
print(yaml_string)

Output:

{'person': {'name': 'Mohamed', 'age': 40, 'occupation': 'Engineer'}}
person:
  name: Mohamed
  age: 40
  occupation: Engineer

 

StrictYAML

StrictYAML is a parser that enforces a stricter subset of YAML. It avoids some of YAML’s problematic features, such as implicit typing, tags, anchors, and flow style.
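
Implicit typing is one such feature. The following is a minimal sketch, using PyYAML for comparison, of how unquoted values can be converted into types the author may not have intended; the keys are made up for illustration:

```python
import yaml

yaml_data = """
country: NO
version: 1.10
enabled: yes
"""

# PyYAML follows YAML 1.1 rules: NO and yes become booleans, 1.10 becomes a float
data = yaml.safe_load(yaml_data)
print(data)
# → {'country': False, 'version': 1.1, 'enabled': True}
```

The country code and the version string are silently changed, which is the kind of surprise a stricter parser is meant to prevent.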

To parse YAML using StrictYAML:

from strictyaml import load, Map, Int, Str
yaml_data = """
employee:
  name: "Youssef"
  age: 35
  department: "Sales"
"""
schema = Map({
    "employee": Map({
        "name": Str(),
        "age": Int(),
        "department": Str(),
    })
})
data = load(yaml_data, schema)
print(data.data)

Output:

{'employee': {'name': 'Youssef', 'age': 35, 'department': 'Sales'}}

This code validates the YAML input against a schema to ensure the data types match the expectations.

 

Performance Benchmark

To compare parsing speeds, you can measure how long each parser takes to load the same document 1,000 times using the time module:

import time
import yaml
from ruamel.yaml import YAML
import oyaml
import strictyaml
yaml_data = """
name: Tom
age: 30
cities:
  - New York
  - London
  - Tokyo
is_employee: true
"""

def measure_parsing(parser_func, name):
    # perf_counter is a monotonic, high-resolution clock suited to timing code
    start_time = time.perf_counter()
    for _ in range(1000):  # Run 1000 times for more accurate measurement
        parser_func(yaml_data)
    end_time = time.perf_counter()
    print(f"{name}: {(end_time - start_time) * 1000:.2f} ms")

# 1. PyYAML
measure_parsing(yaml.safe_load, "PyYAML")

# 2. ruamel.yaml
ruamel_yaml = YAML(typ='safe')
measure_parsing(ruamel_yaml.load, "ruamel.yaml")

# 3. oyaml
measure_parsing(oyaml.safe_load, "oyaml")

# 4. StrictYAML
schema = strictyaml.Map({
    "name": strictyaml.Str(),
    "age": strictyaml.Int(),
    "cities": strictyaml.Seq(strictyaml.Str()),
    "is_employee": strictyaml.Bool()
})
measure_parsing(lambda x: strictyaml.load(x, schema), "StrictYAML")

Output:

PyYAML: 343.57 ms
ruamel.yaml: 181.23 ms
oyaml: 343.20 ms
StrictYAML: 1377.98 ms

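In this run, ruamel.yaml (using typ='safe') was the fastest parser and StrictYAML the slowest. Treat the numbers as a rough guide: they depend on the machine, the library versions, and whether C-accelerated loaders are in use.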
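
A timeit-based variant can reduce noise by repeating the measurement and keeping the best run. The following is a minimal sketch for PyYAML only, comparing the pure-Python SafeLoader with CSafeLoader, which exists only when PyYAML was built against LibYAML; the repeat counts are arbitrary choices:

```python
import timeit
import yaml

yaml_data = """
name: Tom
age: 30
cities:
  - New York
  - London
  - Tokyo
is_employee: true
"""

# Fall back to the pure-Python loader if the C extension is not available
fast_loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def parse_pure():
    return yaml.load(yaml_data, Loader=yaml.SafeLoader)


def parse_fast():
    return yaml.load(yaml_data, Loader=fast_loader)


timings = {}
candidates = [
    ("pure Python SafeLoader", parse_pure),
    (f"fast loader ({fast_loader.__name__})", parse_fast),
]
for name, func in candidates:
    # Run 3 rounds of 200 loads each and keep the fastest round
    best = min(timeit.repeat(func, number=200, repeat=3))
    timings[name] = best
    print(f"{name}: {best * 1000:.2f} ms for 200 loads")
```

If the two lines show similar timings, the C extension is probably not installed and both measurements used the pure-Python loader.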

 

When to use each

Choosing the right YAML parser depends on your specific needs:

  • PyYAML: Use when you need a standard and straightforward parser for general purposes.
  • ruamel.yaml: Ideal when you need to preserve YAML file structure, comments, and formatting during round-trip load and dump operations.
  • yamlreader: Helpful when combining multiple YAML files into a single configuration.
  • oyaml: Use when the order of dictionary keys is important and must be preserved.
  • StrictYAML: Best when you need strict validation of YAML content against a schema to ensure data integrity.