Python YAML Libraries (Best YAML Parsers)
YAML is a human-readable data serialization format commonly used for configuration files and data exchange between languages.
In Python, several libraries allow you to parse and work with YAML files.
This tutorial explores some of the best YAML parsers available for Python developers.
PyYAML
PyYAML is the most widely used YAML parser and emitter for Python. It implements YAML 1.1 and is easy to integrate into your projects.
To load a simple YAML file using PyYAML, you can use the following code:
import yaml

yaml_data = """
person:
  name: "Ahmed"
  age: 30
  city: "Cairo"
"""

# safe_load builds only plain Python objects (dict, list, str, int, ...)
data = yaml.safe_load(yaml_data)
print(data)
Output:
{'person': {'name': 'Ahmed', 'age': 30, 'city': 'Cairo'}}
To dump a Python dictionary back into a YAML formatted string:
import yaml

data = {
    'person': {
        'name': 'Sara',
        'age': 25,
        'city': 'Alexandria'
    }
}

# dump sorts keys alphabetically by default
yaml_string = yaml.dump(data)
print(yaml_string)
Output:
person:
  age: 25
  city: Alexandria
  name: Sara
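When the input comes from a user or a file you do not control, it may not be valid YAML. The sketch below shows one way to handle that with PyYAML: it catches yaml.YAMLError, the base class of PyYAML's parsing errors. The helper name parse_or_none and the sample strings are made up for illustration.

```python
import yaml

def parse_or_none(text):
    """Return the parsed data, or None if text is not valid YAML.

    parse_or_none is a hypothetical helper, not part of PyYAML.
    """
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Scanner and parser errors both derive from yaml.YAMLError
        print(f"Could not parse YAML: {exc}")
        return None

good = parse_or_none("age: 30")
bad = parse_or_none("items: [1, 2")  # unclosed flow sequence

print(good)
print(bad)
```

Returning None keeps the example short. In a real project you may prefer to re-raise the error with the file name attached.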
ruamel.yaml
ruamel.yaml is a YAML 1.2 parser that supports round-trip preservation of comments, formatting, and ordering.
It’s useful when you need to read a YAML file, make changes, and write it back without losing the original structure.
To preserve the structure and comments of a YAML file using ruamel.yaml:
from io import StringIO

from ruamel.yaml import YAML

yaml_str = """
# User information
user:
  name: "Fatima"  # User's name
  age: 28
  city: "Giza"
"""

yaml = YAML()  # the default mode is round-trip ('rt')
data = yaml.load(yaml_str)
data['user']['age'] = 29

# dump writes to a stream, so collect the result in a StringIO buffer
stream = StringIO()
yaml.dump(data, stream)
print(stream.getvalue())
Output:
# User information
user:
  name: Fatima  # User's name
  age: 29
  city: Giza
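This code loads a YAML string with comments, updates the user's age, and writes it back with the comments and key order intact. Note that the double quotes around the string values are dropped in the output. Set yaml.preserve_quotes = True before loading if you need to keep them.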
yamlreader
yamlreader allows you to load multiple YAML files and combine them into a single dictionary.
This is handy for managing configurations split across several files.
To combine multiple YAML configurations:
from yamlreader import yaml_load

yaml_data_1 = """
database:
  host: "localhost"
  port: 5432
"""

yaml_data_2 = """
database:
  user: "dbuser"
  password: "secret"
"""

# Write the two sample configurations to the current directory
with open('config1.yaml', 'w') as file:
    file.write(yaml_data_1)
with open('config2.yaml', 'w') as file:
    file.write(yaml_data_2)

# Pass only the list of files; yaml_load's second parameter is for
# default data, not a directory.
yaml_files = ['config1.yaml', 'config2.yaml']
merged_yaml = yaml_load(yaml_files)
print(merged_yaml)
Output:
{'database': {'host': 'localhost', 'port': 5432, 'user': 'dbuser', 'password': 'secret'}}
This code writes two separate YAML configurations to files and then loads and merges them into a single Python dictionary.
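To make the merge behavior concrete, here is a rough sketch of a recursive merge written with PyYAML alone. It is not yamlreader's implementation: deep_merge is a hypothetical helper, and its rules (nested dictionaries are merged, lists are concatenated, any other value from the later document wins) are assumptions chosen for illustration.

```python
import yaml

def deep_merge(base, extra):
    """Return a new dict with `extra` merged into `base` (hypothetical helper)."""
    merged = dict(base)
    for key, value in extra.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Merge nested mappings key by key
            merged[key] = deep_merge(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Concatenate sequences instead of replacing them
            merged[key] = current + value
        else:
            # Scalars and mismatched types: the later value wins
            merged[key] = value
    return merged

config_1 = yaml.safe_load("database:\n  host: localhost\n  port: 5432\n")
config_2 = yaml.safe_load("database:\n  user: dbuser\n  password: secret\n")

merged = deep_merge(config_1, config_2)
print(merged)
```

The inputs are parsed from strings here, so the sketch needs no files on disk.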
oyaml
oyaml is a drop-in replacement for PyYAML that preserves the order of dictionary keys.
This is useful when the order of elements matters in your YAML files.
To load and dump YAML while preserving key order:
import oyaml as yaml

yaml_data = """
person:
  name: "Mohamed"
  age: 40
  occupation: "Engineer"
"""

data = yaml.safe_load(yaml_data)
print(data)

# Keys are written back in their original order, not sorted
yaml_string = yaml.dump(data)
print(yaml_string)
Output:
{'person': {'name': 'Mohamed', 'age': 40, 'occupation': 'Engineer'}}
person:
  name: Mohamed
  age: 40
  occupation: Engineer
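For comparison, plain PyYAML can produce the same ordering without an extra dependency. This sketch assumes Python 3.7+ (where dictionaries keep insertion order) and PyYAML 5.1 or later (where the dump functions accept sort_keys).

```python
import yaml

data = yaml.safe_load(
    "person:\n"
    "  name: Mohamed\n"
    "  age: 40\n"
    "  occupation: Engineer\n"
)

# Default behavior: keys are sorted alphabetically on output
sorted_dump = yaml.safe_dump(data)

# sort_keys=False keeps the insertion order of the dictionary
ordered_dump = yaml.safe_dump(data, sort_keys=False)

print(sorted_dump)
print(ordered_dump)
```

The first dump lists age before name. The second keeps name, age, occupation in the order they were loaded.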
StrictYAML
StrictYAML is a parser that enforces a stricter subset of YAML. It avoids some of YAML’s problematic features.
To parse YAML using StrictYAML:
from strictyaml import load, Map, Int, Str

yaml_data = """
employee:
  name: "Youssef"
  age: 35
  department: "Sales"
"""

# The schema declares the expected keys and the type of each value
schema = Map({
    "employee": Map({
        "name": Str(),
        "age": Int(),
        "department": Str(),
    })
})

data = load(yaml_data, schema)
print(data.data)
Output:
{'employee': {'name': 'Youssef', 'age': 35, 'department': 'Sales'}}
This code validates the YAML input against a schema to ensure the data types match the expectations.
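One commonly cited example of the problematic features mentioned above is implicit typing, where an unquoted value is silently converted to a type you did not intend. The sketch below uses PyYAML (a YAML 1.1 parser) rather than StrictYAML to show the behavior. The keys and values are made up for illustration.

```python
import yaml

# Unquoted values: the parser guesses the types
implicit = yaml.safe_load("country: NO\nversion: 1.10\n")

# Quoted values: both stay strings
explicit = yaml.safe_load('country: "NO"\nversion: "1.10"\n')

print(implicit)  # {'country': False, 'version': 1.1}
print(explicit)  # {'country': 'NO', 'version': '1.10'}
```

With an explicit schema, as in the StrictYAML example, each value is validated against the type you declared instead of being guessed from its spelling.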
Performance Benchmark
To compare parsing speeds, you can measure the time it takes to load the same document 1000 times with each parser, using time.perf_counter() from the time module:
import time

import yaml
from ruamel.yaml import YAML
import oyaml
import strictyaml

yaml_data = """
name: Tom
age: 30
cities:
  - New York
  - London
  - Tokyo
is_employee: true
"""

def measure_parsing(parser_func, name):
    # perf_counter is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()
    for _ in range(1000):  # Run 1000 times for a more stable measurement
        parser_func(yaml_data)
    end_time = time.perf_counter()
    print(f"{name}: {(end_time - start_time) * 1000:.2f} ms")

# 1. PyYAML
measure_parsing(yaml.safe_load, "PyYAML")

# 2. ruamel.yaml (safe mode, not the round-trip mode used earlier)
ruamel_yaml = YAML(typ='safe')
measure_parsing(ruamel_yaml.load, "ruamel.yaml")

# 3. oyaml
measure_parsing(oyaml.safe_load, "oyaml")

# 4. StrictYAML
schema = strictyaml.Map({
    "name": strictyaml.Str(),
    "age": strictyaml.Int(),
    "cities": strictyaml.Seq(strictyaml.Str()),
    "is_employee": strictyaml.Bool()
})
measure_parsing(lambda x: strictyaml.load(x, schema), "StrictYAML")
Output:
PyYAML: 343.57 ms
ruamel.yaml: 181.23 ms
oyaml: 343.20 ms
StrictYAML: 1377.98 ms
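In this run, ruamel.yaml was the fastest of the four parsers and StrictYAML the slowest. Keep in mind that the benchmark uses ruamel.yaml in safe mode (typ='safe'), not the round-trip mode shown earlier, and that the numbers will vary with your machine, library versions, and input.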
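The standard library's timeit module is another way to take such measurements. The sketch below times only PyYAML to stay short. The run count and number of rounds are arbitrary choices, and taking the best of several rounds is one common way to reduce noise from other processes.

```python
import timeit

import yaml

yaml_data = """
name: Tom
age: 30
cities:
  - New York
  - London
  - Tokyo
is_employee: true
"""

runs = 200    # loads per round (arbitrary)
rounds = 3    # number of rounds (arbitrary)

# repeat() returns one total time in seconds per round
per_round = timeit.repeat(lambda: yaml.safe_load(yaml_data), number=runs, repeat=rounds)

best_ms = min(per_round) * 1000
print(f"PyYAML: best of {rounds} rounds, {best_ms:.2f} ms per {runs} loads")
```

The same pattern can be applied to the other parsers by swapping the function passed to timeit.repeat.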
When to use each
Choosing the right YAML parser depends on your specific needs:
- PyYAML: Use when you need a standard and straightforward parser for general purposes.
- ruamel.yaml: Ideal when you need to preserve YAML file structure, comments, and formatting during round-trip load and dump operations.
- yamlreader: Helpful when combining multiple YAML files into a single configuration.
- oyaml: Use when the order of dictionary keys is important and must be preserved.
- StrictYAML: Best when you need strict validation of YAML content against a schema to ensure data integrity.
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.