How to Validate YAML in Python

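Validating YAML files helps you catch malformed or unexpected data before your application uses it, which prevents a whole class of runtime errors.

This tutorial shows you how to validate YAML in Python, using PyYAML for syntax checks and the yamale and cerberus libraries for schema checks.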

 

 

Syntax Validation

Detect Syntax Errors in YAML Files

Suppose you have the following YAML saved as config.yaml:

key1: value1
key2: value2
key3: value3
?: invalid_key
  nested_key: nested_value

To detect syntax errors, you can use the yaml.safe_load() function from the PyYAML library.

import yaml
with open('config.yaml', 'r') as file:
    try:
        data = yaml.safe_load(file)
        print("YAML syntax is valid.")
    except yaml.YAMLError as exc:
        print(f"Syntax error in YAML file: {exc}")

Output:

Syntax error in YAML file: mapping values are not allowed here
  in "config.yaml", line 5, column 13

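The output pinpoints where parsing failed: line 5, column 13, which is the colon after nested_key. The parser reads the indented line as a continuation of the value invalid_key, so a second key-value separator is not allowed there.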
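If you want the error position as data rather than as a printed message, the exception itself carries it. The following is a minimal sketch, assuming PyYAML is installed; the helper name check_yaml_syntax is made up for illustration. It relies on the problem_mark attribute that PyYAML sets on marked errors, whose line and column values are zero-based.

```python
import yaml


def check_yaml_syntax(text):
    """Return (True, None) if text parses, otherwise (False, message).

    Hypothetical helper for illustration; not part of PyYAML.
    """
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Scanner and parser errors carry a mark with zero-based positions.
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            return False, f"line {mark.line + 1}, column {mark.column + 1}: {exc.problem}"
        # Other YAMLError subclasses may not have a position.
        return False, str(exc)
    return True, None


print(check_yaml_syntax("a: 1\nb: 2\n"))
print(check_yaml_syntax("a: 1\n  b: 2\n"))
```

Returning a tuple instead of printing lets the caller decide whether to log the message, raise, or collect errors from several files.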

Parsing Errors

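Parsing errors also occur when every line looks fine on its own but the overall structure is inconsistent, for example when mapping keys and list items are mixed at the same level.

The yaml.safe_load() function raises yaml.YAMLError for these errors as well.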

import yaml
yaml_string = """
name: Ahmed
age: 30
occupation: Software Developer
skills:
  - Python
  - JavaScript
  - SQL
education:
  degree: Bachelor's
  major: Computer Science
  university: Tech University
  - graduation_year: 2015
"""
try:
    data = yaml.safe_load(yaml_string)
    print("Parsed YAML data:", data)
except yaml.YAMLError as exc:
    print(f"Parsing error: {exc}")

Output:

Parsing error: while parsing a block mapping
  in "<unicode string>", line 10, column 3:
      degree: Bachelor's
      ^
expected <block end>, but found '-'
  in "<unicode string>", line 13, column 3:
      - graduation_year: 2015
      ^

The parsing error occurs because the education section mixes two structures.

The last line, - graduation_year: 2015, starts a list item at the same indentation as the mapping keys above it, and a block mapping cannot contain a list item at that level.
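For comparison, here is a sketch of one way to repair the document, assuming graduation_year was meant to be an ordinary key of education. With the leading dash removed, the same yaml.safe_load() call succeeds.

```python
import yaml

# Same structure as above, with graduation_year written as a mapping key.
fixed_yaml = """
name: Ahmed
education:
  degree: Bachelor's
  major: Computer Science
  university: Tech University
  graduation_year: 2015
"""

data = yaml.safe_load(fixed_yaml)
print(data["education"]["graduation_year"])
```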

 

Schema-based YAML Validation

Using yamale

You can use the yamale library to validate YAML files against a schema.

import yamale
schema = yamale.make_schema(content="""
name: str()
age: int()
""")
data = yamale.make_data(content="""
name: "Ahmed"
age: "twenty-five"
""")
try:
    yamale.validate(schema, data)
    print("YAML data is valid.")
except yamale.YamaleError as exc:
    print("Validation failed!")
    print(exc)

Output:

Validation failed!
Error validating data
	age: 'twenty-five' is not a int.

The output shows that the age field has a string value when an integer is expected.

Using cerberus

The cerberus library validates Python dictionaries against a schema, so you load the YAML with yaml.safe_load() first and then pass the result to a Validator.

import yaml
from cerberus import Validator
schema = {
    'name': {'type': 'string'},
    'age': {'type': 'integer', 'min': 0},
}
yaml_string = """
name: "Mona"
age: -5
"""
data = yaml.safe_load(yaml_string)
validator = Validator(schema)

if validator.validate(data):
    print("YAML data is valid.")
else:
    print("Validation errors:", validator.errors)

Output:

Validation errors: {'age': ['min value is 0']}

The output shows that the age field violates the minimum value constraint.

 

Data Type Validation

To validate data types, define a schema that specifies the required types.

import yaml
from cerberus import Validator
schema = {
    'name': {'type': 'string'},
    'age': {'type': 'integer'},
    'skills': {'type': 'list'}
}
yaml_string = """
name: "Nadia"
age: 28
skills: "Python, Data Analysis"
"""
data = yaml.safe_load(yaml_string)
validator = Validator(schema)
if validator.validate(data):
    print("YAML data is valid.")
else:
    print("Validation errors:", validator.errors)

Output:

Validation errors: {'skills': ['must be of list type']}

The skills field is expected to be a list but is provided as a string.
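Type checks matter because the YAML loader decides each value's Python type from how it is written. As a small sketch, assuming PyYAML is installed, the following prints the type that yaml.safe_load() assigns to a few illustrative values; the keys are made up for the example.

```python
import yaml

doc = yaml.safe_load("""
age: 28
age_text: "28"
ratio: 1.5
active: yes
skills: [Python, SQL]
""")

# Map each key to the name of the Python type PyYAML produced.
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)
```

Quoting a number turns it into a string, and PyYAML reads an unquoted yes as a boolean, so a schema with explicit types catches values that were written in the wrong form.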

Validate Nested Data Structures

You can validate nested structures by defining nested schemas.

import yaml
from cerberus import Validator
schema = {
    'project': {
        'type': 'dict',
        'schema': {
            'name': {'type': 'string'},
            'members': {
                'type': 'list',
                'schema': {'type': 'string'}
            }
        }
    }
}
yaml_string = """
project:
  name: "Al-Qahira Revamp"
  members:
    - "Omar"
    - "Layla"
    - 123
"""
data = yaml.safe_load(yaml_string)
validator = Validator(schema)
if validator.validate(data):
    print("Nested YAML data is valid.")
else:
    print("Validation errors:", validator.errors)

Output:

Validation errors: {'project': [{'members': [{2: ['must be of string type']}]}]}

The output shows that the third element in members is not a string as expected.
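Nested error structures can be hard to read in logs. The helper below is a minimal sketch, not part of cerberus, that flattens an errors dictionary into dotted paths. It assumes the Cerberus 1.x layout, in which each field maps to a list whose items are either message strings or dictionaries of nested errors.

```python
def flatten_errors(errors, prefix=""):
    """Flatten nested validation errors into (path, message) pairs.

    Hypothetical helper; assumes each value is a list of strings
    and/or dicts holding errors for nested fields or list indexes.
    """
    flat = []
    for key, messages in errors.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        for message in messages:
            if isinstance(message, dict):
                # Descend into errors of a nested dict or list item.
                flat.extend(flatten_errors(message, path))
            else:
                flat.append((path, message))
    return flat


nested = {"project": [{"members": [{2: ["must be of string type"]}]}]}
for path, message in flatten_errors(nested):
    print(f"{path}: {message}")
```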

 

Implement Custom Data Type Validators

You can create custom validation rules by subclassing Validator and defining a _validate_<rule> method. The method's docstring holds the schema that cerberus uses to check the rule's own argument.

import yaml
from cerberus import Validator

class CustomValidator(Validator):
    def _validate_is_even(self, is_even, field, value):
        """Check that the value is even when the rule is enabled.

        The rule's arguments are validated against this schema:
        {'type': 'boolean'}
        """
        # is_even is the constraint from the schema (True or False).
        if is_even and value % 2 != 0:
            self._error(field, "Must be an even number")
schema = {
    'count': {'type': 'integer', 'is_even': True}
}
yaml_string = """
count: 7
"""
data = yaml.safe_load(yaml_string)
validator = CustomValidator(schema)
if validator.validate(data):
    print("YAML data is valid.")
else:
    print("Validation errors:", validator.errors)

Output:

Validation errors: {'count': ['Must be an even number']}

The custom validator checks that count is an even number.

 

Custom Validation Rules

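For one-off checks, you can skip subclassing and pass a function to the check_with rule. The function receives the field name, the value, and an error callback.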

import yaml
from cerberus import Validator
def check_length(field, value, error):
    if len(value) < 5:
        error(field, "Length must be at least 5 characters")
schema = {
    'username': {
        'type': 'string',
        'check_with': check_length
    }
}
yaml_string = """
username: "Ali"
"""
data = yaml.safe_load(yaml_string)
validator = Validator(schema)
if validator.validate(data):
    print("YAML data passes custom rules.")
else:
    print("Validation errors:", validator.errors)

Output:

Validation errors: {'username': ['Length must be at least 5 characters']}

The custom rule ensures the username is at least 5 characters long.
