Handle Duplicate Keys in YAML with Python Using ruamel.yaml

In this tutorial, you’ll learn how to use ruamel.yaml to detect and prevent duplicate keys in YAML files.

You’ll explore ways to identify duplicates, customize how they are handled with a custom constructor, keep every duplicated value, and stop a file with duplicate keys from being rewritten.

 

 

Detect Duplicate Keys When Parsing YAML

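The YAML() loader in ruamel.yaml rejects duplicate keys in a mapping. Setting allow_duplicate_keys to False makes that behavior explicit, and loading a string with a repeated key raises DuplicateKeyError: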

from ruamel.yaml import YAML
from ruamel.yaml.constructor import DuplicateKeyError
yaml_str = """
person:
  name: Ali
  age: 30
  age: 31
"""
yaml = YAML()
yaml.allow_duplicate_keys = False
try:
    data = yaml.load(yaml_str)
except DuplicateKeyError as e:
    print(e)

Output:

while constructing a mapping
  in "<unicode string>", line 3, column 3:
      name: Ali
      ^ (line: 3)
found duplicate key "age" with value "31" (original value: "30")
  in "<unicode string>", line 5, column 3:
      age: 31
      ^ (line: 5)

The parser raises a DuplicateKeyError because the age key appears twice in the same mapping.

 

Customize Duplicate Key Handling

You can override the construct_mapping method to customize the duplicate key handling according to your needs.

from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor
class UniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        # node.value holds the (key_node, value_node) pairs of this mapping
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                # Report the duplicate instead of raising an error
                print(f"Duplicate key detected: {key}")
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value  # a later value replaces the earlier one
        return mapping

class UniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        # Use the custom constructor for every load
        self.Constructor = UniqueKeyConstructor
yaml_str = """
database:
  host: localhost
  port: 3306
  port: 5432
"""
yaml = UniqueKeyYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)

Output:

Duplicate key detected: port
{'database': {'host': 'localhost', 'port': 5432}}

By overriding construct_mapping, you can log a message when a duplicate key is found instead of raising an error.

The last value for port overwrites the previous one.

 

Keep All Duplicate Keys

To keep every value associated with a duplicate key during parsing, you can collect the values for each key in a list.

from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor
from collections import defaultdict
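class MultiValueConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        # Collect every value seen for each key
        collected = defaultdict(list)
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            value = self.construct_object(value_node, deep=deep)
            collected[key].append(value)
        # Keys that occur once keep their plain value; only repeated keys
        # become lists. Appending unconditionally would also wrap the
        # value of 'server' in a list.
        return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}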
class MultiValueYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = MultiValueConstructor
yaml_str = """
server:
  address: 192.168.1.1
  address: 192.168.1.2
  address: 192.168.1.3
"""
yaml = MultiValueYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)

Output:

{'server': {'address': ['192.168.1.1', '192.168.1.2', '192.168.1.3']}}

Now, all the address values are stored in a list.
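
Code that reads such a result may meet a key that holds either a single value or a list of values. A small normalizing helper keeps the calling code uniform. This is only a sketch: as_list is a hypothetical name, and the port entry in the sample data is added purely for illustration.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]


# Sample data shaped like the parsed result above; 'port' is illustrative.
data = {
    'server': {
        'address': ['192.168.1.1', '192.168.1.2', '192.168.1.3'],
        'port': 8080,
    }
}

for address in as_list(data['server']['address']):
    print(address)
for port in as_list(data['server']['port']):
    print(port)
```

One caveat of this approach: a key whose genuine value is a YAML sequence cannot be told apart from a key whose duplicates were merged into a list.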

 

Handle Duplicate Keys in Nested Structures

The safe constructor builds every mapping node, nested ones included, through construct_mapping, so a duplicate check placed there covers each level without any explicit recursion. Here the constructor raises a ConstructorError that points at the repeated key.

from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor, ConstructorError
class NestedUniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                # Report where the mapping starts and where the repeated key is
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key ({key})", key_node.start_mark
                )
            # Nested mappings are built through construct_mapping as well,
            # so they get the same check
            mapping[key] = self.construct_object(value_node, deep=deep)
        return mapping
class NestedUniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = NestedUniqueKeyConstructor
yaml_str = """
project:
  name: Sahara
  details:
    manager: Omar
    manager: Layla
"""
yaml = NestedUniqueKeyYAML(typ='safe')
try:
    data = yaml.load(yaml_str)
except ConstructorError as e:
    print(e)

Output:

while constructing a mapping
  in "<unicode string>", line 5, column 5
found duplicate key (manager)
  in "<unicode string>", line 6, column 5

This ensures that duplicates within nested dictionaries, such as the repeated manager key, are detected and reported.

 

Prevent Duplicate Keys When Writing YAML

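A mapping loaded into Python can hold each key only once, so duplicates have to be caught when the file is read. With allow_duplicate_keys set to False, loading fails before anything is written to the output file.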

from ruamel.yaml import YAML
from ruamel.yaml.constructor import DuplicateKeyError
yaml_str = """
settings:
  theme: light
  theme: dark
"""
yaml = YAML()
yaml.allow_duplicate_keys = False
try:
    # Loading fails here, so nothing below runs and no file is written
    data = yaml.load(yaml_str)
    data['settings']['language'] = 'en'
    with open('output.yaml', 'w') as f:
        yaml.dump(data, f)
except DuplicateKeyError as e:
    print(e)

Output:

while constructing a mapping
  in "<unicode string>", line 3, column 3:
      theme: light
      ^ (line: 3)
found duplicate key "theme" with value "dark" (original value: "light")
  in "<unicode string>", line 4, column 3:
      theme: dark
      ^ (line: 4)