Handle Duplicate Keys in YAML with Python Using ruamel.yaml
In this tutorial, you’ll learn how to use ruamel.yaml to detect and prevent duplicate keys in YAML files. You’ll see how the default loader rejects duplicates, and how to write custom constructors that report them, collect every value, or reject them at any nesting level.
Detect Duplicate Keys When Parsing YAML
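ruamel.yaml rejects duplicate keys by default: when YAML().load() finds a key that appears twice in the same mapping, it raises DuplicateKeyError. Setting allow_duplicate_keys to False makes this behavior explicit: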
```python
from ruamel.yaml import YAML
from ruamel.yaml.constructor import DuplicateKeyError

yaml_str = """
person:
  name: Ali
  age: 30
  age: 31
"""

yaml = YAML()
yaml.allow_duplicate_keys = False  # the default; set explicitly for clarity
try:
    data = yaml.load(yaml_str)
except DuplicateKeyError as e:
    print(e)
```
Output:
```
while constructing a mapping
  in "<unicode string>", line 3, column 3:
      name: Ali
      ^ (line: 3)
found duplicate key "age" with value "31" (original value: "30")
  in "<unicode string>", line 5, column 3:
      age: 31
      ^ (line: 5)
```
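The parser raises a DuplicateKeyError because the age key appears twice in the person mapping. The message points to the start of the enclosing mapping (line 3) and to the second age key (line 5).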
Customize Duplicate Key Handling
To handle duplicates differently, subclass SafeConstructor, override its construct_mapping method, and make the YAML instance use your subclass through its Constructor attribute:
```python
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor


class UniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                # Report the duplicate instead of raising DuplicateKeyError
                print(f"Duplicate key detected: {key}")
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value  # a later value overwrites an earlier one
        return mapping


class UniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        # Use the custom constructor for every load() call
        self.Constructor = UniqueKeyConstructor


yaml_str = """
database:
  host: localhost
  port: 3306
  port: 5432
"""

yaml = UniqueKeyYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
```
Output:
```
Duplicate key detected: port
{'database': {'host': 'localhost', 'port': 5432}}
```
Because the overridden construct_mapping bypasses the built-in duplicate check, it prints a message when a duplicate key is found instead of raising an error. The last value for port (5432) overwrites the previous one.
Keep All Duplicate Keys
To keep every value of a repeated key instead of discarding all but one, collect the values in a list:
```python
from collections import defaultdict

from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor


class MultiValueConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        collected = defaultdict(list)
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            value = self.construct_object(value_node, deep=deep)
            collected[key].append(value)
        # Keep a list only for repeated keys; single keys keep their plain value.
        # (Returning dict(collected) would also wrap the nested 'server'
        # mapping in a one-element list.)
        return {key: values if len(values) > 1 else values[0]
                for key, values in collected.items()}


class MultiValueYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = MultiValueConstructor


yaml_str = """
server:
  address: 192.168.1.1
  address: 192.168.1.2
  address: 192.168.1.3
"""

yaml = MultiValueYAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
```
Output:
```
{'server': {'address': ['192.168.1.1', '192.168.1.2', '192.168.1.3']}}
```
Now all three address values are kept, in document order, in a single list.
Handle Duplicate Keys in Nested Structures
A single construct_mapping override already covers nested structures: the constructor builds each nested mapping by calling construct_mapping again, so raising an error there checks every level of the document.
```python
from ruamel.yaml import YAML
from ruamel.yaml.constructor import ConstructorError, SafeConstructor


class NestedUniqueKeyConstructor(SafeConstructor):
    def construct_mapping(self, node, deep=False):
        mapping = {}
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in mapping:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key ({key})", key_node.start_mark,
                )
            # A nested mapping is built through construct_mapping() as well,
            # so it gets the same check without an explicit recursive call
            mapping[key] = self.construct_object(value_node, deep=deep)
        return mapping


class NestedUniqueKeyYAML(YAML):
    def __init__(self, **kw):
        super().__init__(**kw)
        self.Constructor = NestedUniqueKeyConstructor


yaml_str = """
project:
  name: Sahara
  details:
    manager: Omar
    manager: Layla
"""

yaml = NestedUniqueKeyYAML(typ='safe')
try:
    data = yaml.load(yaml_str)
except ConstructorError as e:
    print(e)
```
Output:
```
while constructing a mapping
  in "<unicode string>", line 5, column 5
found duplicate key (manager)
  in "<unicode string>", line 6, column 5
```
The duplicate manager key inside the nested details mapping is detected and reported with its line and column, just like a duplicate at the top level.
Prevent Duplicate Keys When Writing YAML
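A Python dict cannot hold the same key twice, so the real risk when writing YAML is reading a file that already contains duplicates, modifying it, and saving it back. With allow_duplicate_keys set to False (the default for YAML()), load() raises DuplicateKeyError, so the broken input never reaches the dump step and nothing is written to output.yaml: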
```python
from ruamel.yaml import YAML
from ruamel.yaml.constructor import DuplicateKeyError

yaml_str = """
settings:
  theme: light
  theme: dark
"""

yaml = YAML()
yaml.allow_duplicate_keys = False
try:
    data = yaml.load(yaml_str)  # raises here, so the lines below never run
    data['settings']['language'] = 'en'
    with open('output.yaml', 'w') as f:
        yaml.dump(data, f)
except DuplicateKeyError as e:
    print(e)
```
Output:
```
while constructing a mapping
  in "<unicode string>", line 3, column 3:
      theme: light
      ^ (line: 3)
found duplicate key "theme" with value "dark" (original value: "light")
  in "<unicode string>", line 4, column 3:
      theme: dark
      ^ (line: 4)
```
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.