How to Filter YAML Data in Python

This tutorial will guide you through various methods to filter YAML data in Python.

You’ll learn how to extract specific information, apply conditions, and manipulate YAML data structures using Python’s built-in features and the PyYAML library (imported as yaml).

Basic Filtering

Filter by key

To filter YAML data by key, you can load the YAML content and access specific keys directly:

import yaml
yaml_data = """
employee:
  name: "Fatima"
  age: 28
  department: "Marketing"
"""
data = yaml.safe_load(yaml_data)

# Access the 'name' key
name = data['employee']['name']
print(name)

Output:

Fatima
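
Direct indexing raises a KeyError when a key is absent. As a minimal sketch of a more forgiving approach, the example below keeps only a chosen subset of keys and falls back to a default for a missing one. The wanted_keys list and the 'email' key are hypothetical and are not part of the original data.

```python
import yaml

yaml_data = """
employee:
  name: "Fatima"
  age: 28
  department: "Marketing"
"""
data = yaml.safe_load(yaml_data)
employee = data["employee"]

# Hypothetical list of keys to keep; 'email' is deliberately absent from the data
wanted_keys = ["name", "department", "email"]

# Keep only the wanted keys that actually exist in the mapping
subset = {key: employee[key] for key in wanted_keys if key in employee}
print(subset)

# .get() returns a default instead of raising KeyError for a missing key
email = employee.get("email", "unknown")
print(email)
```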

Filter by value

You can filter data based on the value of a specific key:

import yaml
yaml_data = """
employees:
  - name: "Omar"
    age: 35
    department: "Finance"
  - name: "Layla"
    age: 29
    department: "Engineering"
  - name: "Hassan"
    age: 42
    department: "Finance"
"""
data = yaml.safe_load(yaml_data)

# Filter employees in the Finance department
finance_employees = [emp for emp in data['employees'] if emp['department'] == "Finance"]
print(finance_employees)

Output:

[{'name': 'Omar', 'age': 35, 'department': 'Finance'}, {'name': 'Hassan', 'age': 42, 'department': 'Finance'}]
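
Real YAML files do not always include every field on every entry. The sketch below assumes a variation of the data in which one employee has no department, and uses .get() so that the entry is skipped rather than raising a KeyError.

```python
import yaml

# Assumed variation of the data above: 'Layla' has no department field
yaml_data = """
employees:
  - name: "Omar"
    department: "Finance"
  - name: "Layla"
  - name: "Hassan"
    department: "Finance"
"""
data = yaml.safe_load(yaml_data)

# emp.get('department') returns None when the key is missing,
# so entries without a department simply fail the comparison
finance_names = [
    emp["name"]
    for emp in data["employees"]
    if emp.get("department") == "Finance"
]
print(finance_names)
```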

Using list comprehensions

A list comprehension can filter entries and select fields in a single expression. This example returns only the names of products that are in stock:

import yaml
yaml_data = """
products:
  - name: "Laptop"
    price: 1500
    in_stock: true
  - name: "Smartphone"
    price: 800
    in_stock: false
  - name: "Tablet"
    price: 600
    in_stock: true
"""
data = yaml.safe_load(yaml_data)

# Get names of products that are in stock
in_stock_products = [product['name'] for product in data['products'] if product['in_stock']]
print(in_stock_products)

Output:

['Laptop', 'Tablet']
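
The same pattern works with a dict comprehension when a lookup table is more convenient than a list. This is a small sketch on the same product data; the in_stock_prices name is only illustrative.

```python
import yaml

yaml_data = """
products:
  - name: "Laptop"
    price: 1500
    in_stock: true
  - name: "Smartphone"
    price: 800
    in_stock: false
  - name: "Tablet"
    price: 600
    in_stock: true
"""
data = yaml.safe_load(yaml_data)

# Map each in-stock product's name to its price
in_stock_prices = {
    product["name"]: product["price"]
    for product in data["products"]
    if product["in_stock"]
}
print(in_stock_prices)
```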

Apply lambda functions

The built-in filter() function accepts a lambda as its predicate, which is convenient when the condition is short or needs to change at runtime:

import yaml
yaml_data = """
students:
  - name: "Noura"
    grade: 85
  - name: "Karim"
    grade: 92
  - name: "Salma"
    grade: 78
"""
data = yaml.safe_load(yaml_data)

# Filter students with grades above 80
high_achievers = list(filter(lambda s: s['grade'] > 80, data['students']))
print(high_achievers)

Output:

[{'name': 'Noura', 'grade': 85}, {'name': 'Karim', 'grade': 92}]

filter() keeps each student for which the lambda returns True. Because filter() returns a lazy iterator in Python 3, the call is wrapped in list() to produce the list of students who scored above 80.
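
To make the threshold adjustable, one option is a small helper that builds the lambda on demand. The grade_above() helper below is a hypothetical name introduced for illustration, not part of any library.

```python
import yaml

yaml_data = """
students:
  - name: "Noura"
    grade: 85
  - name: "Karim"
    grade: 92
  - name: "Salma"
    grade: 78
"""
data = yaml.safe_load(yaml_data)


def grade_above(threshold):
    """Hypothetical helper: return a predicate for grades above the threshold."""
    return lambda student: student["grade"] > threshold


# Reuse the same data with different thresholds
above_80 = [s["name"] for s in filter(grade_above(80), data["students"])]
above_90 = [s["name"] for s in filter(grade_above(90), data["students"])]
print(above_80)
print(above_90)
```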

Regular Expression Filtering

Filter keys with regex

You can use regular expressions to match YAML keys:

import yaml
import re
yaml_data = """
measurements:
  temp_morning: 20
  temp_evening: 15
  humidity_morning: 80
  humidity_evening: 70
"""
data = yaml.safe_load(yaml_data)

# Filter keys that start with 'temp_'
temp_measurements = {k: v for k, v in data['measurements'].items() if re.match(r'^temp_', k)}
print(temp_measurements)

Output:

{'temp_morning': 20, 'temp_evening': 15}

This outputs a dictionary containing only the measurements whose keys start with ‘temp_’. Note that re.match() already anchors at the start of the string, so the leading ^ in the pattern is optional.
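
Patterns can match other parts of a key too. As a sketch under the same data, the example below precompiles a pattern and uses search() with a $ anchor to select keys that end with '_morning'. The suffix choice is only an illustration.

```python
import re
import yaml

yaml_data = """
measurements:
  temp_morning: 20
  temp_evening: 15
  humidity_morning: 80
  humidity_evening: 70
"""
data = yaml.safe_load(yaml_data)

# Compile once and reuse; '$' anchors the match at the end of the key
morning_pattern = re.compile(r"_morning$")

morning_measurements = {
    key: value
    for key, value in data["measurements"].items()
    if morning_pattern.search(key)
}
print(morning_measurements)
```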

Filter values with regex

Regular expressions can also filter based on string values:

import yaml
import re
yaml_data = """
logs:
  - date: "2023-10-01"
    message: "Error: failed to load module"
  - date: "2023-10-02"
    message: "Warning: deprecated API usage"
  - date: "2023-10-03"
    message: "Error: null pointer exception"
"""
data = yaml.safe_load(yaml_data)

# Filter logs containing 'Error' in the message
error_logs = [log for log in data['logs'] if re.search(r'Error', log['message'])]
print(error_logs)

Output:

[{'date': '2023-10-01', 'message': 'Error: failed to load module'}, {'date': '2023-10-03', 'message': 'Error: null pointer exception'}]

This outputs a list of logs whose ‘message’ contains ‘Error’. re.search() scans the whole string, so the match succeeds wherever the word appears, and it is case-sensitive by default.
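
Log messages are not always consistently capitalized. The sketch below assumes a variation of the log data with mixed casing and uses the re.IGNORECASE flag so that all variants match.

```python
import re
import yaml

# Assumed variation of the log data with inconsistent capitalization
yaml_data = """
logs:
  - date: "2023-10-01"
    message: "ERROR: failed to load module"
  - date: "2023-10-02"
    message: "Warning: deprecated API usage"
  - date: "2023-10-03"
    message: "error: null pointer exception"
"""
data = yaml.safe_load(yaml_data)

# re.IGNORECASE makes 'error', 'Error', and 'ERROR' all match;
# '^' restricts the match to messages that start with the word
error_pattern = re.compile(r"^error:", re.IGNORECASE)

error_dates = [
    log["date"] for log in data["logs"] if error_pattern.search(log["message"])
]
print(error_dates)
```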

Filter based on multiple conditions

To filter on several fields at once, combine the conditions with and (or with or when any one of them should be enough):

import yaml
yaml_data = """
books:
  - title: "Python Basics"
    author: "Aisha"
    year: 2019
    available: true
  - title: "Advanced Python"
    author: "Hossam"
    year: 2021
    available: false
  - title: "Data Science with Python"
    author: "Aisha"
    year: 2020
    available: true
"""
data = yaml.safe_load(yaml_data)

# Filter books by author 'Aisha' that are available
available_books_by_aisha = [book for book in data['books'] if book['author'] == "Aisha" and book['available']]
print(available_books_by_aisha)

Output:

[{'title': 'Python Basics', 'author': 'Aisha', 'year': 2019, 'available': True}, {'title': 'Data Science with Python', 'author': 'Aisha', 'year': 2020, 'available': True}]

This outputs a list of books authored by ‘Aisha’ that are currently available, filtering based on both ‘author’ and ‘available’ fields.
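
When the number of conditions grows, chaining them with and becomes hard to read. One possible alternative, sketched here, is to keep the conditions in a list and test them with all(). The conditions list and the extra year check are illustrative additions.

```python
import yaml

yaml_data = """
books:
  - title: "Python Basics"
    author: "Aisha"
    year: 2019
    available: true
  - title: "Advanced Python"
    author: "Hossam"
    year: 2021
    available: false
  - title: "Data Science with Python"
    author: "Aisha"
    year: 2020
    available: true
"""
data = yaml.safe_load(yaml_data)

# Illustrative list of predicates; a book must satisfy every one of them
conditions = [
    lambda book: book["author"] == "Aisha",
    lambda book: book["available"],
    lambda book: book["year"] >= 2020,
]

matching_titles = [
    book["title"]
    for book in data["books"]
    if all(condition(book) for condition in conditions)
]
print(matching_titles)
```

Swapping all() for any() would instead keep books that satisfy at least one of the conditions.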
