How to Convert YAML data To SQL in Python

In this tutorial, you’ll learn how to convert YAML data to SQL using Python.

You’ll learn how to handle various YAML structures, from simple key-value pairs to complex nested data.

Table of Contents hide

1 Basic YAML to SQL Insert
2 Handle Nested YAML
3 Process YAML Anchors and Aliases
4 Insert YAML Sequence Data

Basic YAML to SQL Insert

To start, read a simple YAML file and insert its data into a SQL table.

import yaml
import sqlite3
yaml_data = """
employees:
  - name: Ahmed
    age: 30
    department: Engineering
  - name: Fatima
    age: 28
    department: Marketing
"""
data = yaml.safe_load(yaml_data)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        age INTEGER,
        department TEXT
    )
''')
for emp in data['employees']:
    cursor.execute('''
        INSERT INTO employees (name, age, department)
        VALUES (?, ?, ?)
    ''', (emp['name'], emp['age'], emp['department']))
cursor.execute('SELECT * FROM employees')
rows = cursor.fetchall()
print(rows)

Output:

[(1, 'Ahmed', 30, 'Engineering'), (2, 'Fatima', 28, 'Marketing')]

This code reads the YAML data containing a list of employees, creates an in-memory SQLite database, defines an employees table, and inserts each employee into the table.

Finally, you fetch and print all rows from the table.

Handle Nested YAML

To handle nested YAML data and convert it into multiple related SQL tables, parse the nested structures and maintain parent-child relationships.

import yaml
import sqlite3
yaml_data = """
departments:
  - name: Engineering
    employees:
      - name: Ahmed
        age: 30
      - name: Mona
        age: 25
  - name: Marketing
    employees:
      - name: Fatima
        age: 28
      - name: Karim
        age: 32
"""
data = yaml.safe_load(yaml_data)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('''
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT
    )
''')
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        age INTEGER,
        department_id INTEGER,
        FOREIGN KEY(department_id) REFERENCES departments(id)
    )
''')
for dept in data['departments']:
    # Insert department
    cursor.execute('INSERT INTO departments (name) VALUES (?)', (dept['name'],))
    department_id = cursor.lastrowid

    # Insert employees for this department
    for emp in dept['employees']:
        cursor.execute('''
            INSERT INTO employees (name, age, department_id)
            VALUES (?, ?, ?)
        ''', (emp['name'], emp['age'], department_id))

cursor.execute('''
    SELECT employees.name, employees.age, departments.name
    FROM employees JOIN departments ON employees.department_id = departments.id
''')
rows = cursor.fetchall()
for row in rows:
    print(row)

Output:

('Ahmed', 30, 'Engineering')
('Mona', 25, 'Engineering')
('Fatima', 28, 'Marketing')
('Karim', 32, 'Marketing')

After parsing the nested YAML data, we insert each department into the departments table and each employee into the employees table.

We add the foreign key relationship via department_id.

By joining the two tables, we display each employee along with their department name.

Process YAML Anchors and Aliases

To handle YAML anchors and aliases and avoid duplication in data, process them in Python to correctly insert referenced data into SQL.

import yaml
import sqlite3
yaml_data = """
default_employee: &default_employee
  age: 30
  department: 'Engineering'

employees:
  - name: Ahmed
    <<: *default_employee
  - name: Fatima
    age: 28
    department: 'Marketing'
"""
data = yaml.safe_load(yaml_data)

# Connect to SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create table
cursor.execute('''
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT,
        age INTEGER,
        department TEXT
    )
''')

# Insert data into table
for emp in data['employees']:
    cursor.execute('''
        INSERT INTO employees (name, age, department)
        VALUES (?, ?, ?)
    ''', (emp['name'], emp.get('age'), emp.get('department')))

cursor.execute('SELECT * FROM employees')
rows = cursor.fetchall()
print(rows)

Output:

[(1, 'Ahmed', 30, 'Engineering'), (2, 'Fatima', 28, 'Marketing')]

In this code, the YAML anchor &default_employee defines default properties.

The alias *default_employee inherits these properties in the first employee entry.

The yaml.safe_load method processes the anchors and aliases and yields a complete dictionary for each employee.

Then we insert the employees into the SQL table.

Insert YAML Sequence Data

You can insert YAML sequences directly into array fields if you’re working with a database like PostgreSQL that supports array data types.

import yaml
import psycopg2
from psycopg2.extras import execute_values
yaml_data = """
employees:
  - name: Ahmed
    skills:
      - Python
      - SQL
      - Git
  - name: Fatima
    skills:
      - Marketing
      - SEO
      - Content Writing
"""
data = yaml.safe_load(yaml_data)
conn = psycopg2.connect(
    host="localhost",
    database="mydb",
    user="root",
    password="123"
)
cursor = conn.cursor()

# Create table with array type
cursor.execute('''
    CREATE TABLE IF NOT EXISTS employees (
        id SERIAL PRIMARY KEY,
        name TEXT,
        skills TEXT[]
    )
''')
conn.commit()

# Insert data into table
employee_list = [
    (emp['name'], emp['skills'])
    for emp in data['employees']
]
query = '''
    INSERT INTO employees (name, skills)
    VALUES %s
'''
execute_values(cursor, query, employee_list)
conn.commit()
cursor.execute('SELECT * FROM employees')
rows = cursor.fetchall()
for row in rows:
    print(row)

Output:

(1, 'Ahmed', ['Python', 'SQL', 'Git'])
(2, 'Fatima', ['Marketing', 'SEO', 'Content Writing'])

This code connects to a PostgreSQL database and creates an employees table with a skills column of type TEXT[], which is an array of text.

Then we load the YAML data, extract each employee’s name and list of skills, and insert them into the table using execute_values.

Finally, we retrieve and print the data.

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Basic YAML to SQL Insert

Handle Nested YAML

Process YAML Anchors and Aliases

Insert YAML Sequence Data

Related posts

Leave a Reply Cancel reply