How to Convert Excel to YAML in Python

Converting Excel data to YAML format is a common task when preparing configuration files or structured data for applications.

In this tutorial, you’ll learn how to convert Excel files to YAML using various Python libraries.

Table of Contents hide

1 Using csv with PyYAML (Excel to CSV to YAML)
2 Using pandas with PyYAML
3 Using openpyxl with PyYAML
4 Using xlrd with PyYAML
5 Handle Multiple Sheets in Excel
6 Represent Data By Grouping
7 Filter Specific Columns Before Conversion
8 Convert Excel with Date Formats

Using csv with PyYAML (Excel to CSV to YAML)

You can start by saving your Excel file as CSV format from Microsoft Excel and then use the csv and yaml libraries to convert it to YAML.

import csv
import yaml
with open('data.csv', 'r', encoding='utf-8') as csv_file:
    reader = csv.DictReader(csv_file)
    data_list = list(reader)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming your data.csv looks like this:

Name,Age,City
Ahmed,30,Cairo
Fatima,25,Alexandria
Hassan,35,Giza

The generated data.yaml will be:

- Age: '30'
  City: Cairo
  Name: Ahmed
- Age: '25'
  City: Alexandria
  Name: Fatima
- Age: '35'
  City: Giza
  Name: Hassan

Using pandas with PyYAML

You can use pandas library to read Excel files directly and then convert the data to YAML.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

If your data.xlsx contains:

| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Giza |

The resulting data.yaml will be:

- Age: 30
  City: Cairo
  Name: Ahmed
- Age: 25
  City: Alexandria
  Name: Fatima
- Age: 35
  City: Giza
  Name: Hassan

This code reads the Excel file using pandas and writes the data to a YAML file.

Using openpyxl with PyYAML

To read Excel files without pandas, you can use openpyxl.

from openpyxl import load_workbook
import yaml
wb = load_workbook('data.xlsx')
sheet = wb.active
data_list = []
headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))]
for row in sheet.iter_rows(min_row=2, values_only=True):
    data = dict(zip(headers, row))
    data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml file will contain:

- Age: 30
  City: Cairo
  Name: Ahmed
- Age: 25
  City: Alexandria
  Name: Fatima
- Age: 35
  City: Giza
  Name: Hassan

This code uses openpyxl to read the Excel file and converts it to YAML.

Using xlrd with PyYAML

Although xlrd no longer supports .xlsx files, you can still use it for .xls files.

import xlrd
import yaml
wb = xlrd.open_workbook('data.xls')
sheet = wb.sheet_by_index(0)
data_list = []
headers = sheet.row_values(0)
for row_idx in range(1, sheet.nrows):
    row_values = sheet.row_values(row_idx)
    data = dict(zip(headers, row_values))
    data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The generated data.yaml will be:

- Age: 30.0
  City: Cairo
  Name: Ahmed
- Age: 25.0
  City: Alexandria
  Name: Fatima
- Age: 35.0
  City: Giza
  Name: Hassan

Handle Multiple Sheets in Excel

If your Excel file contains multiple sheets, you can process each sheet individually.

import pandas as pd
import yaml

# Read all sheets
xlsx = pd.ExcelFile('data.xlsx')
all_data = {}
for sheet_name in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet_name)
    data_list = df.to_dict(orient='records')
    all_data[sheet_name] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(all_data, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml file will have data organized by sheet names:

Sheet1:
  - Age: 30
    City: Cairo
    Name: Ahmed
  - Age: 25
    City: Alexandria
    Name: Fatima
Sheet2:
  - Age: 40
    City: Luxor
    Name: Ibrahim
  - Age: 28
    City: Aswan
    Name: Mona

This code reads all sheets from the Excel file and writes them to YAML with sheet names as keys.

Represent Data By Grouping

You can process hierarchical relationships in your Excel data to represent nested data.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
grouped = df.groupby('City')
data_dict = {}
for city, group in grouped:
    data_list = group[['Name', 'Age']].to_dict(orient='records')
    data_dict[city] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_dict, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming data.xlsx contains:

| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Cairo |
| Mona | 28 | Alexandria |

The resulting data.yaml will be:

Alexandria:
  - Age: 25
    Name: Fatima
  - Age: 28
    Name: Mona
Cairo:
  - Age: 30
    Name: Ahmed
  - Age: 35
    Name: Hassan

This code groups data by city and creates a nested YAML structure.

Filter Specific Columns Before Conversion

You can select specific columns from the Excel file before converting to YAML.

import pandas as pd
import yaml

# Read Excel file and select specific columns
df = pd.read_excel('data.xlsx', usecols=['Name', 'City'])
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml will contain:

- City: Cairo
  Name: Ahmed
- City: Alexandria
  Name: Fatima
- City: Giza
  Name: Hassan

This code reads only the ‘Name’ and ‘City’ columns and writes them to YAML.

Convert Excel with Date Formats

To handle date formats during the conversion, you can parse and format dates appropriately.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx', parse_dates=['JoinDate'])

# Convert dates to string format
df['JoinDate'] = df['JoinDate'].dt.strftime('%Y-%m-%d')

# Convert DataFrame to list of dictionaries
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming data.xlsx contains:

| Name | JoinDate |
|——–|———–|
| Ahmed | 2020-05-15|
| Fatima | 2021-07-10|
| Hassan | 2019-09-20|

The generated data.yaml will be:

- JoinDate: '2020-05-15'
  Name: Ahmed
- JoinDate: '2021-07-10'
  Name: Fatima
- JoinDate: '2019-09-20'
  Name: Hassan

This code reads the ‘JoinDate’ column as dates, formats them, and writes the data to YAML.
Without converting the dates, you will get YAML like this:

- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1589500800000000000
  - null
  - null
  - 10
  Name: Ahmed
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1625875200000000000
  - null
  - null
  - 10
  Name: Fatima
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1568937600000000000
  - null
  - null
  - 10
  Name: Hassan

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Using csv with PyYAML (Excel to CSV to YAML)

Using pandas with PyYAML

Using openpyxl with PyYAML

Using xlrd with PyYAML

Handle Multiple Sheets in Excel

Represent Data By Grouping

Filter Specific Columns Before Conversion

Convert Excel with Date Formats

Related posts

Leave a Reply Cancel reply