How to Convert Excel to YAML in Python

Converting Excel data to YAML format is a common task when preparing configuration files or structured data for applications.

In this tutorial, you’ll learn how to convert Excel files to YAML using various Python libraries.

 

 

Using csv with PyYAML (Excel to CSV to YAML)

You can start by saving your Excel file as CSV format from Microsoft Excel and then use the csv and yaml libraries to convert it to YAML.

import csv
import yaml
with open('data.csv', 'r', encoding='utf-8') as csv_file:
    reader = csv.DictReader(csv_file)
    data_list = list(reader)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming your data.csv looks like this:

Name,Age,City
Ahmed,30,Cairo
Fatima,25,Alexandria
Hassan,35,Giza

The generated data.yaml will be:

- Age: '30'
  City: Cairo
  Name: Ahmed
- Age: '25'
  City: Alexandria
  Name: Fatima
- Age: '35'
  City: Giza
  Name: Hassan

 

Using pandas with PyYAML

You can use pandas library to read Excel files directly and then convert the data to YAML.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

If your data.xlsx contains:

| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Giza |

The resulting data.yaml will be:

- Age: 30
  City: Cairo
  Name: Ahmed
- Age: 25
  City: Alexandria
  Name: Fatima
- Age: 35
  City: Giza
  Name: Hassan

This code reads the Excel file using pandas and writes the data to a YAML file.

 

Using openpyxl with PyYAML

To read Excel files without pandas, you can use openpyxl.

from openpyxl import load_workbook
import yaml
wb = load_workbook('data.xlsx')
sheet = wb.active
data_list = []
headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))]
for row in sheet.iter_rows(min_row=2, values_only=True):
    data = dict(zip(headers, row))
    data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml file will contain:

- Age: 30
  City: Cairo
  Name: Ahmed
- Age: 25
  City: Alexandria
  Name: Fatima
- Age: 35
  City: Giza
  Name: Hassan

This code uses openpyxl to read the Excel file and converts it to YAML.

 

Using xlrd with PyYAML

Although xlrd no longer supports .xlsx files, you can still use it for .xls files.

import xlrd
import yaml
wb = xlrd.open_workbook('data.xls')
sheet = wb.sheet_by_index(0)
data_list = []
headers = sheet.row_values(0)
for row_idx in range(1, sheet.nrows):
    row_values = sheet.row_values(row_idx)
    data = dict(zip(headers, row_values))
    data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The generated data.yaml will be:

- Age: 30.0
  City: Cairo
  Name: Ahmed
- Age: 25.0
  City: Alexandria
  Name: Fatima
- Age: 35.0
  City: Giza
  Name: Hassan

 

Handle Multiple Sheets in Excel

If your Excel file contains multiple sheets, you can process each sheet individually.

import pandas as pd
import yaml

# Read all sheets
xlsx = pd.ExcelFile('data.xlsx')
all_data = {}
for sheet_name in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet_name)
    data_list = df.to_dict(orient='records')
    all_data[sheet_name] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(all_data, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml file will have data organized by sheet names:

Sheet1:
  - Age: 30
    City: Cairo
    Name: Ahmed
  - Age: 25
    City: Alexandria
    Name: Fatima
Sheet2:
  - Age: 40
    City: Luxor
    Name: Ibrahim
  - Age: 28
    City: Aswan
    Name: Mona

This code reads all sheets from the Excel file and writes them to YAML with sheet names as keys.

 

Represent Data By Grouping

You can process hierarchical relationships in your Excel data to represent nested data.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
grouped = df.groupby('City')
data_dict = {}
for city, group in grouped:
    data_list = group[['Name', 'Age']].to_dict(orient='records')
    data_dict[city] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_dict, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming data.xlsx contains:

| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Cairo |
| Mona | 28 | Alexandria |

The resulting data.yaml will be:

Alexandria:
  - Age: 25
    Name: Fatima
  - Age: 28
    Name: Mona
Cairo:
  - Age: 30
    Name: Ahmed
  - Age: 35
    Name: Hassan

This code groups data by city and creates a nested YAML structure.

 

Filter Specific Columns Before Conversion

You can select specific columns from the Excel file before converting to YAML.

import pandas as pd
import yaml

# Read Excel file and select specific columns
df = pd.read_excel('data.xlsx', usecols=['Name', 'City'])
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

The data.yaml will contain:

- City: Cairo
  Name: Ahmed
- City: Alexandria
  Name: Fatima
- City: Giza
  Name: Hassan

This code reads only the ‘Name’ and ‘City’ columns and writes them to YAML.

 

Convert Excel with Date Formats

To handle date formats during the conversion, you can parse and format dates appropriately.

import pandas as pd
import yaml
df = pd.read_excel('data.xlsx', parse_dates=['JoinDate'])

# Convert dates to string format
df['JoinDate'] = df['JoinDate'].dt.strftime('%Y-%m-%d')

# Convert DataFrame to list of dictionaries
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)

Output:

Assuming data.xlsx contains:

| Name | JoinDate |
|——–|———–|
| Ahmed | 2020-05-15|
| Fatima | 2021-07-10|
| Hassan | 2019-09-20|

The generated data.yaml will be:

- JoinDate: '2020-05-15'
  Name: Ahmed
- JoinDate: '2021-07-10'
  Name: Fatima
- JoinDate: '2019-09-20'
  Name: Hassan

This code reads the ‘JoinDate’ column as dates, formats them, and writes the data to YAML.
Without converting the dates, you will get YAML like this:

- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1589500800000000000
  - null
  - null
  - 10
  Name: Ahmed
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1625875200000000000
  - null
  - null
  - 10
  Name: Fatima
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp
  - 1568937600000000000
  - null
  - null
  - 10
  Name: Hassan
Leave a Reply

Your email address will not be published. Required fields are marked *