How to Convert Excel to YAML in Python
Converting Excel data to YAML format is a common task when preparing configuration files or structured data for applications.
In this tutorial, you’ll learn how to convert Excel files to YAML using various Python libraries.
Using csv with PyYAML (Excel to CSV to YAML)
You can start by saving your Excel file as CSV format from Microsoft Excel and then use the csv
and yaml
libraries to convert it to YAML.
import csv import yaml with open('data.csv', 'r', encoding='utf-8') as csv_file: reader = csv.DictReader(csv_file) data_list = list(reader) with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming your data.csv
looks like this:
Name,Age,City Ahmed,30,Cairo Fatima,25,Alexandria Hassan,35,Giza
The generated data.yaml
will be:
- Age: '30' City: Cairo Name: Ahmed - Age: '25' City: Alexandria Name: Fatima - Age: '35' City: Giza Name: Hassan
Using pandas with PyYAML
You can use pandas
library to read Excel files directly and then convert the data to YAML.
import pandas as pd import yaml df = pd.read_excel('data.xlsx') data_list = df.to_dict(orient='records') with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
If your data.xlsx
contains:
| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Giza |
The resulting data.yaml
will be:
- Age: 30 City: Cairo Name: Ahmed - Age: 25 City: Alexandria Name: Fatima - Age: 35 City: Giza Name: Hassan
This code reads the Excel file using pandas
and writes the data to a YAML file.
Using openpyxl with PyYAML
To read Excel files without pandas
, you can use openpyxl
.
from openpyxl import load_workbook import yaml wb = load_workbook('data.xlsx') sheet = wb.active data_list = [] headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))] for row in sheet.iter_rows(min_row=2, values_only=True): data = dict(zip(headers, row)) data_list.append(data) with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml
file will contain:
- Age: 30 City: Cairo Name: Ahmed - Age: 25 City: Alexandria Name: Fatima - Age: 35 City: Giza Name: Hassan
This code uses openpyxl
to read the Excel file and converts it to YAML.
Using xlrd with PyYAML
Although xlrd
no longer supports .xlsx
files, you can still use it for .xls
files.
import xlrd import yaml wb = xlrd.open_workbook('data.xls') sheet = wb.sheet_by_index(0) data_list = [] headers = sheet.row_values(0) for row_idx in range(1, sheet.nrows): row_values = sheet.row_values(row_idx) data = dict(zip(headers, row_values)) data_list.append(data) with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The generated data.yaml
will be:
- Age: 30.0 City: Cairo Name: Ahmed - Age: 25.0 City: Alexandria Name: Fatima - Age: 35.0 City: Giza Name: Hassan
Handle Multiple Sheets in Excel
If your Excel file contains multiple sheets, you can process each sheet individually.
import pandas as pd import yaml # Read all sheets xlsx = pd.ExcelFile('data.xlsx') all_data = {} for sheet_name in xlsx.sheet_names: df = pd.read_excel(xlsx, sheet_name=sheet_name) data_list = df.to_dict(orient='records') all_data[sheet_name] = data_list with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(all_data, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml
file will have data organized by sheet names:
Sheet1: - Age: 30 City: Cairo Name: Ahmed - Age: 25 City: Alexandria Name: Fatima Sheet2: - Age: 40 City: Luxor Name: Ibrahim - Age: 28 City: Aswan Name: Mona
This code reads all sheets from the Excel file and writes them to YAML with sheet names as keys.
Represent Data By Grouping
You can process hierarchical relationships in your Excel data to represent nested data.
import pandas as pd import yaml df = pd.read_excel('data.xlsx') grouped = df.groupby('City') data_dict = {} for city, group in grouped: data_list = group[['Name', 'Age']].to_dict(orient='records') data_dict[city] = data_list with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_dict, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming data.xlsx
contains:
| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Cairo |
| Mona | 28 | Alexandria |
The resulting data.yaml
will be:
Alexandria: - Age: 25 Name: Fatima - Age: 28 Name: Mona Cairo: - Age: 30 Name: Ahmed - Age: 35 Name: Hassan
This code groups data by city and creates a nested YAML structure.
Filter Specific Columns Before Conversion
You can select specific columns from the Excel file before converting to YAML.
import pandas as pd import yaml # Read Excel file and select specific columns df = pd.read_excel('data.xlsx', usecols=['Name', 'City']) data_list = df.to_dict(orient='records') with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml
will contain:
- City: Cairo Name: Ahmed - City: Alexandria Name: Fatima - City: Giza Name: Hassan
This code reads only the ‘Name’ and ‘City’ columns and writes them to YAML.
Convert Excel with Date Formats
To handle date formats during the conversion, you can parse and format dates appropriately.
import pandas as pd import yaml df = pd.read_excel('data.xlsx', parse_dates=['JoinDate']) # Convert dates to string format df['JoinDate'] = df['JoinDate'].dt.strftime('%Y-%m-%d') # Convert DataFrame to list of dictionaries data_list = df.to_dict(orient='records') with open('data.yaml', 'w', encoding='utf-8') as yaml_file: yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming data.xlsx
contains:
| Name | JoinDate |
|——–|———–|
| Ahmed | 2020-05-15|
| Fatima | 2021-07-10|
| Hassan | 2019-09-20|
The generated data.yaml
will be:
- JoinDate: '2020-05-15' Name: Ahmed - JoinDate: '2021-07-10' Name: Fatima - JoinDate: '2019-09-20' Name: Hassan
This code reads the ‘JoinDate’ column as dates, formats them, and writes the data to YAML.
Without converting the dates, you will get YAML like this:
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1589500800000000000 - null - null - 10 Name: Ahmed - JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1625875200000000000 - null - null - 10 Name: Fatima - JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1568937600000000000 - null - null - 10 Name: Hassan
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.