Convert CSV to SQL using Python Pandas

In this tutorial, you’ll learn how to convert CSV data to SQL using Pandas in Python.

From basic conversions to more complex operations such as handling large datasets, specifying data types, appending data to existing tables, and customizing column names.

Table of Contents hide

1 Basic CSV to SQL Conversion
2 Specifying Data Types
3 Handling Large Data with Chunksize
4 Adding Primary Key
5 Append CSV to Existing SQL
6 Adding Index as a Column
7 Filtering Data Before Export

Basic CSV to SQL Conversion

The Pandas library is used to read the CSV data, and then it is written to an SQL table using the .to_sql() method.

import pandas as pd
from sqlalchemy import create_engine, text
csv_data = """id,name,age
1,UserA,30
2,UserB,25
3,UserC,35"""
with open("sample_data.csv", "w") as file:
    file.write(csv_data)
engine = create_engine('sqlite:///my_database.db')
df = pd.read_csv('sample_data.csv')
df.to_sql('users', engine, index=False, if_exists='replace')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]

Specifying Data Types

In this example, specific data types are defined for the SQL table columns.

This ensures that each column in the SQL database has the appropriate data type, which is particularly useful for maintaining data integrity.

import pandas as pd
from sqlalchemy import create_engine, text
csv_data = """id,name,age
1,UserA,30
2,UserB,25
3,UserC,35"""
with open("sample_data.csv", "w") as file:
    file.write(csv_data)
engine = create_engine('sqlite:///my_database.db')
df = pd.read_csv('sample_data.csv')
from sqlalchemy import create_engine, Integer, String
data_types = {'id': Integer(), 'name': String(), 'age': Integer()}
df.to_sql('users', engine, index=False, if_exists='replace', dtype=data_types)
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]

Handling Large Data with Chunksize

This example demonstrates how to handle large CSV files by reading and writing them in chunks.

chunksize = 1
for chunk in pd.read_csv('sample_data.csv', chunksize=chunksize):
    chunk.to_sql('users', engine, index=False, if_exists='append')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]

Adding Primary Key

This example illustrates how to add a primary key to the SQL table.

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
metadata = MetaData()
engine = create_engine('sqlite:///my_database.db')
table = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String),
              Column('age', Integer))
metadata.create_all(engine)
df = pd.read_csv('sample_data.csv')
df.to_sql('users', engine, index=False, if_exists='append', method='multi')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]

Append CSV to Existing SQL

You can append CSV data to an existing SQL table rather than replacing it.

# First write to SQL to create the table
df.to_sql('users', engine, index=False, if_exists='replace')

# Data to append
more_csv_data = """id,name,age
4,UserD,28
5,UserE,32"""
with open("more_data.csv", "w") as file:
    file.write(more_csv_data)
more_df = pd.read_csv('more_data.csv')

# Append new data to existing table
more_df.to_sql('users', engine, index=False, if_exists='append')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35), (4, 'UserD', 28), (5, 'UserE', 32)]

Adding Index as a Column

This example demonstrates how to include the DataFrame’s index as a column in the SQL table.

df.to_sql('users', engine, index=True, if_exists='replace')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(0, 1, 'UserA', 30), (1, 2, 'UserB', 25), (2, 3, 'UserC', 35)]

Filtering Data Before Export

You can filter data before exporting it to an SQL table.

filtered_df = df[df['age'] > 30]
filtered_df.to_sql('users', engine, index=False, if_exists='replace')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(3, 'UserC', 35)]

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.

Basic CSV to SQL Conversion

Specifying Data Types

Handling Large Data with Chunksize

Adding Primary Key

Append CSV to Existing SQL

Adding Index as a Column

Filtering Data Before Export

Related posts

Leave a Reply Cancel reply