Convert CSV to SQL using Python Pandas
In this tutorial, you’ll learn how to convert CSV data to SQL using Pandas in Python.
From basic conversions to more complex operations such as handling large datasets, specifying data types, appending data to existing tables, and customizing column names.
Basic CSV to SQL Conversion
The Pandas library is used to read the CSV data, and then it is written to an SQL table using the .to_sql()
method.
import pandas as pd from sqlalchemy import create_engine, text csv_data = """id,name,age 1,UserA,30 2,UserB,25 3,UserC,35""" with open("sample_data.csv", "w") as file: file.write(csv_data) engine = create_engine('sqlite:///my_database.db') df = pd.read_csv('sample_data.csv') df.to_sql('users', engine, index=False, if_exists='replace') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
Specifying Data Types
In this example, specific data types are defined for the SQL table columns.
This ensures that each column in the SQL database has the appropriate data type, which is particularly useful for maintaining data integrity.
import pandas as pd from sqlalchemy import create_engine, text csv_data = """id,name,age 1,UserA,30 2,UserB,25 3,UserC,35""" with open("sample_data.csv", "w") as file: file.write(csv_data) engine = create_engine('sqlite:///my_database.db') df = pd.read_csv('sample_data.csv') from sqlalchemy import create_engine, Integer, String data_types = {'id': Integer(), 'name': String(), 'age': Integer()} df.to_sql('users', engine, index=False, if_exists='replace', dtype=data_types) with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
Handling Large Data with Chunksize
This example demonstrates how to handle large CSV files by reading and writing them in chunks.
chunksize = 1 for chunk in pd.read_csv('sample_data.csv', chunksize=chunksize): chunk.to_sql('users', engine, index=False, if_exists='append') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
Adding Primary Key
This example illustrates how to add a primary key to the SQL table.
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String metadata = MetaData() engine = create_engine('sqlite:///my_database.db') table = Table('users', metadata, Column('id', Integer, primary_key=True), Column('name', String), Column('age', Integer)) metadata.create_all(engine) df = pd.read_csv('sample_data.csv') df.to_sql('users', engine, index=False, if_exists='append', method='multi') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
Append CSV to Existing SQL
You can append CSV data to an existing SQL table rather than replacing it.
# First write to SQL to create the table df.to_sql('users', engine, index=False, if_exists='replace') # Data to append more_csv_data = """id,name,age 4,UserD,28 5,UserE,32""" with open("more_data.csv", "w") as file: file.write(more_csv_data) more_df = pd.read_csv('more_data.csv') # Append new data to existing table more_df.to_sql('users', engine, index=False, if_exists='append') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35), (4, 'UserD', 28), (5, 'UserE', 32)]
Adding Index as a Column
This example demonstrates how to include the DataFrame’s index as a column in the SQL table.
df.to_sql('users', engine, index=True, if_exists='replace') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(0, 1, 'UserA', 30), (1, 2, 'UserB', 25), (2, 3, 'UserC', 35)]
Filtering Data Before Export
You can filter data before exporting it to an SQL table.
filtered_df = df[df['age'] > 30] filtered_df.to_sql('users', engine, index=False, if_exists='replace') with engine.connect() as connection: query = text("SELECT * FROM users") result = connection.execute(query).fetchall() print(result)
Output:
[(3, 'UserC', 35)]
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.