Convert CSV to SQL using Python Pandas

In this tutorial, you’ll learn how to convert CSV data to SQL tables using Pandas in Python.

You’ll start with a basic conversion and then move on to more advanced operations such as specifying data types, handling large datasets in chunks, adding a primary key, appending data to existing tables, including the DataFrame index as a column, and filtering data before export.


Basic CSV to SQL Conversion

In this example, Pandas reads the CSV data with read_csv() and writes it to an SQL table with the DataFrame’s .to_sql() method, using a SQLAlchemy engine as the database connection.

import pandas as pd
from sqlalchemy import create_engine, text

# Create a small CSV file to work with
csv_data = """id,name,age
1,UserA,30
2,UserB,25
3,UserC,35"""
with open("sample_data.csv", "w") as file:
    file.write(csv_data)

# Create a SQLAlchemy engine for a local SQLite database
engine = create_engine('sqlite:///my_database.db')

# Read the CSV into a DataFrame and write it to the 'users' table
df = pd.read_csv('sample_data.csv')
df.to_sql('users', engine, index=False, if_exists='replace')

# Read the rows back to confirm the write
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
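
If you prefer to continue working with the data in Pandas, you can also read the table back into a DataFrame with pd.read_sql(). Here is a minimal sketch, assuming the engine and the users table created above:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///my_database.db')

# Load the 'users' table back into a DataFrame to verify the round trip
round_trip = pd.read_sql('users', engine)
print(round_trip)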


Specifying Data Types

In this example, explicit data types are defined for the SQL table columns by passing a dtype mapping to .to_sql().

This ensures that each column in the SQL database has the intended type, which is particularly useful for maintaining data integrity.

import pandas as pd
from sqlalchemy import create_engine, text, Integer, String

# Recreate the sample CSV file
csv_data = """id,name,age
1,UserA,30
2,UserB,25
3,UserC,35"""
with open("sample_data.csv", "w") as file:
    file.write(csv_data)

engine = create_engine('sqlite:///my_database.db')
df = pd.read_csv('sample_data.csv')

# Map each column to an explicit SQLAlchemy type
data_types = {'id': Integer(), 'name': String(), 'age': Integer()}
df.to_sql('users', engine, index=False, if_exists='replace', dtype=data_types)

with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
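
To confirm which types were actually created in the database, you can inspect the table with SQLAlchemy’s inspect() helper. A minimal sketch, assuming the same engine and users table:

from sqlalchemy import create_engine, inspect

engine = create_engine('sqlite:///my_database.db')

# Print each column of the 'users' table together with its SQL type
inspector = inspect(engine)
for column in inspector.get_columns('users'):
    print(column['name'], column['type'])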


Handling Large Data with Chunksize

This example demonstrates how to handle large CSV files by reading and writing them in chunks, so the whole file never needs to fit in memory at once. It reuses the engine and sample_data.csv from the previous examples; a chunksize of 1 only makes sense for this tiny sample, and a real workload would use thousands of rows per chunk.

# Start from an empty table so the chunks are not appended to earlier rows
with engine.begin() as connection:
    connection.execute(text("DROP TABLE IF EXISTS users"))

# Read the CSV in chunks and append each chunk to the 'users' table
chunksize = 1  # use a much larger value (e.g. 10000) for real files
for chunk in pd.read_csv('sample_data.csv', chunksize=chunksize):
    chunk.to_sql('users', engine, index=False, if_exists='append')

with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
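
Note that .to_sql() also accepts its own chunksize parameter, which batches the INSERT statements even when the whole DataFrame is already in memory. A minimal sketch, assuming df and engine from the earlier examples:

# Write the DataFrame in batches of 1000 rows per INSERT
df.to_sql('users', engine, index=False, if_exists='replace', chunksize=1000)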


Adding Primary Key

This example illustrates how to add a primary key to the SQL table. Since .to_sql() does not create primary key constraints itself, the table is defined explicitly with SQLAlchemy first, and the CSV data is then appended to it.

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine('sqlite:///my_database.db')
metadata = MetaData()

# Define the table with an explicit primary key on 'id'
table = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String),
              Column('age', Integer))

# Drop any existing 'users' table, then create it from the definition above
metadata.drop_all(engine)
metadata.create_all(engine)

# Append the CSV data to the freshly created table
df = pd.read_csv('sample_data.csv')
df.to_sql('users', engine, index=False, if_exists='append', method='multi')

with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35)]
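
You can verify that the primary key was created by asking SQLAlchemy’s inspector for the table’s primary key constraint. A minimal sketch, assuming the same engine:

from sqlalchemy import create_engine, inspect

engine = create_engine('sqlite:///my_database.db')

# Show which columns make up the primary key of 'users'
inspector = inspect(engine)
print(inspector.get_pk_constraint('users'))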


Append CSV to Existing SQL

You can append CSV data to an existing SQL table rather than replacing it by passing if_exists='append' to .to_sql().

# First write to SQL to create the table
df.to_sql('users', engine, index=False, if_exists='replace')

# Data to append
more_csv_data = """id,name,age
4,UserD,28
5,UserE,32"""
with open("more_data.csv", "w") as file:
    file.write(more_csv_data)
more_df = pd.read_csv('more_data.csv')

# Append new data to existing table
more_df.to_sql('users', engine, index=False, if_exists='append')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(1, 'UserA', 30), (2, 'UserB', 25), (3, 'UserC', 35), (4, 'UserD', 28), (5, 'UserE', 32)]
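
When appending repeatedly, you may want to skip rows whose id is already stored. One possible approach, sketched below under the assumption that the users table, engine, and more_df from above are available, is to filter the new DataFrame against the existing ids before calling .to_sql():

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///my_database.db')

# Collect the ids already present in the table
existing_ids = pd.read_sql('SELECT id FROM users', engine)['id']

# Keep only rows whose id is not in the table yet, then append them
new_rows = more_df[~more_df['id'].isin(existing_ids)]
new_rows.to_sql('users', engine, index=False, if_exists='append')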


Adding Index as a Column

This example demonstrates how to include the DataFrame’s index as a column in the SQL table.

# index=True writes the DataFrame's index as an extra column (named 'index' by default)
df.to_sql('users', engine, index=True, if_exists='replace')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(0, 1, 'UserA', 30), (1, 2, 'UserB', 25), (2, 3, 'UserC', 35)]
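
If you want the index column to have a more descriptive name than the default 'index', .to_sql() accepts an index_label argument. A minimal sketch, assuming df and engine from above (row_id is just an illustrative name):

# Store the DataFrame index under the column name 'row_id'
df.to_sql('users', engine, index=True, index_label='row_id', if_exists='replace')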


Filtering Data Before Export

You can filter data before exporting it to an SQL table.

# Keep only users older than 30 before writing to the table
filtered_df = df[df['age'] > 30]
filtered_df.to_sql('users', engine, index=False, if_exists='replace')
with engine.connect() as connection:
    query = text("SELECT * FROM users")
    result = connection.execute(query).fetchall()
    print(result)

Output:

[(3, 'UserC', 35)]
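
The same idea works for columns: select only the columns you need before calling .to_sql(). A minimal sketch, assuming df and engine from above:

# Export only the 'id' and 'name' columns
subset_df = df[['id', 'name']]
subset_df.to_sql('users', engine, index=False, if_exists='replace')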