Migrating Data from MySQL to PostgreSQL: Comparing Methods and Practical Guide

Database tutorial - IT technology blog

Switching from MySQL to PostgreSQL usually starts with a very specific reason: you need advanced JSON support, better full-text search, window functions, or simply your team is more familiar with PostgreSQL. Whatever the reason, the hardest part isn’t installing PostgreSQL — it’s moving all your existing data over without data loss or extended downtime.

I’ve done this with a ~3GB database, around 40 tables, tangled foreign key relationships, and a few stored procedures. This article documents what actually worked — not a smooth theoretical tutorial.

Why Is MySQL → PostgreSQL Migration So Error-Prone?

On the surface the two systems look similar, but underneath there are numerous differences that can catch you off guard:

  • Data types don’t map 1-to-1: TINYINT(1) in MySQL typically acts as a boolean, but PostgreSQL has its own BOOLEAN type. MySQL’s DATETIME maps to PostgreSQL’s TIMESTAMP — but timezone handling works completely differently.
  • AUTO_INCREMENT vs SERIAL/SEQUENCE: MySQL uses AUTO_INCREMENT, PostgreSQL uses SERIAL or GENERATED ALWAYS AS IDENTITY. Forgetting to reset sequences after migration means new inserts immediately throw duplicate key errors.
  • Case sensitivity: MySQL's table-name case sensitivity depends on the filesystem (case-insensitive by default on Windows and macOS, case-sensitive on Linux), while PostgreSQL folds all unquoted identifiers to lowercase regardless of OS — apps that use inconsistent table name casing will break.
  • Backtick vs double quote: MySQL uses backticks to escape table/column names, PostgreSQL uses double quotes. Pasting a MySQL SQL dump directly into PostgreSQL will fail immediately.
  • Strict typing by default: PostgreSQL refuses to insert a string into an integer column — it won’t auto-cast like MySQL does. Data that MySQL silently coerced will surface as hard errors during migration.

Overlooking these differences means your migrated data may have type errors, lost NULL values, or sequences reset to 1 while your data already has IDs up to 50,000.
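To make the backtick and type-mapping issues concrete, here is a minimal sketch of a DDL converter. The regexes and the type map are simplified assumptions covering only the cases mentioned above — a real conversion needs a much larger map:

```python
import re

# Simplified MySQL -> PostgreSQL type map (assumption: common cases only)
TYPE_MAP = {
    r"\btinyint\(1\)": "boolean",
    r"\bdatetime\b": "timestamp",
    r"\bint\(\d+\)\s+auto_increment\b": "serial",
    r"\blongtext\b": "text",
}

def convert_ddl(mysql_ddl: str) -> str:
    """Rewrite a MySQL CREATE TABLE statement into rough PostgreSQL syntax."""
    sql = mysql_ddl.replace("`", '"')  # backticks -> double quotes
    for pattern, pg_type in TYPE_MAP.items():
        sql = re.sub(pattern, pg_type, sql, flags=re.IGNORECASE)
    # Strip MySQL-only table options like ENGINE=InnoDB
    sql = re.sub(r"\s*ENGINE=\w+", "", sql)
    return sql

print(convert_ddl(
    "CREATE TABLE `users` (`id` int(11) AUTO_INCREMENT, `active` tinyint(1)) ENGINE=InnoDB"
))
# -> CREATE TABLE "users" ("id" serial, "active" boolean)
```

This is roughly what pgloader's default casting rules do for you automatically, which is why it is the first option to consider.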

Comparing 3 Common Migration Approaches

Option 1: pgloader — Highest Level of Automation

pgloader is a dedicated database migration tool with native MySQL → PostgreSQL support. It automatically maps data types, handles encoding, and runs multiple parallel workers.

Pros:

  • Single command handles most of the conversion work automatically
  • Migrates schema + data simultaneously, no need to split into separate steps
  • Supports live migration — reads from a running MySQL instance without shutting it down

Cons:

  • Limited fine-grained control per table when you have special requirements
  • Stored procedures and triggers must be handled manually — pgloader doesn’t touch them
  • Installation on Ubuntu can be tricky; use Docker to be safe

Option 2: mysqldump + Manual SQL Conversion

Export schema and data from MySQL, run it through sed/awk or a Python script to convert the SQL, then import into PostgreSQL. A classic approach that’s still worth using in certain situations.

Pros:

  • Full control over every step — you know exactly what’s happening
  • Easier to debug: errors point to a specific line and file
  • Ideal for small databases (under 500MB) or when you need to change the schema significantly

Cons:

  • Time-consuming, easy to miss edge cases when writing the conversion script
  • Large databases produce heavy SQL dump files that are slow to process
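Because large dumps don't fit comfortably in memory, the conversion script should stream the file line by line rather than loading it whole. A minimal sketch, with the substitutions (skipping MySQL directive comments, requoting identifiers, fixing escaped quotes, dropping table options) as illustrative assumptions:

```python
import re

def convert_dump_lines(lines):
    """Stream-convert mysqldump output line by line so the whole
    file never has to fit in memory. Takes and yields an iterable."""
    for line in lines:
        # Skip MySQL-only directives like /*!40101 SET ... */;
        if line.lstrip().startswith("/*!"):
            continue
        line = line.replace("`", '"')           # identifier quoting
        line = line.replace("\\'", "''")        # MySQL \' -> standard SQL ''
        line = re.sub(r"\s*ENGINE=\w+", "", line)  # MySQL-only table options
        yield line

# Usage: stream from one file to another without loading either fully
# with open("dump.sql") as src, open("pg.sql", "w") as dst:
#     dst.writelines(convert_dump_lines(src))
```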

Option 3: Python ETL Script

Write a Python script that connects to both databases simultaneously, reads from MySQL, and writes to PostgreSQL in batches. The most flexible option, but also the most work.

Pros:

  • Transform data freely before writing — change types, normalize values, merge tables
  • Handle custom business logic that no tool can do for you
  • Supports incremental runs — sync new data without re-migrating everything from scratch

Cons:

  • Requires writing code, more effort than the other two approaches
  • Inserting row by row is extremely slow — batch inserts are essential, otherwise a table with 1 million rows can take over an hour

Which Approach Should You Choose?

Rules of thumb I follow:

  • Pure data database with a clean schema → pgloader
  • Small database or heavy schema changes needed → mysqldump + convert
  • Data transformation required or incremental migration → Python ETL

For my ~3GB migration: I used pgloader for 37 standard tables, then wrote a separate Python ETL for 3 tables with complex logic. It took about 2 days total — most of that time was verifying and handling edge cases, not running tools.

Migration Guide with pgloader

Step 1: Install pgloader

# Ubuntu/Debian
sudo apt-get install pgloader

# Or via Docker — recommended to avoid dependency issues
docker pull dimitri/pgloader

Step 2: Create the pgloader configuration file

cat > migrate.load << 'EOF'
LOAD DATABASE
  FROM mysql://mysql_user:mysql_pass@localhost:3306/mydb
  INTO postgresql://pg_user:pg_pass@localhost:5432/mydb

WITH include drop, create tables,
     create indexes, reset sequences,
     workers = 4, concurrency = 2

SET work_mem to '128MB',
    maintenance_work_mem to '512MB'

CAST type tinyint to boolean using tinyint-to-boolean,
     type datetime to timestamptz using zero-dates-to-null;
EOF

Step 3: Run the migration

pgloader migrate.load

# Or via Docker
docker run --rm --network host \
  -v $(pwd)/migrate.load:/migrate.load \
  dimitri/pgloader pgloader /migrate.load

pgloader prints detailed logs per table: rows migrated, time taken, and any failed rows. The nice thing is it doesn’t stop when a row fails — it writes errors to a separate file for later review. For my ~3GB database, pgloader took about 8 minutes with 4 workers.
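Those reject files are easy to forget about. A small sketch for finding the non-empty ones after a run — it assumes pgloader's default root directory of /tmp/pgloader and the .dat/.log reject-file convention; adjust the path if you ran pgloader with --root-dir:

```python
from pathlib import Path

def find_reject_files(root="/tmp/pgloader"):
    """List non-empty pgloader reject files. The .dat files hold the
    rejected rows; a .log next to each explains why they failed.
    /tmp/pgloader is assumed to be the default root-dir."""
    root_path = Path(root)
    if not root_path.exists():
        return []
    return [p for p in root_path.rglob("*.dat") if p.stat().st_size > 0]

for path in find_reject_files():
    print(f"{path}: {path.stat().st_size} bytes of rejected rows")
```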

Manual Migration Guide with Python (for Complex Tables)

For tables with special logic — for example, a status column storing integers 0/1/2 that PostgreSQL needs as strings — I write a separate ETL:

import mysql.connector
import psycopg2
from psycopg2.extras import execute_batch

# Connect to both databases
mysql_conn = mysql.connector.connect(
    host='localhost', database='mydb',
    user='root', password='mysql_pass'
)
pg_conn = psycopg2.connect(
    host='localhost', dbname='mydb',
    user='pg_user', password='pg_pass'
)

mysql_cur = mysql_conn.cursor(dictionary=True)
pg_cur = pg_conn.cursor()

# Read from MySQL in batches
BATCH_SIZE = 1000
mysql_cur.execute("SELECT * FROM orders")

while True:
    rows = mysql_cur.fetchmany(BATCH_SIZE)
    if not rows:
        break

    # Transform: convert status from int to string
    transformed = []
    for row in rows:
        transformed.append((
            row['id'],
            row['user_id'],
            {0: 'pending', 1: 'completed', 2: 'cancelled'}.get(row['status'], 'pending'),
            row['created_at'],
            row['total_amount']
        ))

    execute_batch(
        pg_cur,
        """INSERT INTO orders (id, user_id, status, created_at, total_amount)
           VALUES (%s, %s, %s, %s, %s)
           ON CONFLICT (id) DO NOTHING""",
        transformed
    )
    pg_conn.commit()
    print(f"Migrated {len(transformed)} rows")

mysql_cur.close()
pg_cur.close()
mysql_conn.close()
pg_conn.close()

Using psycopg2’s execute_batch is roughly 10–50x faster than inserting row by row depending on the dataset. For a 500k-row table, the difference between 40 minutes and 2 minutes comes down to this.

What to Verify After Migration

1. Check Row Counts

-- MySQL: approximate row counts (estimates for InnoDB tables)
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = 'mydb';

-- PostgreSQL's information_schema.tables has no table_rows column;
-- use the statistics view instead (also an estimate)
SELECT relname, n_live_tup FROM pg_stat_user_tables;

-- For exact numbers, run SELECT COUNT(*) per table on both sides
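Since table_rows is only an estimate for InnoDB, exact verification means running COUNT(*) per table on both sides and diffing the results. A sketch of the comparison step — the query loop is illustrative, reusing the connections from the ETL script:

```python
def diff_row_counts(mysql_counts, pg_counts):
    """Compare {table: row_count} dicts from both databases and return
    only the tables that differ (or exist on just one side)."""
    mismatches = {}
    for table in set(mysql_counts) | set(pg_counts):
        a, b = mysql_counts.get(table), pg_counts.get(table)
        if a != b:
            mismatches[table] = (a, b)
    return mismatches

# Usage sketch (cursors as in the ETL script above):
# counts = {}
# for table in table_names:
#     cur.execute(f"SELECT COUNT(*) FROM {table}")  # exact, unlike table_rows
#     counts[table] = cur.fetchone()[0]
```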

2. Reset Sequences After Import

The most common mistake I see: after migration, sequences are still at 1, and inserting a new row throws a duplicate key error. Reset them to the current max ID:

-- Reset a specific table
SELECT setval('orders_id_seq', (SELECT MAX(id) FROM orders));

-- Script to automatically reset all tables
-- (assumes every table has an integer id column backed by a sequence)
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT tablename FROM pg_tables WHERE schemaname = 'public' LOOP
        EXECUTE 'SELECT setval(pg_get_serial_sequence(''' || r.tablename || ''', ''id''), COALESCE(MAX(id), 1)) FROM ' || r.tablename;
    END LOOP;
END $$;

3. Check Foreign Key Constraints

-- Find foreign key constraints that were created NOT VALID
-- (i.e. the existing data was never checked against them)
SELECT conname, conrelid::regclass
FROM pg_constraint
WHERE contype = 'f'
  AND NOT convalidated;

-- Then validate each one explicitly:
-- ALTER TABLE <table> VALIDATE CONSTRAINT <conname>;

A Quick Tip for Preparing Your Data

Before migrating, I usually export some lookup tables to CSV for quick verification. When I need to convert CSV to JSON to compare with API responses, using the tool at toolcraft.app/en/tools/data/csv-to-json is faster than writing a one-off Python script — it runs in the browser so there’s no risk of exposing data.
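If you do end up scripting it, the one-off conversion is only a few lines of stdlib Python anyway:

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (header row required) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

print(csv_to_json("id,status\n1,pending\n2,completed\n"))
```

Note that every value comes out as a string — fine for eyeballing lookup tables, but cast numeric columns if you need exact comparisons against API responses.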

Pre-Cutover Checklist for Production

  • Row counts match between MySQL and PostgreSQL
  • Sequences have been reset to the correct values (most commonly forgotten)
  • All indexes have been created — pgloader usually handles this, but verify anyway
  • Foreign key constraints are valid
  • Stored procedures/triggers have been rewritten in PostgreSQL syntax
  • Connection strings in the application have been updated
  • Run a selection of critical business queries against the real data

Stored procedures are the most labor-intensive part with no shortcut. MySQL uses DELIMITER with its own syntax, PostgreSQL uses PL/pgSQL — they’re completely different and must be rewritten by hand.

Database not too large, no complex stored procedures? pgloader handles 90% of it in a few minutes. The rest is thorough verification — especially sequences and indexes, the two things most often overlooked and the ones whose bugs surface latest, once you’re already in production.
