ALTER TABLE on Large MySQL Tables Without Downtime: A Guide to pt-online-schema-change


The Nightmare of ALTER TABLE on Production

Managing MySQL systems with tables reaching hundreds of millions of records has never been easy. Even a simple request like “adding a new column” or “changing a data type” is enough to make DBAs break a sweat.

I remember when I first started, I once had to handle a 50GB logs table on MySQL 8.0. It was peak hour with about 500 concurrent users. I confidently typed the command: ALTER TABLE logs ADD COLUMN metadata JSON;

Just 2 seconds later, the monitor turned bright red. All queries were stuck in “Waiting for table metadata lock”, the website was spinning, and my boss was standing right behind me. It took 30 breathless minutes for the command to finish. That was a costly lesson in naivety when working with large databases.

Why Does MySQL Cause Downtime When Changing Table Structures?

The issue is that MySQL’s Online DDL (available since version 5.6) still has significant limitations. Despite being called “online”, it can still bring the whole system to a standstill.

Technically, when a change cannot be performed in place, MySQL falls back to the old COPY algorithm: it creates a temporary table with the new structure, copies every row into it, and swaps the two tables. For the entire duration of the copy, the original table is locked against writes; every write from the application is blocked and queued. With a 50GB table like mine, a copy taking tens of minutes meant the application was effectively “frozen” for that whole time.
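If you ever hit this situation yourself, you can see exactly who is holding or waiting on the metadata lock. Here is a small diagnostic query, assuming performance_schema is available with the metadata lock instrument enabled (on MySQL 8.0 it is on by default):

-- Shows which sessions hold or wait on metadata locks for the table.
-- lock_status is GRANTED for the holder and PENDING for the queue.
SELECT object_schema, object_name, lock_type, lock_status, owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_name = 'logs';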

Common Solutions and Their Risks

Before finding “the one,” our team tried many manual methods:

1. Using MySQL Online DDL (ALGORITHM=INPLACE)

It sounds great in theory, but in reality INPLACE still needs a brief exclusive metadata lock at the beginning and end of the operation. If the server has long-running queries holding the table, the pending lock blocks everything queued behind it, causing severe congestion. On top of that, not every change supports INPLACE; unsupported ones silently fall back to a full table copy.
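One thing that helps here is stating the algorithm and lock level explicitly, so MySQL fails immediately instead of silently falling back to a copy. A minimal example (orders and status are simply the names used later in this article):

-- If this change cannot be done in place without blocking writes,
-- MySQL returns an error right away instead of locking the table.
ALTER TABLE orders
  ADD COLUMN status TINYINT DEFAULT 1,
  ALGORITHM=INPLACE, LOCK=NONE;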

2. Maintenance at 2 AM

This is the “human-powered” approach. The entire team has to stay up late and disconnect the application to run the command. This is exhausting and impractical for global systems that need to run 24/7.

3. Manual Shadow Table

You create a new table, copy the data over in blocks of 10,000 rows, and then use RENAME TABLE. This is safer, but any new data that arrives while the copy is running is very hard to synchronize.
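To make the problem concrete, here is a minimal sketch of that manual approach (the columns id, customer_id, and total are illustrative). Any row written between the last copied chunk and the rename is simply lost, which is exactly the synchronization gap:

-- 1. Shadow table with the new structure
CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD COLUMN status TINYINT DEFAULT 1;

-- 2. Copy in chunks (repeat for each 10,000-row id range)
INSERT INTO orders_new (id, customer_id, total)
SELECT id, customer_id, total FROM orders
WHERE id BETWEEN 1 AND 10000;

-- 3. Atomic swap; writes that arrived during the copy are missing
RENAME TABLE orders TO orders_old, orders_new TO orders;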

pt-online-schema-change: A Powerful Tool for DBAs

After those “painful” experiences, I found pt-online-schema-change in the Percona Toolkit. This is considered the top choice for handling large tables without worrying about locks.

Smart Workflow

This tool does not interfere directly with the old table but uses a smarter mechanism:

  1. Create an empty “shadow” table with the new structure.
  2. Set up Triggers (INSERT, UPDATE, DELETE) to automatically sync all changes from the original table to the shadow table the moment they happen (a simplified sketch follows this list).
  3. Divide old data into small chunks (e.g., 1,000 rows at a time) to copy gradually. This avoids I/O bottlenecks and prevents CPU spikes.
  4. Once the copy is finished, it uses RENAME TABLE to swap the two tables in a few milliseconds.
  5. Finally, it cleans up the old table and the triggers.
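As a rough illustration of steps 2 and 4, the triggers look something like this. This is heavily simplified; the triggers the tool actually generates are more elaborate, and the column names are the same illustrative ones as in the earlier sketch:

-- Step 2 (simplified): mirror every write into the shadow table
CREATE TRIGGER pt_osc_orders_ins AFTER INSERT ON orders FOR EACH ROW
  REPLACE INTO _orders_new (id, customer_id, total)
  VALUES (NEW.id, NEW.customer_id, NEW.total);

CREATE TRIGGER pt_osc_orders_del AFTER DELETE ON orders FOR EACH ROW
  DELETE IGNORE FROM _orders_new WHERE id = OLD.id;

-- Step 4: the swap is one atomic statement, taking milliseconds
RENAME TABLE orders TO _orders_old, _orders_new TO orders;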

Installation and Real-world Usage

On Ubuntu, installation is very simple:

sudo apt-get update
sudo apt-get install percona-toolkit

Suppose you need to add a status column to the orders table. Don’t run the command directly; use the following set of commands:

# Step 1: Dry Run to check for errors
pt-online-schema-change --dry-run \
  --alter "ADD COLUMN status tinyint(1) DEFAULT 1" \
  h=localhost,D=my_database,t=orders,u=admin,p=password

# Step 2: Execute (Real run)
pt-online-schema-change --execute \
  --max-load Threads_running=50 \
  --alter "ADD COLUMN status tinyint(1) DEFAULT 1" \
  h=localhost,D=my_database,t=orders,u=admin,p=password

Important parameters to note:

  • --dry-run: Helps you check syntax and constraints without affecting data.
  • --max-load: Automatically pauses the copy if Threads_running exceeds 50, protecting the server from overload (see the check after this list for choosing a baseline).
  • --alter: The structural changes you want to perform.
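To pick a sensible --max-load threshold, it helps to know what your server looks like on a normal day. You can check the live value like this:

-- Run this a few times during peak hours; set --max-load comfortably
-- above the typical value you observe.
SHOW GLOBAL STATUS LIKE 'Threads_running';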

Battle-tested Experience to Remember

Based on real-world experience handling multi-GB tables, you should keep 3 key points in mind:

1. Beware of Triggers: Since the tool relies on triggers for synchronization, check Percona’s documentation carefully if your table already has triggers. Older versions of the tool refuse to run on such tables at all, and newer ones only handle them via --preserve-triggers, so test this case thoroughly to avoid losing data.

2. Storage Capacity: The shadow table will take up as much space as the original. If the table is 50GB, make sure the server has at least 60-70GB free so the copy doesn’t fill up the disk (a quick check for this and the previous point appears after this list).

3. Handling Foreign Keys: This is the most complex part. I recommend using the --alter-foreign-keys-method=auto parameter and always testing thoroughly in a Staging environment before touching Production.
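A quick pre-flight check for points 1 and 2, using the my_database/orders names from the earlier example:

-- Point 1: does the table already have triggers?
SHOW TRIGGERS FROM my_database LIKE 'orders';

-- Point 2: roughly how much free disk the shadow table will need
SELECT ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'my_database' AND table_name = 'orders';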

Thanks to pt-online-schema-change, my database maintenance is now much smoother. I can add indexes or change data types in the middle of the day without users noticing any delay.
