How to Import Large Datasets into MySQL Fast with LOAD DATA INFILE: Optimizing Throughput for Millions of Records

MySQL tutorial - IT technology blog

The Problem: Why Does Inserting Millions of Rows Take Forever?

Have you ever sat staring at a terminal running an INSERT command for hours with no end in sight? I once got a task to migrate around 50 million log rows from a legacy system into MySQL for reporting. My first attempt was a simple Python script that read a CSV file and executed INSERT INTO... one row at a time. Two hours later, it had barely crawled through about 3% of the data. My manager was about to pull the numbers for a meeting, and I was sweating bullets watching this snail-pace progress.

Why is it so slow? Every individual INSERT statement carries a pile of hidden overhead. MySQL has to parse the statement, check permissions, acquire locks, write to the transaction log (redo log), and most importantly, update all indexes after every single row. With a million rows, MySQL repeats all of that a million times, so the per-statement overhead quickly dwarfs the actual work of storing the data. On top of that, network latency between the application and the database server adds up fast.

To solve this problem, MySQL provides a heavy-duty weapon: LOAD DATA INFILE. Instead of taking millions of tiny steps, it pushes data in large chunks directly into the storage engine, bypassing most of the unnecessary intermediate steps.

What Is LOAD DATA INFILE and Why Is It So Fast?

LOAD DATA INFILE is a SQL statement that reads data from a text file (typically CSV or TXT) and loads it directly into a MySQL table. According to the official documentation and my own hands-on experience, this method is anywhere from 10 to 20 times faster than regular INSERT statements.

Three main reasons it’s so much faster:

  • Reads data in large blocks, dramatically reducing the number of system calls.
  • Skips SQL parsing overhead for each individual row.
  • Can defer and batch index rebuilding until after all data is loaded, rather than updating indexes incrementally on every row.
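Before reaching for LOAD DATA INFILE at all, a quick intermediate step is batching many rows into a single multi-row INSERT, which already eliminates the per-row parsing and network round trips. A sketch (the table and values are illustrative):

```sql
-- One statement, one parse, one round trip for many rows.
-- Wrapping several such batches in a single transaction also
-- avoids a redo log flush per row.
START TRANSACTION;
INSERT INTO customers (id, name, email, created_at) VALUES
  (1, 'Nguyen Van A', 'a@example.com', '2023-10-01 10:00:00'),
  (2, 'Tran Thi B',   'b@example.com', '2023-10-01 10:05:00'),
  (3, 'Le Van C',     'c@example.com', '2023-10-01 10:10:00');
COMMIT;
```

This helps, but LOAD DATA INFILE still wins by a wide margin at the scale discussed here.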

Configuration Before You Start (Getting Past the Security Barriers)

Don’t rush to run the command just yet. MySQL restricts this feature by default so that an attacker who gains SQL execution (through SQL injection, for example) can’t use it to read arbitrary files off the server. You need to handle the following configuration steps first. If you haven’t locked down your MySQL instance yet, it’s worth reviewing essential MySQL server security steps before proceeding.

1. Check the secure_file_priv Variable

This variable controls which directory MySQL is allowed to read files from. Check it with:

SHOW VARIABLES LIKE 'secure_file_priv';
  • If the result is a path (e.g., /var/lib/mysql-files/), you must copy your CSV file into that exact directory.
  • If the result is NULL, the feature is completely disabled.
  • If the result is empty, you can read files from anywhere.

To change this, edit your my.cnf or my.ini file (for a full walkthrough of MySQL 8 configuration options, see the Install and Configure MySQL 8 on Ubuntu guide):

[mysqld]
secure_file_priv = ""

Then restart the MySQL server (on a systemd-based Linux, sudo systemctl restart mysql).

2. Enable LOAD DATA LOCAL

If the file lives on your local machine rather than on the MySQL server itself, you need the LOCAL keyword. Both the server and the client must allow it. On the server side:

SET GLOBAL local_infile = 1;
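The server-side switch alone isn’t enough: the client must also opt in (for the stock mysql client, pass --local-infile=1 or set local-infile=1 in its config section). The statement itself swaps INFILE for LOCAL INFILE; a sketch, with the path being an example:

```sql
-- With LOCAL, the client reads the file and streams it to the server,
-- so secure_file_priv no longer applies to this path. Error handling
-- also changes: malformed rows become warnings instead of aborting.
LOAD DATA LOCAL INFILE '/home/me/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```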

Hands-On: Syntax and Real-World Examples

Say I have a file customers.csv with the structure: id, name, email, created_at. The file contents look like this:

1,"Nguyen Van A","nguyenvana@example.com","2023-10-01 10:00:00"
2,"Tran Thi B","tranthib@example.com","2023-10-01 10:05:00"

The SQL statement to import it looks like this:

LOAD DATA INFILE '/var/lib/mysql-files/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, email, created_at);

Breaking down the key parameters:

  • FIELDS TERMINATED BY: The delimiter between columns (usually a comma or tab \t).
  • ENCLOSED BY: The character wrapping string values — use double quotes to handle fields that contain commas inside them.
  • LINES TERMINATED BY: The line ending character. Linux uses \n, Windows uses \r\n — this one trips people up all the time.
  • IGNORE 1 LINES: Skip the first line if the file has a header row.
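For a file exported on Windows, only the line terminator changes; everything else stays the same (the path is an example):

```sql
-- Windows tools (Excel, PowerShell exports) end lines with CRLF.
LOAD DATA INFILE '/var/lib/mysql-files/customers_windows.csv'
INTO TABLE customers
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(id, name, email, created_at);
```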

Pro Tips for Maximum Speed (Battle-Tested Techniques)

Running the command above will already give you decent performance. But for tens of millions of records, I typically apply these additional techniques to push the speed as high as possible.

1. Temporarily Disable Constraint Checks

Every time a row is inserted, MySQL checks whether the value violates a Foreign Key constraint or a Unique Index. When you’re importing pre-validated data, these checks are pure waste.

SET foreign_key_checks = 0;
SET unique_checks = 0;
SET autocommit = 0;

Once the import is done, remember to re-enable them and run COMMIT.
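Putting the sequence together, one possible shape of the session, reusing the customers table and path from earlier:

```sql
SET foreign_key_checks = 0;
SET unique_checks = 0;
SET autocommit = 0;

LOAD DATA INFILE '/var/lib/mysql-files/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, email, created_at);

COMMIT;

-- Re-enable the checks; forgetting this leaves the session
-- accepting orphaned and duplicate rows.
SET unique_checks = 1;
SET foreign_key_checks = 1;
```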

2. Disable Indexes (For MyISAM Tables or Empty Tables Only)

For MyISAM tables, you can use ALTER TABLE table_name DISABLE KEYS;. For InnoDB, the best approach is to sort your CSV file by Primary Key order before importing. This lets MySQL write data sequentially to disk, avoiding constant random I/O (disk thrashing).
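The pre-sort itself can be done with the standard sort utility before you ever touch MySQL. A minimal sketch using throwaway paths, assuming a numeric key in the first column and no commas embedded inside quoted fields:

```shell
# Sample unsorted file (id,name); a real export would be far larger.
printf '3,c\n1,a\n2,b\n' > /tmp/customers_unsorted.csv

# -t, : comma as field separator; -k1,1n : sort numerically on column 1.
sort -t, -k1,1n /tmp/customers_unsorted.csv > /tmp/customers_sorted.csv

cat /tmp/customers_sorted.csv
```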

3. Increase Buffer Sizes

Before running LOAD DATA, bump up the memory buffers to give MySQL more room to work with large datasets:

SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024; -- 4GB; adjust based on your available RAM
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- innodb_log_file_size is not dynamic: set it (e.g., to 1G) in my.cnf
-- and restart the server before the import.

Note: innodb_flush_log_at_trx_commit = 2 writes the log to the OS cache at each commit and flushes it to disk only about once per second, which gives a noticeable speed boost. The trade-off is that if the server loses power mid-import, you might lose roughly the last second of committed transactions, a completely acceptable risk for a one-time batch import.

Handling Common Errors

The three errors I hit most often when running this command — along with quick fixes:

  • Wrong date format error: MySQL is strict about expecting YYYY-MM-DD. If your CSV uses DD/MM/YYYY, use a temporary variable to convert it:
    LOAD DATA INFILE 'data.csv'
    INTO TABLE orders
    FIELDS TERMINATED BY ','
    (id, @temp_date, amount)
    SET order_date = STR_TO_DATE(@temp_date, '%d/%m/%Y');
  • Data too long error: Usually caused by mismatched line endings — files created on Windows brought over to Linux almost always trigger this. Double-check \r\n versus \n.
  • Lost connection error: The file is too large and the connection times out. Split it into chunks of around 1–2GB each, or increase wait_timeout and max_allowed_packet in your MySQL configuration.
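Chunking a huge file can be scripted with the standard split utility. A sketch with toy sizes and example paths (for real data, pick a line count that yields chunks of 1–2GB):

```shell
# Sample file with a header plus four data rows.
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n' > /tmp/big.csv

# Clear any leftovers, strip the header, then split the body into
# 2-line chunks named /tmp/chunk_aa, /tmp/chunk_ab, ...
rm -f /tmp/chunk_*
tail -n +2 /tmp/big.csv | split -l 2 - /tmp/chunk_

wc -l /tmp/chunk_*
```

Because the header was stripped, each chunk then loads without IGNORE 1 LINES.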

Conclusion

Back to the story from the beginning: after applying LOAD DATA INFILE combined with disabling foreign key checks and sorting the file by Primary Key, those 50 million rows finished loading in under 45 minutes. That saved over 20 hours compared to the original INSERT script, and my manager had the numbers ready in time for the meeting.

Before throwing 10 million rows at it, always test with 1,000 rows first — verify your TERMINATED BY characters and column mapping are correct before running the full load. And backing up beforehand goes without saying: 10 minutes to back up beats a full day of recovery every time. The last thing you want is to finish loading 10 million rows only to discover the columns are misaligned — cleaning that up is a nightmare.
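Carving out that test sample is a one-liner with head; a sketch with example paths (1001 lines = header plus 1,000 data rows):

```shell
# Build a small stand-in for the real export: a header and 5,000 rows.
printf 'id,name\n' > /tmp/export.csv
seq 1 5000 | sed 's/$/,x/' >> /tmp/export.csv

# Take the header plus the first 1,000 data rows.
head -n 1001 /tmp/export.csv > /tmp/sample.csv

wc -l < /tmp/sample.csv
```

Load /tmp/sample.csv into a scratch table, eyeball the column mapping, then run the real thing.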
