Slow Database? The Culprit Might Be Your Data Types
When I first started working with MySQL, all I cared about was whether queries ran and whether indexes were correct. Choosing between INT and BIGINT, or VARCHAR(255) versus VARCHAR(50) — I just went with “bigger is safer” and moved on.
It wasn’t until my production MySQL 8.0 database hit around 50GB and started showing abnormally high I/O that I sat down to audit the schema — and discovered a string of problems. The user_id column was using BIGINT even though the data had never exceeded 2 million records. The status column stored the strings "active"/"inactive" instead of TINYINT. The email column was declared as VARCHAR(1000) “just to be safe.”
After refactoring the schema, queries became noticeably faster and the buffer pool utilization improved dramatically. This article documents the practical tips from that process — not textbook theory.
Why Do Data Types Affect I/O and RAM?
MySQL reads data in units of pages (InnoDB defaults to 16KB). How many rows fit in a page depends on the size of each row. The smaller the row, the more rows fit per page.
The practical consequences of “bloated” rows from oversized data types:
- Higher I/O: More disk reads are needed to scan the same amount of data because fewer rows fit per page
- Wasted RAM: The InnoDB buffer pool (memory cache) holds fewer rows for the same amount of RAM
- Larger indexes: B-tree index traversal slows down with larger key sizes — and indexes store data in the column’s declared type
A concrete example: for a table with 10 million rows, saving 8 bytes per row by choosing INT instead of BIGINT adds up to 80MB of data savings. Factor in the indexes that store that primary key, and the real number is much higher.
In Practice: Choosing the Right Data Type for Each Case
1. Integers — Don’t Default to BIGINT for Everything
This is the most common mistake I see. Each integer type has a different range and storage size:
-- Storage size and range:
TINYINT -- 1 byte | max 127 (signed) / 255 (unsigned)
SMALLINT -- 2 bytes | max 32,767 / 65,535
MEDIUMINT -- 3 bytes | max 8,388,607 / 16,777,215
INT -- 4 bytes | max 2,147,483,647 / 4,294,967,295
BIGINT -- 8 bytes | max ~9.2 × 10^18
Practical applications:
age,star_rating,quantityfor small values → useTINYINT UNSIGNED(0–255), no need for INTyear,postal_code→SMALLINT UNSIGNEDor theYEARtypeuser_id,product_id→ if data stays under 4 billion,INT UNSIGNEDis enough, saving 4 bytes/row over BIGINT- Only use
BIGINTwhen truly needed: financial transaction IDs, Snowflake IDs, distributed systems
-- Bad: BIGINT for user_id in a small system, wastes 4 bytes/row
CREATE TABLE orders (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT NOT NULL,
status TINYINT NOT NULL DEFAULT 1
);
-- Better: saves 8 bytes/row (id + user_id)
CREATE TABLE orders (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED NOT NULL,
status TINYINT UNSIGNED NOT NULL DEFAULT 1
);
2. Strings — Size VARCHAR(n) to Reality, Not “Just to Be Safe”
VARCHAR(n) stores only the actual length plus 1–2 bytes of overhead. However, the value of n affects temporary memory when MySQL needs to sort or group: it allocates n × charset_bytes per row in a temporary table, even if the actual data is much shorter.
-- Declare close to actual max length:
email VARCHAR(254), -- RFC 5321: max email length is 254 characters
username VARCHAR(50), -- usernames rarely exceed 50 characters
phone VARCHAR(20), -- longest international phone number is ~15 characters
slug VARCHAR(200), -- realistic URL slug length
country CHAR(2), -- ISO country code: VN, JP, US — use CHAR for fixed-length
-- Avoid oversized declarations:
-- email VARCHAR(1000) -- allocates 3000 bytes/row when sorting (UTF-8 × 3)
-- name VARCHAR(255) -- if the actual max is 100, don't declare 255
CHAR vs VARCHAR: Use CHAR(n) for fixed-length data like country codes, postal codes, or string-format UUIDs. Use VARCHAR for variable-length data.
TEXT types: Not stored inline in the row — InnoDB stores them on a separate page, with the row holding only a pointer. This means reading a TEXT column requires an additional I/O operation. Use only when you genuinely need to store long content:
-- Correct: TEXT for genuinely long content
article_content MEDIUMTEXT, -- blog article body
product_desc TEXT,
raw_log LONGTEXT
-- Wrong: TEXT for short data
note TEXT -- if notes are only a few dozen characters
-- Better:
note VARCHAR(500)
3. ENUM for Fixed Value Sets
Columns like status, role, and type with a handful of fixed values — this is where ENUM shines:
-- VARCHAR stores the actual string; each value takes exactly len(value) bytes
status VARCHAR(20) -- 'active' = 6 bytes, 'inactive' = 8 bytes
-- ENUM stores a 1-2 byte index; actual values live in table metadata
status ENUM('active', 'inactive', 'banned', 'pending') -- always 1 byte
role ENUM('admin', 'editor', 'viewer', 'guest')
One downside to keep in mind: adding or removing ENUM values requires an ALTER TABLE (expensive on large tables). If the value set changes frequently, using TINYINT UNSIGNED with constants in application code is more flexible.
4. Date and Time Types — TIMESTAMP Saves 4 Bytes/Row Compared to DATETIME
-- DATETIME : 8 bytes, range 1000–9999, not timezone-aware
-- TIMESTAMP: 4 bytes, range 1970–2038, auto-converts to server timezone
-- DATE : 3 bytes, no time component
-- In practice:
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
birth_date DATE, -- 3 bytes, no time needed
event_start DATETIME(3) -- when millisecond precision is needed, or range exceeds 2038
For systems running in a single timezone that don’t need to store timestamps beyond 2038, each TIMESTAMP column instead of DATETIME saves 4 bytes. A table with 10 million rows and 2 timestamp columns saves 80MB right away.
5. NULL — Declare NOT NULL Whenever Possible
Nullable columns cost an extra bit in the NULL bitmap. The impact isn’t as large as in MyISAM, but NULL complicates queries and can cause the optimizer to make poor decisions when estimating cardinality.
-- Default to NOT NULL + DEFAULT for columns that always have a value
view_count INT UNSIGNED NOT NULL DEFAULT 0,
is_active TINYINT(1) NOT NULL DEFAULT 1,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
-- Only allow NULL when you need to distinguish "not set" from "empty value"
deleted_at TIMESTAMP NULL DEFAULT NULL, -- soft delete pattern
verified_at TIMESTAMP NULL DEFAULT NULL -- not yet verified = NULL
6. Auditing Your Current Schema with INFORMATION_SCHEMA
Before optimizing, I always run this query to spot “bloated” columns:
-- List columns with data types worth reviewing
SELECT
TABLE_NAME,
COLUMN_NAME,
COLUMN_TYPE,
CHARACTER_MAXIMUM_LENGTH,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'your_database'
AND (
DATA_TYPE IN ('text', 'mediumtext', 'longtext')
OR (DATA_TYPE = 'varchar' AND CHARACTER_MAXIMUM_LENGTH > 500)
OR (DATA_TYPE = 'bigint')
OR (DATA_TYPE = 'datetime')
)
ORDER BY TABLE_NAME, ORDINAL_POSITION;
-- Check the actual size of each table
SELECT
TABLE_NAME,
TABLE_ROWS,
ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) AS total_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 20;
Conclusion: Principles I Apply From Day One
Schema design isn’t a “set and forget” thing. As data grows, poor early decisions turn into technical debt that’s expensive to pay back. Refactoring a 100-million-row table costs far more than getting it right from the start.
Here are the principles I now apply whenever I create a table:
- Use the smallest integer that fits: Estimate the real-world max value and choose the appropriate type. Add
UNSIGNEDif negative values aren’t needed. - Size VARCHAR(n) to reality: Don’t declare 255 or 1000 “just to be safe” — it affects memory during sorting.
- ENUM for stable fixed value sets: status, role, type — don’t use VARCHAR.
- TIMESTAMP over DATETIME when data won’t exceed 2038 — saves 4 bytes/row.
- NOT NULL + DEFAULT as the default: Only allow NULL when there’s a clear reason.
- Audit regularly with INFORMATION_SCHEMA: Catch issues early before data grows.
These optimizations might seem minor individually, but on a 50GB database like mine, they make a real difference — both in query speed and buffer pool RAM utilization.

