Quick Start: Configure Histograms in 5 Minutes
Have you ever encountered a situation where a SQL query runs fast (0.5s) one moment and slow (10s) the next, despite having an index? The MySQL Optimizer might be “guessing” due to a lack of data density information. Try creating a Histogram to give it a comprehensive overview without the database overhead of an Index.
For example, with an orders table containing millions of rows, you want MySQL to understand the distribution of the order_status column:
-- Create a Histogram for the order_status column with 100 buckets
ANALYZE TABLE orders UPDATE HISTOGRAM ON order_status WITH 100 BUCKETS;
-- Check the results in the Data Dictionary
SELECT * FROM information_schema.column_statistics
WHERE table_name = 'orders' AND column_name = 'order_status';
With just one command, you’ve created a data “density map.” This allows the Optimizer to know exactly when to use an Index and when to perform a Full Table Scan for optimal speed.
Why are Indexes Sometimes “Powerless”?
In an E-commerce project I worked on, the products table had 15 million records. The category_id column was indexed. However, when querying popular categories (accounting for 40% of the data), the system suddenly hung because MySQL insisted on using the Index instead of a table scan, causing millions of random I/O lookups.
The reason is simple: An Index only tells MySQL where the data is, not how it is distributed.
MySQL Histograms (since version 8.0) solve this problem completely. They store frequency statistics of values as lightweight snapshots. Unlike Indexes, which must be updated constantly during INSERT/UPDATE operations, Histograms only consume resources when you manually run the ANALYZE command.
Quick Comparison: Index vs. Histogram
- Index: Automatically updated, good for precise row lookups but slows down write speed (INSERT/UPDATE).
- Histogram: Manually updated, doesn’t support specific row searches but helps the Optimizer estimate the number of rows (Selectivity) with high accuracy. It is the perfect choice for columns with skewed data.
Two Types of Histograms You Need to Know
MySQL automatically chooses the appropriate Histogram type based on your data column’s characteristics:
1. Singleton Histogram
This type is for columns with low cardinality (few distinct values). For example, gender with 3 values or order_status with 5 states. Each “bucket” represents a specific value along with its exact percentage.
2. Equi-Height Histogram
If a column has a wide range of values like price or created_at, MySQL uses Equi-Height. Data is divided into groups so that the number of rows in each group is approximately equal. This helps efficiently handle queries like WHERE price BETWEEN 100 AND 500.
Managing Histograms Like a Pro
Don’t create Histograms for every column indiscriminately. Focus on columns frequently used in filter conditions that have high data skew.
Fine-tuning Precision
The default number of buckets is 100. If your data is extremely complex (such as geographic coordinates or system error codes), increase this number up to a maximum of 1024 for better accuracy:
ANALYZE TABLE users UPDATE HISTOGRAM ON age WITH 256 BUCKETS;
Cleaning Up When No Longer Needed
If the data structure changes completely, an outdated Histogram can lead the Optimizer to make poor decisions. In such cases, remove it:
ANALYZE TABLE users DROP HISTOGRAM ON age;
Real-world Experience from Large Projects
After years of optimizing large database systems, I’ve identified 3 golden rules for using Histograms:
- Prioritize Skewed Data: If 80% of customers are concentrated in HCM City and the remaining 20% are spread across 62 provinces, Histograms help MySQL avoid applying the same query strategy for every province.
- Use for Low-Volatility Columns: Since they don’t auto-update, Histograms are ideal for product categories or historical data. Avoid using them for columns that change thousands of times per second.
- Replace Indexes for Reporting: For end-of-day BI/Reporting queries, instead of creating 10 indexes that bloat the
.ibdfile, I use Histograms. Query efficiency is comparable, but the speed of daily order INSERTs remains unaffected.
I once handled a log table with 100 million rows. Indexing the error_code column increased the index file size by 4GB. After removing the index and replacing it with a Histogram with 512 buckets, error statistics queries ran 5 times faster, while the data file became significantly lighter.
Technical Notes
Histograms currently only support basic data types like Number, String, and Date. They do not work with JSON or Spatial types. Don’t forget to set up a weekly Cron job to run UPDATE HISTOGRAM, ensuring that statistics stay aligned with the actual data in the table.
Understanding and applying Histograms effectively will help you control the MySQL Optimizer more efficiently, rather than just relying on indiscriminate indexing.

