When MySQL “Runs Out of Breath” and the Nightmare Called Sharding
If you’ve ever managed MySQL systems with rapid growth, you’ve likely felt the “pain.” When tables reach hundreds of millions or billions of records, queries slow down and indexing becomes a nightmare. The traditional solution is Sharding (partitioning the database). However, manual Sharding is an obsession. It complicates application-level code and makes Joins or Transactions across multiple clusters extremely complex.
Pressure mounts when the boss asks for real-time reports on that very production data. You usually have two choices: run “risky” queries that bottleneck the system, or build a bulky ETL pipeline to push data to Data Warehouses like ClickHouse or BigQuery. Both are expensive to operate.
The core issue lies in the monolithic architecture of traditional RDBMS, which doesn’t scale horizontally well. This is where TiDB comes in to solve the problem thoroughly.
What is TiDB? Why Does It “Hit the Spot”?
TiDB is an open-source, distributed database with powerful HTAP (Hybrid Transactional/Analytical Processing) capabilities. Simply put: it handles transactions (OLTP) as well as MySQL, while performing lightning-fast big data analysis (OLAP) on the same system. Importantly, these two tasks operate separately without resource contention.
Working with TiDB feels very familiar. You still connect with MySQL Client and write standard SQL, but underneath is a system that can scale to dozens of nodes. If MySQL is an overloaded truck, TiDB is like a container train where you can add cars at any time.
Decoupled Architecture: The Secret to Flexibility
TiDB doesn’t lump everything together; it divides into specialized layers:
- TiDB Server (Computing Layer): This is a stateless layer that processes SQL. It receives requests, optimizes commands, and returns results. Being stateless, you just put them behind a Load Balancer; to increase computing power, just add more nodes.
- TiKV (Row-based Storage): Dedicated to OLTP. Data is split into Regions and automatically replicated using the Raft algorithm. If a node dies, the system recovers automatically without data loss.
- TiFlash (Column-based Storage): This is the key to HTAP. TiFlash stores data in columns, synchronized in real-time from TiKV. When you run analytical queries like
SUMorAVGon millions of rows, TiDB offloads the work to TiFlash for near-instant processing. - PD (Placement Driver): The “brain” coordinating the entire cluster, managing metadata, and routing transactions.
Quick TiDB Installation Guide with TiUP
To experience TiDB, tiup is the official package management tool you should use. It simplifies deployment from test to production environments.
1. Install TiUP
Run the following command on your Linux terminal (Ubuntu/CentOS):
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
Then, reload your environment variables to start using it:
source .bashrc
2. Launch Local TiDB Cluster (Playground)
Without needing complex servers, you can quickly set up a full cluster (TiDB, TiKV, TiFlash, PD) right on your personal machine for experimentation:
tiup playground
TiDB Server usually listens on port 4000. The console screen will display full connection information.
3. Connect and Operate
Use your familiar MySQL Client to access it:
mysql -h 127.0.0.1 -P 4000 -u root
Try creating sample data to check compatibility:
CREATE DATABASE demo_tidb;
USE demo_tidb;
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50), age INT);
INSERT INTO users VALUES (1, 'Hoang', 28), (2, 'Minh', 32);
Activating Analytical Power with TiFlash
The real power of TiDB lies in converting a standard table into columnar storage to accelerate reporting queries.
Enable Columnar Replica
Suppose your orders table has 500 million records. Just one command to sync to TiFlash:
ALTER TABLE orders SET TIFLASH REPLICA 1;
Check synchronization progress with:
SELECT * FROM information_schema.tiflash_replica WHERE table_name = 'orders';
Verify Performance
Perform a complex statistical query:
SELECT status, SUM(total_price) FROM orders GROUP BY status;
Use EXPLAIN to see how TiDB operates. If you see cop[tiflash], congratulations: the query is running on the distributed Columnar engine. Commands that used to take minutes on MySQL now take only seconds on TiDB.
A Few “Lessons Learned” When Migrating
Despite high compatibility, TiDB has unique characteristics you should note:
- Auto Increment: Auto-increment IDs in TiDB do not guarantee absolute continuity (1, 2, 3…) due to its distributed nature. Don’t use it if your business logic requires sequential IDs.
- Foreign Key: Since version 6.x, TiDB supports foreign keys. However, for best scalability, I still recommend handling constraints at the application logic layer.
- Full-text Search: TiDB is not yet a competitor for Elasticsearch or Solr. If you need deep Google-like search, you should still integrate an external search engine.
Conclusion
TiDB is not a “silver bullet” to replace MySQL in all situations, especially for small apps. But if you’re struggling with oversized databases, fearing Sharding, or need real-time reporting without complex ETL, TiDB is a top choice.
It only takes 15 minutes to try installing with tiup, and you’ll find managing terabytes of data much easier. Good luck exploring this HTAP ecosystem!

