When SQL “Surrenders” to Data Storms
Our team once operated a user behavior tracking system on PostgreSQL. Everything was fine until traffic hit 50,000 logs per second. At that point, the database started to “struggle for breath”: disk I/O spiked above 90% because PostgreSQL’s B-Tree indexes demand random writes and constant page splits as new data floods in.
After a stint with MongoDB that still left us fighting distributed-scaling problems, I decided to switch to Apache Cassandra. It is a proven choice for large-scale systems: Facebook originally built it for Inbox search, and it later became a backbone of infrastructure at Netflix and Apple.
If you’re working on IoT, logging, or messaging that requires extremely fast write speeds and the ability to scale out simply by adding servers, Cassandra is the answer.
Cassandra Architecture: Why Is It So Fast at Writing?
Forget the Master-Slave mindset of MySQL or Redis Sentinel. Cassandra operates on a Peer-to-Peer model where every node has the same role. This design completely eliminates the “Single Point of Failure.” A node goes down? The remaining nodes carry the load normally without waiting for a new Master election.
Data Write Mechanism: The Secret Lies in LSM-Tree
Many are surprised that Cassandra writes data significantly faster than traditional databases. This is thanks to the LSM-Tree structure. The write process works like this:
- Data is immediately written to the CommitLog on disk to prevent data loss.
- Simultaneously, it is saved to MemTable in RAM.
- When the MemTable is full, Cassandra flushes everything to disk as an SSTable (Sorted String Table) file.
The key point is that writing to disk happens sequentially. It’s like writing to the last page of a notebook instead of flipping through pages to find and edit (random seek). Consequently, write speeds nearly hit hardware limits.
Data Model Design: Thinking Opposite to SQL
The most common mistake when starting with Cassandra is bringing the SQL Normalization mindset over. In Cassandra, we don’t design tables based on entities. We design based on queries (Query-driven modeling).
Cassandra doesn’t support JOIN. Therefore, you must embrace Denormalization. Don’t be afraid of data duplication. Storage is cheap these days; what we prioritize is ultra-fast retrieval speeds.
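To make query-driven denormalization concrete, here is a hedged sketch (the tables and columns are hypothetical, not from any real system): suppose an application needs to look up orders both by user and by product. Instead of one normalized table plus a JOIN, you create two tables that duplicate the same data, each keyed for exactly one query:

```sql
-- Hypothetical example: the same order data duplicated into two tables,
-- each partitioned for one specific query pattern.
CREATE TABLE orders_by_user (
    user_id    uuid,
    order_id   timeuuid,
    product_id uuid,
    amount     decimal,
    PRIMARY KEY (user_id, order_id)
);

CREATE TABLE orders_by_product (
    product_id uuid,
    order_id   timeuuid,
    user_id    uuid,
    amount     decimal,
    PRIMARY KEY (product_id, order_id)
);
```

The application writes each order to both tables; every read then touches exactly one partition, which is the trade that buys Cassandra its speed.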
Partition Key and Clustering Key
A Primary Key consists of two critical components:
- Partition Key: Determines which node the data resides on. If you choose a key with low distribution like “gender,” data will cluster on one node (hotspot), causing local overload.
- Clustering Key: Determines how rows are sorted within a partition. It is a powerful aid for range queries.
Real-world experience: Choose a Partition Key with high cardinality like user_id or sensor_id instead of generic fields like status.
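To illustrate the hotspot risk, here is a hedged sketch (hypothetical schema) contrasting a low-cardinality partition key with a high-cardinality one:

```sql
-- BAD: 'status' has only a handful of distinct values, so all rows
-- pile up into a few giant partitions on a few nodes (hotspots).
CREATE TABLE events_by_status (
    status   text,
    event_id timeuuid,
    payload  text,
    PRIMARY KEY (status, event_id)
);

-- BETTER: millions of distinct user_ids spread rows evenly
-- across the whole cluster.
CREATE TABLE events_by_user (
    user_id  uuid,
    event_id timeuuid,
    payload  text,
    PRIMARY KEY (user_id, event_id)
);
```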
Quick Setup with Docker
The fastest way to experiment is to set up a 2-node cluster using Docker Compose.
version: '3.9'
services:
  cassandra-node1:
    image: cassandra:latest
    container_name: cassandra1
    ports:
      - "9042:9042"
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
  cassandra-node2:
    image: cassandra:latest
    container_name: cassandra2
    environment:
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_SEEDS=cassandra-node1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
    depends_on:
      - cassandra-node1
Just run docker-compose up -d, then check with docker exec -it cassandra1 nodetool status. When both nodes show UN (Up/Normal), the cluster is ready.
Practicing Queries with CQL
Cassandra uses CQL (Cassandra Query Language). The syntax is quite similar to SQL but is limited to ensure performance. Here is how to create a table to store temperature logs for an IoT device:
-- Create Keyspace (equivalent to Database)
CREATE KEYSPACE sensor_data
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
USE sensor_data;
-- Create table: device_id is the Partition Key, captured_at is the Clustering Key
CREATE TABLE temperature_logs (
    device_id   uuid,
    captured_at timestamp,
    value       double,
    PRIMARY KEY (device_id, captured_at)
) WITH CLUSTERING ORDER BY (captured_at DESC);
-- Insert test data
INSERT INTO temperature_logs (device_id, captured_at, value)
VALUES (uuid(), toTimestamp(now()), 25.5);
In this example, data from the same device will be stored next to each other on disk and sorted by the latest time, making historical queries extremely efficient.
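That layout pays off on reads. A hedged example of the kind of range query this model serves (the device UUID and the date are placeholders):

```sql
-- Fetch up to 100 recent readings for one device since a given time.
-- This hits a single partition and reads rows already sorted newest-first,
-- thanks to CLUSTERING ORDER BY (captured_at DESC).
SELECT captured_at, value
FROM temperature_logs
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000
  AND captured_at > '2024-01-01 00:00:00'
LIMIT 100;
```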
Optimization Tips from Real-world Experience
After “paying the price” with several system crashes, I’ve learned 4 important lessons:
- Say no to ALLOW FILTERING: If you have to use it in production, your data model is wrong. It forces Cassandra to scan the entire cluster, causing severe performance degradation.
- Control partition size: A partition should not exceed roughly 100MB. Overly large partitions make moving data between nodes (rebalancing) a nightmare.
- Don’t over-use Batch: Unlike SQL, Batch in Cassandra is used to ensure atomicity across multiple tables. It doesn’t speed up writes and can even slow them down if records are scattered across many nodes.
- Configure the Java heap properly: Allocate about 1/4 to 1/2 of RAM to the JVM heap, but stay below roughly 31GB so the JVM keeps using compressed object pointers. This helps avoid long Garbage Collection pauses.
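To illustrate the first tip, here is a hedged sketch of the anti-pattern against the temperature_logs table from earlier, and the model-level fix (the second table is a hypothetical addition, not part of the original schema):

```sql
-- ANTI-PATTERN: 'value' is neither a partition key nor a clustering key,
-- so this forces Cassandra to scan every partition in the cluster.
SELECT * FROM temperature_logs
WHERE value > 30 ALLOW FILTERING;

-- FIX: serve that question from a purpose-built table instead, e.g. one
-- the application populates at write time with only above-threshold readings.
CREATE TABLE hot_readings (
    device_id   uuid,
    captured_at timestamp,
    value       double,
    PRIMARY KEY (device_id, captured_at)
);
```

The point is the same as in the modeling section: if a query is slow, fix the table, not the query.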
Conclusion
Cassandra is not a silver bullet. It is extremely powerful for write-heavy workloads but will be a disaster if you try to perform complex statistical reporting (SUM/AVG) or use queries requiring constant JOIN operations. By mastering Query-driven modeling, managing billions of records will no longer be a difficult problem. Good luck!

