A Battle-Tested Lesson: When Cassandra Struggles Under High Traffic
Back when I worked on an AdTech project, our team faced an extremely stressful challenge. Real-time user logs and events surged to over 200,000 requests per second. Cassandra was the top choice then due to its horizontal scalability. However, reality hit hard when traffic peaked.
The dashboard was glowing red. Cassandra nodes would occasionally “freeze” for a few seconds, causing constant timeout errors in the application. Strangely, CPU usage never exceeded 50%, but P99 latency spiked into the seconds. If you’ve ever stayed up all night trying to save a database cluster on “life support,” you know this feeling of helplessness.
The Root Cause: The Troublemaker Named Garbage Collection
After deep debugging, I realized the bottleneck was the JVM (Java Virtual Machine) platform itself. Cassandra runs on Java, which means it has to live with Garbage Collection (GC). In a write-heavy system, the JVM has to reclaim memory at a very high frequency.
These “stop-the-world” GC pauses freeze all read and write operations. The bitter reality, at least in our measurements, was that roughly 30% of server capacity went to Java’s memory management alone. Bumping the heap to 32GB or switching to ZGC are only partial fixes: the shadow of the JVM still prevents you from fully exploiting the hardware.
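To see why average CPU can look healthy while P99 explodes, here is a tiny simulation in plain Python. The numbers are illustrative, not measurements from our cluster: most requests finish in a couple of milliseconds, but an occasional stop-the-world stall lands squarely in the tail.

```python
import random

random.seed(42)

def percentile(samples, p):
    """Nearest-rank percentile of a list of latencies."""
    ordered = sorted(samples)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = []
for i in range(10_000):
    base = random.uniform(1.0, 3.0)   # a normal request: ~2 ms
    if i % 50 == 0:                   # a pause hits ~2% of requests...
        base += 2_000.0               # ...adding a 2-second stall
    latencies_ms.append(base)

avg = sum(latencies_ms) / len(latencies_ms)
print(f"avg = {avg:.1f} ms, p50 = {percentile(latencies_ms, 50):.1f} ms, "
      f"p99 = {percentile(latencies_ms, 99):.1f} ms")
```

The median stays at ~2 ms and the average looks tolerable, yet P99 jumps into the seconds — exactly the pattern we saw on the dashboard.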
Three Common Solutions (And Why They Usually Fail)
Our team tried everything before finding the perfect solution:
- Tuning GC: Spent a whole week tweaking parameters with negligible results. It’s extremely complex and prone to side effects.
- Hardware Upgrades: Renting servers with more RAM or using the best NVMe drives. This is essentially “throwing money at the problem”—expensive, yet performance per CPU core remains stagnant.
- Using Managed Services: Switching to AWS DynamoDB. It’s convenient, but the end-of-month bill almost gave my boss a heart attack because write costs were so exorbitant.
ScyllaDB: The Power of Shard-per-core Architecture
ScyllaDB emerged as a perfect upgrade: it maintains the Cassandra API but is completely rewritten in C++. The biggest selling point is the shard-per-core architecture. Instead of threads competing for resources, ScyllaDB assigns each CPU core to manage a specific portion of the data. No contention. Lockless. And most importantly, no more GC.
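The idea can be sketched in a few lines of Python. This is a toy model, not ScyllaDB’s actual routing code (which uses Murmur3 tokens — here I substitute MD5 for illustration): the partition key is hashed, and the hash deterministically picks one core, so every request for a given key lands on the same shard with no cross-core locking.

```python
import hashlib

NUM_CORES = 8  # imagine an 8-core node

def owning_shard(partition_key: str, num_cores: int = NUM_CORES) -> int:
    """Toy stand-in for shard routing: hash the key, pick a core.
    (Real ScyllaDB uses Murmur3 tokens; MD5 is just for the sketch.)"""
    digest = hashlib.md5(partition_key.encode()).digest()
    token = int.from_bytes(digest[:8], "little")
    return token % num_cores

# The same key always maps to the same core -- no coordination needed
assert owning_shard("user-42") == owning_shard("user-42")

for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> core", owning_shard(key))
```

Because a core never touches another core’s data, there are no locks to wait on — which is exactly where the JVM version loses time.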
Quickly Deploy ScyllaDB with Docker
If you want to experiment, Docker is the fastest way to get a ScyllaDB cluster up in 30 seconds. I usually set up a small lab like this for benchmarking before deciding to migrate.
# Run a single ScyllaDB instance
docker run --name scylla-node -d scylladb/scylla
# Check node status (wait about 10 seconds for the node to be ready)
docker exec -it scylla-node nodetool status
Once you see the UN (Up/Normal) status, you can access the database using the legendary cqlsh command.
docker exec -it scylla-node cqlsh
Optimizing Table Design with CQL
ScyllaDB’s query language, CQL, looks a lot like SQL, but you need to think in NoSQL terms to achieve sub-1ms speeds.
-- Create storage space (Keyspace)
-- SimpleStrategy with replication_factor 1 is fine for a single-node lab;
-- in production use NetworkTopologyStrategy with a higher factor
CREATE KEYSPACE itfromzero
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE itfromzero;
-- Log table optimized for reading a user's most recent actions first
CREATE TABLE user_logs (
    user_id uuid,
    action_time timestamp,
    action_name text,
    PRIMARY KEY (user_id, action_time)
) WITH CLUSTERING ORDER BY (action_time DESC);
Pro tip: user_id (Partition Key) helps ScyllaDB know which core holds the data. Designing this key to distribute data evenly will help the system handle extreme loads.
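A quick way to sanity-check that a partition key spreads load evenly is to hash a sample of keys and count how many land in each bucket. This is a rough sketch (ScyllaDB really uses Murmur3 tokens; here I just reuse the random bits of uuid4):

```python
import uuid
from collections import Counter

NUM_SHARDS = 8
counts = Counter()

# Simulate 10,000 user_ids and see how they spread across shards
for _ in range(10_000):
    user_id = uuid.uuid4()
    counts[user_id.int % NUM_SHARDS] += 1

for shard, n in sorted(counts.items()):
    print(f"shard {shard}: {n} keys")
```

A healthy key gives every shard roughly 10,000 / 8 keys. If one bucket dwarfs the rest, you have a hot partition, and no amount of hardware will save you under extreme load.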
Real-world Experience Handling Big Data
When benchmarking with millions of CSV rows, I never do it manually. A small trick is using the conversion tool at toolcraft.app/en/tools/data/csv-to-json to quickly normalize sample file formats. Then, use a Python script to push data at lightning speed.
import uuid
from datetime import datetime, timezone
from cassandra.cluster import Cluster

# Connect to the local node and the keyspace created above
cluster = Cluster(['127.0.0.1'])
session = cluster.connect('itfromzero')

# Prepare the statement once so repeated inserts skip re-parsing
insert = session.prepare(
    "INSERT INTO user_logs (user_id, action_time, action_name) VALUES (?, ?, ?)"
)

# Insert a sample record (Cassandra/Scylla timestamps are UTC)
session.execute(insert, (uuid.uuid4(), datetime.now(timezone.utc), 'USER_LOGIN'))

cluster.shutdown()
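When the source is a large CSV, I stream the rows and convert them into parameter tuples before handing them to the driver. The sketch below uses Python’s csv module with an in-memory sample so it stays self-contained; in the real script the file handle would be an open CSV, and each tuple would be fed to the insert statement above (ideally via execute_async for throughput).

```python
import csv
import io
import uuid
from datetime import datetime, timezone

# Stand-in for open("user_logs.csv"); the sample keeps the sketch runnable
sample = io.StringIO(
    "user_id,action_time,action_name\n"
    "8f14e45f-ceea-467f-a8cb-9d5e8e1c2f00,2024-05-01T12:00:00,USER_LOGIN\n"
    "8f14e45f-ceea-467f-a8cb-9d5e8e1c2f00,2024-05-01T12:05:00,VIEW_AD\n"
)

def rows_to_params(handle):
    """Yield (uuid, datetime, str) tuples ready for the INSERT statement."""
    for row in csv.DictReader(handle):
        yield (
            uuid.UUID(row["user_id"]),
            datetime.fromisoformat(row["action_time"]).replace(tzinfo=timezone.utc),
            row["action_name"],
        )

params = list(rows_to_params(sample))
print(f"prepared {len(params)} rows, first action: {params[0][2]}")
```

Converting types on the Python side (UUID objects, timezone-aware datetimes) keeps the driver from guessing, and generators mean you never load the whole CSV into memory.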
Final Thoughts
ScyllaDB is truly a speed demon. If you’re struggling with Cassandra’s sluggishness or DynamoDB’s high costs, give ScyllaDB a try. The performance-to-dollar ratio is incredible. However, remember: it only truly shines on high-core-count CPUs and high-speed storage!

