Starting with a Mistake That Cost 4 Days
Two years ago, I had to migrate a 100GB database from MySQL to PostgreSQL — three days just for planning and one day for execution. Not because the system was complex, but because the team had picked the wrong tool from the start. MySQL was used for a project with complex JOIN requirements and strict transactions, yet the schema was designed in a NoSQL fashion — flexible, with no constraints. The result: slow queries, poor performance, and weeks of refactoring.
Sounds like a rookie mistake. But this is exactly the kind of architectural decision that many teams make in the first few minutes of a project — and then pay for over the following months.
The Real Problem: Why Choosing the Wrong Database Hurts
Imagine this scenario: you’re building an e-commerce app and choose MongoDB because “NoSQL is fast and flexible.” When you need revenue reports — JOINing orders with users, summing by category, filtering by date range — you end up writing an aggregation pipeline that spans an entire page. Half a day of debugging just to get a number that SQL would return in 3 lines.
The reverse is no better. Use MySQL for an IoT application that needs to write 50,000 events per second from thousands of devices: rigid schema, low write throughput — after a few weeks, the system starts to buckle.
The root cause in both cases: choosing a database based on “trends” or “familiarity,” not based on actual data characteristics and access patterns.
Breaking It Down: What Are SQL and NoSQL Actually Designed For?
SQL — Built to Guarantee Data Integrity
Relational databases — PostgreSQL, MySQL, SQLite, SQL Server — are built around one principle: data must never be in a contradictory state. To ensure that, they have four core properties:
- Fixed schema: Every row in a table has the same columns with the correct data types — no exceptions
- ACID transactions: Atomicity, Consistency, Isolation, Durability — data is never left in a “half-done” state
- Relations: Foreign keys, JOINs — tightly coupled, consistent data relationships
- Vertical scaling: Scale up by upgrading server hardware (RAM, CPU, SSD)
```sql
-- Typical SQL query: fetch orders with customer details
SELECT
    o.id AS order_id,
    u.name AS customer_name,
    o.total_amount,
    o.created_at
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed'
  AND o.created_at >= '2025-01-01'
ORDER BY o.created_at DESC;
```
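To see what "never half-done" means in practice, here is a minimal sketch of ACID atomicity, using Python's built-in sqlite3 standing in for any SQL database (the accounts table and the amounts are made up for illustration):

```python
# Minimal sketch of ACID atomicity: either both updates land, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts inside one transaction."""
    try:
        with conn:  # opens a transaction: commit on success, rollback on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            row = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                               (src,)).fetchone()
            if row[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # the debit above was already rolled back

transfer(conn, "alice", "bob", 30)   # succeeds: balances become 70 / 80
transfer(conn, "alice", "bob", 999)  # fails mid-way: balances stay 70 / 80
print(dict(conn.execute("SELECT id, balance FROM accounts")))
```

The second transfer debits alice, detects the negative balance, and the whole transaction is undone; the data never sits in a state where money left one account without arriving at the other.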
NoSQL — Built for Flexible Scaling
NoSQL isn’t a single technology — it’s a family of databases sharing one trait: they don’t use the relational model, favoring scale-out over rigid schemas. There are 4 main branches, each solving an entirely different problem:
- Document stores (MongoDB, CouchDB): Store JSON/BSON documents — great for semi-structured data
- Key-value stores (Redis, DynamoDB): Ultra-fast access by key — ideal for caching and sessions
- Column-family (Cassandra, HBase): Extremely high write throughput — great for time-series data and IoT
- Graph databases (Neo4j): Complex relationships between entities — perfect for social networks and recommendation engines
```js
// MongoDB document — flexible user profile
{
  "_id": "usr_001",
  "name": "Nguyen Van A",
  "email": "[email protected]",
  "preferences": {
    "theme": "dark",
    "language": "en",
    "notifications": ["email", "push"]
  },
  "social_accounts": [
    { "platform": "github", "username": "vana_dev" },
    { "platform": "linkedin", "url": "linkedin.com/in/vana" }
  ]
}
```
Flexible schema — no ALTER TABLE needed when adding new fields. User A has social_accounts, User B doesn’t — MongoDB doesn’t care. With PostgreSQL, you’d have to run a migration and handle NULL values for all existing rows.
The Right Approach: Matching the Tool to the Problem
When to Use SQL
These are the scenarios where choosing NoSQL will cause serious pain:
- Finance and accounting: Every transaction must guarantee ACID — you can’t afford to lose money “mid-way”
- E-commerce: Orders, inventory, payments — complex relationships requiring frequent JOINs
- ERP/CRM: Enterprise data has a stable schema that rarely changes over time
- Complex reporting: GROUP BY, HAVING, subqueries, window functions
```sql
-- Window function: rank revenue by month
SELECT
    product_name,
    month,
    revenue,
    RANK() OVER (
        PARTITION BY month
        ORDER BY revenue DESC
    ) AS revenue_rank
FROM monthly_sales
WHERE year = 2025;
```
When to Use NoSQL
On the flip side, some problems are a poor fit for SQL:
- Real-time applications: Chat, gaming — Redis pub/sub or MongoDB with change streams
- Content management: Blogs, product catalogs — schema changes frequently based on business needs
- IoT / Time-series data: Millions of events per day — Cassandra or InfluxDB handle this far better
- Sessions and caching: Redis with TTL automatically expires sessions — no cleanup job needed
```shell
# Redis: store a session with a 30-minute TTL
redis-cli SET "session:usr_001" '{"user_id":1,"role":"admin"}' EX 1800

# Read the session back
redis-cli GET "session:usr_001"

# Check remaining time (seconds)
redis-cli TTL "session:usr_001"
```
```python
# Python: write IoT data to Cassandra — built for extremely high write throughput
from cassandra.cluster import Cluster
from datetime import datetime

# Connect to the cluster and the IoT keyspace
cluster = Cluster(['cassandra-node1', 'cassandra-node2'])
session = cluster.connect('iot_keyspace')

insert_query = """
    INSERT INTO sensor_readings (device_id, timestamp, temperature, humidity)
    VALUES (%s, %s, %s, %s)
"""
session.execute(insert_query, ('sensor_001', datetime.now(), 28.5, 65.2))
print("Write successful")
```
The Best Approach: Use Both for Their Strengths (Polyglot Persistence)
Hardly any production application of real complexity gets by with a single type of database. Netflix uses Cassandra for viewing history, MySQL for billing, and Redis for sessions. Uber uses PostgreSQL for trip data and Redis for real-time matching. That's Polyglot Persistence: each database does what it's best at, instead of one doing everything.
Here’s an example architecture for a full-featured e-commerce application:
- PostgreSQL: Orders, users, inventory — requires strict ACID and JOINs
- Redis: Sessions, shopping carts, rate limiting, product detail caching
- MongoDB: Product catalog — each category has different attributes, flexible schema
- Elasticsearch: Full-text product search, multi-faceted filtering
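As a rough sketch of how such an architecture divides the work, the snippet below uses sqlite3 in place of PostgreSQL and a plain dict with expiry in place of Redis. The store names, the `place_order` helper, and the TTL are illustrative assumptions, not a real production design:

```python
# Polyglot sketch: the transactional record and the hot ephemeral state
# live in different stores, each playing to its strengths.
import sqlite3
import time

# "PostgreSQL" role: durable, relational, transactional order records
sql = sqlite3.connect(":memory:")
sql.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id TEXT, total REAL)")

# "Redis" role: fast key-value cache with TTL
cache = {}

def cache_set(key, value, ttl_seconds):
    cache[key] = (value, time.time() + ttl_seconds)

def cache_get(key):
    value, expires_at = cache.get(key, (None, 0.0))
    return value if time.time() < expires_at else None

def place_order(user_id, total):
    # The money-related write goes to the relational store, transactionally...
    with sql:
        cur = sql.execute(
            "INSERT INTO orders (user_id, total) VALUES (?, ?)",
            (user_id, total),
        )
    # ...while the short-lived cart state lives in the cache with a TTL
    cache_set(f"cart:{user_id}", [], ttl_seconds=1800)
    return cur.lastrowid

order_id = place_order("usr_001", 149.99)
```

The point is the split itself: if the cache node dies, you lose carts, not orders.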
To make a quick decision, I usually ask myself 3 questions:
- Is the data structure fixed? — If yes → SQL
- Do you need JOINs or multi-table transactions? — If yes → SQL
- Do you need high write throughput or horizontal scaling? — If yes → NoSQL
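One way to make the checklist concrete is to encode it as a tiny function. The ordering is an assumption on my part: when signals conflict, this sketch lets the SQL answers win, since consistency requirements are usually harder to retrofit than scaling:

```python
def pick_database(fixed_schema: bool, needs_joins_or_tx: bool,
                  high_write_or_horizontal_scale: bool) -> str:
    """Encode the three-question checklist; SQL signals take priority."""
    if fixed_schema or needs_joins_or_tx:
        return "SQL"
    if high_write_or_horizontal_scale:
        return "NoSQL"
    return "SQL (default: PostgreSQL)"

# E-commerce orders: fixed schema, JOINs and transactions
print(pick_database(True, True, False))    # SQL
# IoT telemetry: flexible structure, massive write volume
print(pick_database(False, False, True))   # NoSQL
```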
Still unsure after all three? Start with PostgreSQL. It supports JSONB for flexible parts, has built-in full-text search, and handles most workloads at medium scale. If you need to scale later, migrating parts to NoSQL is far easier than refactoring an entire schema.
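To illustrate that hybrid idea, here is a sketch of a rigid schema plus a flexible JSON column. It uses sqlite3's JSON1 functions as a stand-in for PostgreSQL's JSONB (in Postgres you would declare the column as `jsonb` and query it with the `->>` operator); the products table is invented for the example:

```python
# Rigid columns for the constrained parts, a JSON column for the flexible parts.
# sqlite3's json_extract() stands in for PostgreSQL's JSONB operators here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,   -- constrained part: fixed, validated columns
        price REAL NOT NULL,
        attrs TEXT             -- flexible part: JSON, varies per category
    )
""")
conn.executemany(
    "INSERT INTO products (name, price, attrs) VALUES (?, ?, ?)",
    [
        ("Laptop", 999.0, '{"cpu": "M3", "ram_gb": 16}'),
        ("T-shirt", 15.0, '{"size": "L", "color": "navy"}'),
    ],
)

# Query inside the JSON without any schema migration
rows = conn.execute(
    "SELECT name FROM products WHERE json_extract(attrs, '$.ram_gb') >= 16"
).fetchall()
print(rows)  # [('Laptop',)]
```

Each category carries its own attributes, yet name and price stay under strict constraints, which is exactly the split that would have made that migration unnecessary.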
That’s also the lesson from that 100GB migration: if I had used PostgreSQL with JSONB for the flexible parts and a proper schema for the constrained parts from the start, there would have been nothing to migrate — saving at least 4 days of work.

