PostgreSQL Indexing: Don’t Just Use B-Tree for Millions of Rows

Database tutorial - IT technology blog
Database tutorial - IT technology blog

Context: When Indexing Isn’t a Magic Default

Working on real-world projects, from E-commerce on MySQL to Big Data systems on PostgreSQL, I’ve learned a hard-earned lesson: Indexing is the soul of performance.

Imagine a table with 100 million rows without an index. It’s like looking for a needle in a haystack in a massive library with no catalog. The result? CPU spikes to 100%, Disk I/O bottlenecks, and your application hangs until timeout.

But the biggest mistake many developers make is sticking to the default CREATE INDEX. PostgreSQL provides a specialized “arsenal” like GIN, GiST, and BRIN. Choosing the wrong index type not only slows down writes (INSERT/UPDATE) but also wastes tens of gigabytes of disk space.

Deployment: Survival Rules for Production

Creating an index in Postgres is syntactically easy but extremely risky on a highly active table.

-- The index creation method that "destroys" systems (Table Lock)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Safe index creation (Recommended for Production)
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);

Pro-tip: Always use the CONCURRENTLY keyword. It allows index creation without preventing users from placing orders or making payments. While it takes slightly longer, at least you won’t get a call from your boss for crashing the app during peak hours.

Classification & Use Cases: Choosing the Right Weapon

Every index type is designed for a specific problem. Don’t use a sledgehammer to crack a nut.

1. B-Tree Index: The Versatile “Swiss Army Knife”

This is the default type. It excels at range searches and direct comparisons like <, =, and >.

When to use: 90% of cases involving IDs, emails, dates, or ORDER BY. If your data is flat and simple, B-Tree is the top choice.

2. GIN (Generalized Inverted Index): A Lifesaver for JSONB

If you’re storing flexible information in JSONB columns or Arrays, B-Tree is completely useless. GIN indexes deep into every key/value within that complex data structure.

Real-world numbers: In a project managing 500,000 products with dynamic attributes, using GIN reduced query filtering by color and size from 2.8 seconds to exactly 12ms.

-- High-performance optimization for JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN (metadata jsonb_path_ops);

3. BRIN (Block Range Index): The Big Data Secret Weapon

BRIN is a “heavy-duty weapon” for log tables with billions of rows. Instead of indexing every single row, it only stores the min/max values for each data block.

Specific example: For a 500GB log table, a B-Tree index could take up 40-60GB. Switching to BRIN reduces the index size to about 150MB while maintaining extremely fast time-based searches.

CREATE INDEX idx_huge_logs_ts ON big_logs USING BRIN (created_at);

4. Hash Index: For Equality Comparisons Only

Hash Index only supports the = operator. Since Postgres 10, it has become stable and slightly faster than B-Tree for unique value columns (like UUIDs), but it remains less common than B-Tree.

Testing & Optimization: Don’t Trust Your Gut

Creating an index isn’t the end of the story. Sometimes Postgres will ignore your index if it decides a Sequential Scan (Seq Scan) is faster.

Understanding Execution with EXPLAIN ANALYZE

Always inspect how your SQL is running using the following command:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

If you see Index Scan, you’re on the right track. If you see Seq Scan on a table with millions of rows, double-check your data types or verify if the index was created successfully.

Cleaning Up “Garbage” Indexes

Indexes are a double-edged sword. Too many indexes will tank your INSERT performance because the database must update all of them simultaneously. Run this script to find unused indexes and confidently delete them:

SELECT relname, indexrelname, idx_scan 
FROM pg_stat_user_indexes 
WHERE idx_scan = 0 AND idx_unique = false;

In summary, optimizing PostgreSQL isn’t magic—it’s about understanding your data. If the table is small, B-Tree is enough. If the data is massive, consider BRIN. If you’re working with JSONB, GIN is mandatory. Happy database mastering!

Share: