Is Your PostgreSQL Disk Bloating? Pro Tips to Fix Bloat and Optimize Autovacuum

Database tutorial - IT technology blog

5-Minute Disk Check and “Emergency” Fix

You receive a server disk alert in the middle of the night: usage has hit 90-95%. A quick check shows that the actual data (live rows) only takes up about 20GB, yet the database files on disk weigh in at 100GB. This is a classic sign of Bloat. Don’t rush to run VACUUM FULL. That command takes an ACCESS EXCLUSIVE lock on the entire table, causing every application query against it to hang until it finishes.
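To confirm which relations are actually eating the disk, standard catalog functions give a quick size ranking (a minimal sketch; adjust the LIMIT to taste):

```sql
-- Top 10 tables by total on-disk size (heap + indexes + TOAST)
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```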

Start by identifying which tables contain the most “garbage” (dead tuples) using this SQL command:

SELECT 
    relname AS table_name, 
    n_dead_tup AS dead_tuples, 
    last_autovacuum, 
    last_vacuum 
FROM pg_stat_user_tables 
ORDER BY n_dead_tup DESC;

If a table has millions of dead_tuples while last_autovacuum last ran… a week ago, you’ve found the culprit. To resolve this temporarily without disrupting the system, run:

VACUUM (ANALYZE) your_table_name;

Note: This command only cleans up to make room for new data inside the table. It does not return disk space to the operating system immediately.

Why Doesn’t Disk Usage Decrease After Deleting Data?

The MVCC (Multi-Version Concurrency Control) mechanism is the soul of PostgreSQL, but it’s also the reason for Bloat. When you UPDATE a row, Postgres doesn’t modify it directly. It marks the old row as “expired” (a dead tuple) and inserts a brand-new row. The DELETE command works similarly, marking rows without physically removing them immediately.

This approach ensures other transactions can still see the old data, maintaining consistency. However, Bloat occurs when the rate of “garbage” creation exceeds the cleanup speed. Data files swell with useless empty space. As a result, SELECT queries slow down significantly because the database must scan through a “mountain of trash” to find actual data.
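If the pgstattuple extension (shipped in contrib) is installed, this dead-space ratio can be measured directly instead of guessed (the table name is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- dead_tuple_percent is the share of the table occupied by "garbage"
SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
FROM pgstattuple('your_table_name');
```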

Real-world Experience: When Default Autovacuum is Too “Timid”

I once handled a logging system with a sustained write load of around 3,000 TPS (transactions per second). Although Postgres has built-in Autovacuum, its default configuration is often too conservative: it throttles itself to save I/O and CPU, and falls behind long before the backlog is cleared. Consequently, a 60GB table ended up with 40GB of bloat.

Don’t stick with default parameters if your system has high write intensity. Adjust your postgresql.conf file to make Autovacuum more aggressive.
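To see why the defaults fall behind on large tables, the autovacuum trigger formula from the PostgreSQL documentation can be sketched in a few lines (the function name is mine, not part of Postgres):

```python
# When autovacuum decides to clean a table, per the PostgreSQL docs:
#   vacuum threshold = autovacuum_vacuum_threshold
#                      + autovacuum_vacuum_scale_factor * number of live tuples

def vacuum_trigger_threshold(live_tuples, scale_factor=0.2, base_threshold=50):
    """Dead tuples that must accumulate before autovacuum touches the table."""
    return base_threshold + scale_factor * live_tuples

# With the defaults, a 100-million-row table collects ~20 million dead
# tuples before autovacuum even starts:
print(vacuum_trigger_threshold(100_000_000))        # 20000050.0
# Lowering the scale factor to 0.05 triggers cleanup four times sooner:
print(vacuum_trigger_threshold(100_000_000, 0.05))  # 5000050.0
```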

Configuring Autovacuum for Effective Cleanup

Here are the parameters I usually prioritize for tuning:

# Increase memory for cleanup (e.g., 512MB for a 16GB RAM server)
maintenance_work_mem = 512MB 

# Increase the number of concurrent workers (changing this requires a server restart)
autovacuum_max_workers = 5

# Trigger cleanup when 5% of data changes (default is 20% - too high)
autovacuum_vacuum_scale_factor = 0.05 

# Raise the work budget per cycle so workers throttle less often (effective default is 200)
autovacuum_vacuum_cost_limit = 1000

After saving, apply the new configuration with the command below (autovacuum_max_workers is the exception: it only takes effect after a full server restart):

SELECT pg_reload_conf();
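For a handful of write-hot tables, these settings can also be overridden per table instead of globally, using storage parameters (table name and values are illustrative):

```sql
-- Make autovacuum far more eager on one busy table only
ALTER TABLE your_table_name SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);
```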

Reclaiming Space Without Downtime Using pg_repack

Normally, VACUUM only collects garbage to reuse that space within the same table. To actually return space to the OS, you need VACUUM FULL. But as mentioned, this locks the table for a long time. If a table is 200GB, the system could be “clinically dead” for several hours.

The best solution is pg_repack. This extension restructures tables and indexes without holding long-duration locks: it builds a fresh copy of the table, replays the writes that arrive during the copy, and then swaps the two in a near-instant final step.

How to Install pg_repack (Ubuntu/Debian)

sudo apt-get install postgresql-15-repack   # match the package version to your server
# Then connect to the database and create the extension
psql -d your_db_name -c "CREATE EXTENSION pg_repack;"

To run a cleanup for a specific table:

pg_repack -d your_db_name -t your_table_name

While pg_repack is running, the application can still INSERT or UPDATE normally. This is a lifesaver for Production environments that need to operate 24/7.
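Where bloat accumulates on a predictable schedule, the repack can be automated; a minimal cron sketch, assuming an off-peak window and placeholder names:

```shell
# Crontab entry: repack the hottest table every night at 02:00
0 2 * * * pg_repack -d your_db_name -t your_table_name >> /var/log/pg_repack.log 2>&1
```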

Don’t Forget Index Bloat

Not just tables: indexes bloat too, and some swell to three times the size of the data they cover. The safest way to handle Index Bloat is REINDEX CONCURRENTLY (available since PostgreSQL 12):

REINDEX INDEX CONCURRENTLY idx_customer_email;

This command creates a new index running in parallel. Once complete, it automatically replaces and deletes the old index. You will see disk space decrease significantly without interrupting users.
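To spot which indexes deserve this treatment, ranking them by size is a good first pass (a simple sketch using the statistics views):

```sql
-- Largest indexes first; unusually big ones are REINDEX candidates
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;
```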

Advice for Database Administrators

  • Proactive Monitoring: Use Grafana or Prometheus to track n_dead_tup. Don’t wait for a red disk alert to start acting.
  • Avoid Long-Running Transactions: A transaction hanging from morning to night will block Autovacuum. Garbage cannot be cleaned if an old transaction is still “seeing” it.
  • Prioritize External Tools: Always keep pg_repack in your maintenance toolkit. It’s much more professional and safer than default commands when tables are too large.
  • Check Data Types: Sometimes Bloat is a result of poor design. Overusing UUID or unnecessary TEXT types increases the size of each data row.
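The long-running-transaction point in particular is easy to check; a query along these lines lists the oldest open transactions that may be pinning dead tuples:

```sql
-- Oldest open transactions; anything running for hours blocks cleanup
SELECT pid, now() - xact_start AS duration, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```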

Understanding the MVCC mechanism will help you control PostgreSQL better. Handling Bloat isn’t difficult; the key is choosing the right time and tools to avoid affecting user experience.
