File Compression: More Than Just Saving Space
If you’ve ever managed a server, you’ve likely watched disk usage hit 99% because of bloated log files, or had to move tens of gigabytes of data between servers. In moments like these, file compression is often the quickest way to free up space and save bandwidth during transfer.
Back when I was a junior sysadmin, I once compressed a 100GB backup folder using xz directly on a production server. The result was a CPU spike to 100%, and the server lagged so much that I couldn’t even SSH into it. I spent the whole afternoon fixing it just because I didn’t understand the characteristics of each tool. Every format (gzip, bzip2, xz, zstd) has a trade-off between speed and compression ratio.
Before we begin, you need to distinguish between two concepts:
- Archiving: Bundling multiple files into a single file using tar.
- Compression: Using algorithms to reduce the size of that file.
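To make the distinction concrete, here is a minimal sketch that performs the two steps separately. The scratch directory and the generated app.log are made up for illustration; only tar and gzip are assumed to be installed.

```shell
#!/bin/sh
# Hypothetical scratch data: 2000 identical log lines in a temp directory.
workdir=$(mktemp -d)
mkdir "$workdir/logs"
i=0
while [ "$i" -lt 2000 ]; do
    echo "2024-01-01 00:00:00 INFO request handled in 12ms" >> "$workdir/logs/app.log"
    i=$((i + 1))
done

# Step 1: archive only. The .tar is roughly the size of its contents.
tar -cf "$workdir/logs.tar" -C "$workdir" logs

# Step 2: compress the archive. -c writes to stdout, so the .tar is kept.
gzip -c "$workdir/logs.tar" > "$workdir/logs.tar.gz"

tar_size=$(wc -c < "$workdir/logs.tar")
gz_size=$(wc -c < "$workdir/logs.tar.gz")
echo "archive: $tar_size bytes, compressed: $gz_size bytes"
```

The archive step alone saves nothing; the size only drops once the compressor runs over the .tar file.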
Installing Necessary Tools
Most distributions like Ubuntu or CentOS come with tar and gzip pre-installed. However, xz and zstd (developed at Facebook) may be missing, especially on minimal installs, and need to be installed manually.
# For Ubuntu/Debian
sudo apt update && sudo apt install tar gzip bzip2 xz-utils zstd -y
# For RHEL/CentOS/AlmaLinux
sudo dnf install tar gzip bzip2 xz zstd -y
Details of Popular Compression Formats
1. The tar Command – The “Backbone” of Archiving
tar (Tape Archive) is the most basic tool. By nature, it only bundles files into a single block without reducing size. However, it can be combined with other compressors via flags.
# Archive without compression (create .tar file)
tar -cvf backup.tar /path/to/folder
# Extract
tar -xvf backup.tar
Flags to remember: -c (Create), -x (Extract), -v (Verbose – view progress), and -f (File).
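Two more tar options are worth knowing alongside these: -t lists an archive's contents without extracting it, and -C sets the directory tar works in. The sketch below uses a throwaway directory and a made-up notes.txt, so it is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical scratch area so the example does not touch real data.
scratch=$(mktemp -d)
mkdir "$scratch/folder" "$scratch/restore"
echo "hello" > "$scratch/folder/notes.txt"

# Create the archive; -C changes into $scratch first so stored paths stay relative.
tar -cf "$scratch/backup.tar" -C "$scratch" folder

# -t lists the contents without extracting anything.
tar -tvf "$scratch/backup.tar"

# -C on extraction chooses the destination directory.
tar -xf "$scratch/backup.tar" -C "$scratch/restore"
cat "$scratch/restore/folder/notes.txt"
```

Listing before extracting is a cheap way to check what an unfamiliar archive will write, and where.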
2. Gzip (.tar.gz) – Balanced and Universal
This is the “go-to” standard. If you need to compress quickly to send files to colleagues or for periodic backups, use the -z flag.
# Fast compression with gzip
tar -czvf data.tar.gz /data
Reality: In my experience, a 1GB text log file can shrink to about 200MB in roughly 30 seconds, though the exact result depends on the data and the hardware. Gzip's CPU and memory footprint is low compared with xz.
3. XZ (.tar.xz) – The Champion of Compression Ratios
When storage space is the top priority, xz (flag -J) is the number one choice. It is often used for packaging source code or long-term backups.
tar -cJvf data.tar.xz /data
Warning: Be careful with xz on weak or busy servers. To compress that same 1GB file mentioned above, xz might shrink it to 100MB, but it could take 5 minutes and use far more CPU and memory than gzip, especially at high compression levels.
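If xz has to run on a shared machine anyway, one way to soften the impact is to lower its priority and cap its thread count. This is only a sketch: it assumes xz 5.2 or newer (for the -T option), and the sample data is generated just for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample data; replace "$workdir/data" with your real directory.
workdir=$(mktemp -d)
mkdir "$workdir/data"
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: sample payload for the xz example" >> "$workdir/data/sample.txt"
    i=$((i + 1))
done

if command -v xz >/dev/null 2>&1; then
    # nice lowers CPU priority; -T 2 caps xz at two threads; -6 is the default preset.
    tar -cf - -C "$workdir" data | nice -n 19 xz -T 2 -6 > "$workdir/data.tar.xz"
    # -t verifies the integrity of the compressed file without extracting it.
    xz -t "$workdir/data.tar.xz" && echo "archive OK"
else
    echo "xz is not installed; skipping"
fi
```

Lower presets (for example -2 or -3) trade some compression ratio for noticeably less time and memory, which may be the better deal on a production box.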
4. Zstd (.tar.zst) – The Modern Star
Zstandard is my preferred tool these days. It is extremely flexible, allowing for custom compression levels and supporting multi-core CPUs.
# Compress with zstd
tar --zstd -cvf data.tar.zst /data
The main selling point of zstd is its extremely fast decompression speed, almost equivalent to reading an uncompressed file.
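The custom levels and multi-core support mentioned above are easiest to reach by piping tar into the zstd command itself. The following is a sketch with generated sample data; level 10 is an arbitrary choice for illustration, not a recommendation.

```shell
#!/bin/sh
# Hypothetical sample data; replace "$workdir/data" with your real directory.
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/restore"
i=0
while [ "$i" -lt 2000 ]; do
    echo "$i,sample,row,for,the,zstd,example" >> "$workdir/data/table.csv"
    i=$((i + 1))
done

if command -v zstd >/dev/null 2>&1; then
    # -T0 uses all detected cores; -10 sets the level (1 is fastest, 19 is smallest).
    tar -cf - -C "$workdir" data | zstd -q -T0 -10 -o "$workdir/data.tar.zst"
    # Decompress to stdout and let tar unpack the stream.
    zstd -q -dc "$workdir/data.tar.zst" | tar -xf - -C "$workdir/restore"
    ls -l "$workdir/restore/data"
else
    echo "zstd is not installed; skipping"
fi
```

Without a level flag zstd uses level 3, which is already a reasonable default for routine backups.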
Real-world Performance Comparison
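Here is a comparison table based on my practical system administration experience. The ratios are rough figures for text-heavy data such as logs; your own numbers will vary with the data: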
| Format | Speed | Compression Ratio | Use Case |
|---|---|---|---|
| Gzip | Fast | Fair (5:1) | Daily backups, web servers. |
| XZ | Very slow | Best (10:1) | Long-term archiving. |
| Zstd | Very fast | Good (7:1) | Big data, Databases, Real-time. |
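Because these figures depend so much on the data, it can be worth measuring on a sample of your own files. The script below is a rough sketch, not a rigorous benchmark: it times each installed tool at its default level with one-second resolution, and the generated sample is a stand-in for real data.

```shell
#!/bin/sh
# Hypothetical sample; point the tar command at a slice of your real data instead.
sample=$(mktemp -d)
mkdir "$sample/data"
i=0
while [ "$i" -lt 5000 ]; do
    echo "2024-01-01 00:00:00 INFO user=$i action=login status=ok" >> "$sample/data/app.log"
    i=$((i + 1))
done
tar -cf "$sample/sample.tar" -C "$sample" data
echo "original: $(wc -c < "$sample/sample.tar") bytes"

for tool in gzip xz zstd; do
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipped"
        continue
    fi
    start=$(date +%s)
    # All three tools accept -q (quiet) and -c (write to stdout).
    "$tool" -q -c "$sample/sample.tar" > "$sample/sample.tar.$tool.out"
    end=$(date +%s)
    size=$(wc -c < "$sample/sample.tar.$tool.out")
    echo "$tool: $size bytes in $((end - start))s"
done
```

On a sample this small every tool finishes almost instantly; use a few hundred MB of representative data to see meaningful timing differences.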
Tips for Monitoring Progress with Large Files
The tar command does not show a progress bar by default. When compressing files hundreds of GBs in size, you won’t know when it will finish. Use pv (Pipe Viewer) to solve this:
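# Compress with a progress bar (pv must be installed first, e.g. sudo apt install pv)
# du -sb (GNU du) reports the data size in bytes so pv can estimate completion
tar -cf - /path/to/data | pv -s "$(du -sb /path/to/data | awk '{print $1}')" | gzip > data.tar.gz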
At this point, the screen will clearly display the speed (MB/s) and estimated completion time. Very professional!
Conclusion: Which One Should You Choose?
Each tool has its own strengths depending on your server resources:
- Need speed on a weak server: Choose gzip.
- Want maximum space savings: xz is the champion.
- Modern server with many CPU cores: You should definitely use zstd.
I hope this article helps you feel more confident when choosing a file compression method. Don’t forget to test in a staging environment before performing heavy tasks on production!
