Configuring VDO on Fedora: How I “Squeezed” 500GB of Data into a 100GB Drive

Fedora tutorial - IT technology blog

The 2 AM Storage Full Nightmare

At exactly 2 AM on a Tuesday, my phone wouldn’t stop vibrating. The monitoring system had raised an alert: the backup drive for a client’s database cluster had reached 98% capacity. At the current log generation rate, all writes would stall within 30 minutes. Half-asleep, I faced two choices: pay for more cloud storage (expensive, and slow to get approved), or find a way to “compress” the existing data.

Having used Fedora as my main OS for a long time, I trust its system tools. Deleting logs was not an option because of security regulations. Compressing them with gzip would make later retrieval painfully slow. In the end, I chose VDO (Virtual Data Optimizer), a little-known tool in the Fedora/RHEL ecosystem that deduplicates and compresses data at the block level.

VDO vs. Btrfs vs. ZFS: Which Side to Choose for Storage Optimization?

Before typing any commands, I considered three popular options. Each has its strengths, but they aren’t always suitable for a production environment that requires stability.

  • Btrfs: This is the default on Fedora desktop installs and supports transparent compression quite well. However, it has no inline deduplication. You have to run out-of-band scanning tools periodically, which is quite tedious.
  • ZFS: A giant in the storage world. ZFS handles compression and dedupe excellently, but its CDDL license keeps it out of the mainline Linux kernel, so it ships as an out-of-tree module. Every time Fedora updates its kernel, I hold my breath fearing the ZFS module might break.
  • VDO: Acts as a layer beneath the file system. It compresses and deduplicates data as it is being written (inline). Since it works at the block level, you can freely use XFS or Ext4 on top without worrying about conflicts.
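
To make “block level” concrete, here is a toy sketch that cuts a file into 4KB pieces, hashes each piece, and counts how many are unique. The helper name block_dedupe_stats and the 4096-byte piece size are my own choices for the demo. This is not how VDO is implemented internally; it only shows why identical blocks need to be stored just once.

```shell
# Toy illustration only: count total vs. unique 4096-byte blocks in a file.
# block_dedupe_stats is a hypothetical helper, not part of any VDO tooling.
block_dedupe_stats() {
    file=$1
    tmp=$(mktemp -d) || return 1
    # Cut the file into fixed-size pieces (the last one may be shorter).
    split -a 6 -b 4096 "$file" "$tmp/blk."
    total=$(find "$tmp" -type f | wc -l)
    # Identical pieces produce identical hashes, so count distinct hashes.
    unique=$(find "$tmp" -type f -exec sha256sum {} + | awk '{ print $1 }' | sort -u | wc -l)
    rm -rf "$tmp"
    # Prints "<total blocks> <unique blocks>".
    echo "$((total)) $((unique))"
}

# Example (path is a placeholder):
# block_dedupe_stats /var/log/some-large.log
```

The bigger the gap between the two numbers, the more a block-level deduplicator could save on that file.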

The Price of Saving Space

VDO is not a magic wand that comes for free. Through actual deployment, I’ve noted two key points to consider:

  • RAM Consumption: As a rule of thumb, VDO needs about 1GB of RAM for every 1TB of physical capacity to manage its deduplication index. If your server only has 2GB of RAM for all services, it’s best to skip VDO.
  • Latency: Because VDO has to hash each block to check for duplicates before writing, write speeds dropped by about 10-15% compared to a raw drive in my deployment.
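
The RAM rule of thumb above is easy to turn into a quick check before committing a server to VDO. The sketch below only encodes the 1GB-per-1TB figure from this article; the helper name vdo_ram_estimate_mb is made up, and the result is a rough estimate, not an official sizing formula.

```shell
# Rule of thumb from this article: about 1 GB of RAM per 1 TB of physical
# capacity. vdo_ram_estimate_mb is a hypothetical helper name.
vdo_ram_estimate_mb() {
    physical_gb=$1
    ram_mb_per_tb=1024
    # Round up so that small disks never report 0 MB.
    echo $(( (physical_gb * ram_mb_per_tb + 1023) / 1024 ))
}

# Example: compare the estimate for a 100 GB disk with installed memory.
# echo "VDO estimate: $(vdo_ram_estimate_mb 100) MB"
# awk '/MemTotal/ { printf "Installed: %d MB\n", $2 / 1024 }' /proc/meminfo
```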

Why VDO Is a “Lifesaver” for Server Logs

Backup data often contains many near-identical files that differ by only a few log lines. With VDO, 10 snapshots of a 10GB database might occupy only about 12-15GB of physical space instead of 100GB. That kind of ratio is what lets my 100GB physical drive present a virtual space of up to 500GB.
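
The arithmetic behind those numbers is just logical size divided by physical size. A minimal sketch, with dedupe_ratio as a made-up helper name:

```shell
# Ratio of logical (what the file system sees) to physical (what the disk
# stores). dedupe_ratio is a hypothetical helper for illustration.
dedupe_ratio() {
    awk -v logical="$1" -v physical="$2" 'BEGIN { printf "%.1f\n", logical / physical }'
}

dedupe_ratio 100 15   # 10 snapshots x 10GB stored in 15GB -> 6.7
dedupe_ratio 100 12   # same data stored in 12GB           -> 8.3
dedupe_ratio 500 100  # the 500GB-on-100GB setup           -> 5.0
```

In other words, presenting 500GB on a 100GB drive only works if the data really deduplicates and compresses at 5:1 or better.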

VDO Deployment Process on Fedora Server

The steps below were performed directly on a running server. Always back up important data before changing disk structures.

Step 1: Install Packages

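On minimal Fedora installs, the VDO tools are often missing. Newer releases in the Fedora/RHEL family manage VDO through LVM (lvcreate --type vdo) rather than the standalone vdo manager used here, so check which tooling your release ships. Install the packages with the following command: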

sudo dnf install vdo kmod-kvdo -y

Once installed, enable the service so that it starts automatically with the system:

sudo systemctl enable --now vdo

Step 2: Prepare the Disk

Suppose I have an empty disk at /dev/sdb. Use the lsblk command to verify the drive name and avoid formatting the OS drive by mistake.

lsblk
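
Reading lsblk output by eye at 2 AM is error-prone, so a small filter can help. The sketch below reads the raw output of lsblk -rno NAME,TYPE,MOUNTPOINT and prints any row that has a mount point. The helper name mounted_rows is hypothetical, and /dev/sdb is just the example disk from this post.

```shell
# Reads the output of: lsblk -rno NAME,TYPE,MOUNTPOINT <disk>
# Prints every row that has a mount point (a third field).
# Exit status 0 means "something on this disk is mounted".
# mounted_rows is a hypothetical helper name.
mounted_rows() {
    awk 'NF >= 3 { print; found = 1 } END { exit found ? 0 : 1 }'
}

# Example:
# if lsblk -rno NAME,TYPE,MOUNTPOINT /dev/sdb | mounted_rows; then
#     echo "Stop: /dev/sdb is in use" >&2
# fi
```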

Step 3: Create a Virtual Volume

I am creating a VDO device named vdo_storage. Here, I set the virtual capacity to 200GB even though the physical drive is only 50GB (a 4:1 ratio).

sudo vdo create --name=vdo_storage \
               --device=/dev/sdb \
               --vdoLogicalSize=200G
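
If you want the ratio to be explicit rather than buried in a hand-typed size, you can derive the logical size from the physical size. This sketch only prints the command so you can review it before running it; vdo_create_cmd is a made-up helper name.

```shell
# Build (but do not run) the vdo create command from a physical size in GB
# and an overcommit ratio. vdo_create_cmd is a hypothetical helper name.
vdo_create_cmd() {
    name=$1
    device=$2
    physical_gb=$3
    ratio=$4
    logical_gb=$(( physical_gb * ratio ))
    echo "vdo create --name=$name --device=$device --vdoLogicalSize=${logical_gb}G"
}

# A 50GB disk at 4:1 gives the 200G used in this step.
vdo_create_cmd vdo_storage /dev/sdb 50 4
```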

Step 4: Set up the File System

Now, the new device is located at /dev/mapper/vdo_storage. I use XFS because it handles large files well.

sudo mkfs.xfs -K /dev/mapper/vdo_storage

The -K flag tells mkfs.xfs not to discard (TRIM) blocks at format time, which saves time when formatting large volumes.

Step 5: Configure Auto-Mount

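To ensure the volume re-mounts after a reboot, create the mount point (sudo mkdir -p /mnt/vdo_data) and add this line to /etc/fstab. Note: The x-systemd.requires=vdo.service option is mandatory. Without it, systemd may try to mount the file system before the VDO device exists, and the boot can hang.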

/dev/mapper/vdo_storage /mnt/vdo_data xfs defaults,x-systemd.requires=vdo.service 0 0
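
Editing /etc/fstab by hand invites duplicates and typos. Here is a minimal sketch that builds the entry above and appends it only if it is not already present. Both helper names (vdo_fstab_line, ensure_fstab_entry) are my own inventions.

```shell
# Print the fstab entry for a VDO-backed XFS volume.
# vdo_fstab_line and ensure_fstab_entry are hypothetical helper names.
vdo_fstab_line() {
    printf '/dev/mapper/%s %s xfs defaults,x-systemd.requires=vdo.service 0 0\n' "$1" "$2"
}

# Append a line to the given fstab file only if the exact line is missing.
ensure_fstab_entry() {
    fstab=$1
    line=$2
    grep -qxF -e "$line" "$fstab" || printf '%s\n' "$line" >> "$fstab"
}

# Example (run as root, after backing up /etc/fstab):
# ensure_fstab_entry /etc/fstab "$(vdo_fstab_line vdo_storage /mnt/vdo_data)"
# mount -a   # mounts everything in fstab, so errors show up before a reboot
```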

Checking the Actual Results

After pushing 20GB of duplicate logs into it, the df -h command still reports 20GB used, because df only sees the logical view presented to the file system. Checking with the VDO-specific tool tells a different story:

sudo vdostats --human-readable

I breathed a sigh of relief. The actual space occupied on the physical disk was only 4.2GB. That’s a saving of about 79% in storage space!
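
For anyone who wants to check the percentage, it is simply the share of logical data that never reached the disk. A small sketch, with space_saving_percent as a made-up helper name:

```shell
# Percentage of logical data that did not need physical space.
# space_saving_percent is a hypothetical helper name.
space_saving_percent() {
    awk -v logical="$1" -v physical="$2" \
        'BEGIN { printf "%.1f\n", (logical - physical) / logical * 100 }'
}

# 20GB written, 4.2GB stored:
space_saving_percent 20 4.2   # prints 79.0
```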

Three Critical Lessons When Using VDO

  1. Restrain your greed: Don’t set the virtual capacity too high (like a 10:1 ratio). If the physical drive fills up while the virtual capacity still shows space, writes fail with I/O errors even though the file system reports free space.
  2. Forget the df command: Always use vdostats to know how much physical space you actually have left. df -h only shows the logical view, so treat it as a reference at best.
  3. Backup is still the #1 priority: VDO only helps save disk space; it doesn’t protect data from physical failure. Always have an external backup plan.
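
Lessons 1 and 2 boil down to watching physical usage, which is easy to automate. The sketch below filters plain vdostats output and prints any device at or above a threshold. It assumes the columns are Device, 1K-blocks, Used, Available, Use%, Space saving% in that order, so check this against your version first. The helper name vdo_physical_alert and the threshold of 80 are my own choices.

```shell
# Reads plain `vdostats` output on stdin and prints every device whose
# physical Use% is at or above the threshold.
# ASSUMPTION: column 1 is the device and column 5 is Use% (for example "85%").
# vdo_physical_alert is a hypothetical helper name.
vdo_physical_alert() {
    threshold=$1
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # strip the trailing percent sign
        if (use + 0 >= t) print $1, $5
    }'
}

# Example (80 is an arbitrary threshold; run from cron or a monitoring check):
# sudo vdostats | vdo_physical_alert 80
```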

VDO is a good way to relieve storage pressure without an immediate hardware upgrade. If you are running services with high log volumes or frequent backups, try it on a test volume and see the difference for yourself.
