Storage is the silent weak point of Proxmox that most people overlook
I run a homelab with Proxmox VE managing 12 VMs and containers — it’s my playground for testing everything before pushing to production. Early on, I used the default LVM-thin setup that the Proxmox installer creates automatically. That worked fine until one NVMe drive developed bad sectors without my noticing. Silent corruption crept in: the VM still booted, but the database inside kept throwing strange errors I couldn’t trace to any cause. That’s when I started taking ZFS seriously.
Comparing storage approaches for Proxmox VE
Proxmox supports quite a few storage backends. Before picking one, it’s worth understanding the trade-offs of each:
LVM / LVM-Thin (default)
The Proxmox installer sets up LVM-Thin without asking — you boot up and it’s ready to go. LVM-Thin supports thin provisioning and snapshots, but has no checksumming and no self-healing. If a sector goes bad, you won’t find out until the VM crashes or data has already been silently corrupted.
- Pros: Simple, low overhead, well-supported out-of-the-box
- Cons: No data integrity checking, snapshots slow down on large volumes
Directory (ext4/xfs)
Stores VM disks as raw or qcow2 files on a standard filesystem. Easy to back up with rsync, no extra setup needed. But qcow2 snapshots are slow and the files tend to fragment over time, hurting I/O performance.
ZFS (zvol + dataset)
A filesystem with an integrated volume manager, checksumming, snapshots, compression, and scrubbing built directly into the storage layer. ZFS doesn’t need a hardware RAID controller — it manages redundancy and data integrity on its own.
- Pros: Checksums every block, instantaneous atomic snapshots, native lz4 compression, scrub catches errors early
- Cons: RAM-hungry (the oft-quoted 1GB of RAM per 1TB of storage is a rule of thumb, not a hard minimum), and a pool’s vdevs cannot be shrunk after creation
Ceph (for multi-node clusters)
Distributed storage, designed for clusters of three or more nodes. Setting up Ceph on a single server is not worth the effort — skip this option unless you have at least three machines.
Where ZFS shines, and where it falls short
ZFS doesn’t solve every storage problem. But when running Proxmox with critical data, there are two things it does better than any other option:
- Silent corruption detection: Every block is checksummed (fletcher4 by default; SHA-256 or BLAKE3 can be enabled). On read, ZFS verifies the checksum — if it doesn’t match, it knows the block is bad, and with a mirror or RAIDZ it will automatically repair it from a healthy copy.
- Atomic snapshots: ZFS snapshots complete in milliseconds regardless of whether the volume is 100GB or 2TB. No long VM freezes, no noticeable I/O impact.
With LVM-thin, I only found out something was wrong when a VM crashed or the database threw strange errors. ZFS scrub runs weekly and writes directly to the log — you know about a problem before damage occurs, not after.
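A few commands for seeing what the checksumming layer is actually reporting (pool name taken from this article — adjust to yours):

```shell
# Per-device READ / WRITE / CKSUM counters; a non-zero CKSUM value means
# a checksum mismatch was caught (and, on a mirror/RAIDZ, repaired)
zpool status -v rpool_vm
# Prints only pools with problems — if it reports "all pools are healthy",
# there is nothing to investigate
zpool status -x
```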
When should you choose ZFS for Proxmox?
Choose ZFS if the following conditions are met:
- The server has 16GB of RAM or more (ZFS ARC cache is RAM-hungry but well worth it)
- You need fast snapshots to support backup workflows or testing
- The data is important and you want to catch hardware failures before it’s too late
- You want native compression to save disk space without sacrificing performance
With less than 8GB of RAM, ZFS can hurt performance rather than improve it. And keep the shrink limitation in mind: an individual vdev can never be made smaller, and while mirror vdevs can be removed from a pool, RAIDZ vdevs cannot — plan capacity up front.
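On RAM-constrained hosts you can also cap the ARC instead of ruling ZFS out entirely. A sketch, assuming a 4 GiB cap suits your workload — the value is a placeholder, not a recommendation:

```shell
# Cap the ZFS ARC at 4 GiB so the cache leaves room for VM memory
# zfs_arc_max is specified in bytes: 4 * 1024^3 = 4294967296
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
# Rebuild the initramfs so the parameter applies at early boot, then reboot
update-initramfs -u
```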
Deploying ZFS on Proxmox VE: step by step
Proxmox VE ships with ZFS support built in — the Proxmox kernel includes the module even if you installed on an ext4 root without selecting ZFS during install. Verify first, and install the userland tools only if they’re missing:
Step 1: Check and install the ZFS module
# Check if ZFS is already available
zfs version
# If not, install it
apt update && apt install zfsutils-linux
# Load the kernel module
modprobe zfs
# Confirm the module is loaded
lsmod | grep zfs
Step 2: Identify the disks to use for the pool
# List all disks
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# Use stable disk IDs instead of /dev/sdX
ls -l /dev/disk/by-id/ | grep -v part
Always use /dev/disk/by-id paths instead of /dev/sdX — device letters can change across reboots. ZFS itself identifies disks by their on-disk labels, so a reorder won’t corrupt the pool, but by-id keeps zpool status readable and prevents you from pointing an admin command at the wrong disk.
Step 3: Create the ZFS pool
Assuming two 1TB drives, I’ll create a mirror pool (equivalent to RAID-1) for both redundancy and better read performance:
# Create a mirror pool using disk IDs (safest approach)
# ashift=12 aligns writes to 4K physical sectors — correct for nearly all modern drives
zpool create -f -o ashift=12 rpool_vm mirror \
/dev/disk/by-id/ata-WDC_WD10EZEX-xxxxxxxx \
/dev/disk/by-id/ata-WDC_WD10EZEX-yyyyyyyy
# If you have 3 drives, use RAIDZ1 (similar to RAID-5 but without the write hole)
# zpool create -f -o ashift=12 rpool_vm raidz1 /dev/disk/by-id/... /dev/disk/by-id/... /dev/disk/by-id/...
# Confirm the pool was created
zpool status rpool_vm
zpool list
Step 4: Configure properties optimized for VM workloads
# Enable lz4 compression (fast, saves ~20-40% disk space)
zfs set compression=lz4 rpool_vm
# Disable atime (reduces unnecessary write overhead)
zfs set atime=off rpool_vm
# Confirm properties were applied
zfs get compression,atime rpool_vm
Step 5: Add the ZFS pool to Proxmox VE
In the Proxmox web UI, go to Datacenter → Storage → Add → ZFS and fill in:
- ID: the storage name in Proxmox (e.g., zfs-vm)
- ZFS Pool: rpool_vm
- Content: Disk image, Container
- Block Size: 8K (suitable for most VM workloads)
Or add it quickly via CLI:
pvesm add zfspool zfs-vm --pool rpool_vm --content images,rootdir
Leveraging ZFS snapshots for VMs
Each VM disk in Proxmox on ZFS is stored as a zvol. ZFS snapshots operate at the block layer, making them extremely fast:
Creating and rolling back snapshots manually
# List zvols for all VMs
zfs list | grep vm-
# Snapshot VM 100 before a system update
zfs snapshot rpool_vm/vm-100-disk-0@before-update-2024-01
# List all snapshots
zfs list -t snapshot
# Roll back when something goes wrong (VM must be stopped first)
qm stop 100
zfs rollback rpool_vm/vm-100-disk-0@before-update-2024-01
qm start 100
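Worth noting: when a VM’s disks live on ZFS-backed storage, Proxmox’s own snapshot commands use ZFS snapshots under the hood, so you can stay within the qm tooling instead of mixing in raw zfs commands:

```shell
# Snapshot VM 100 through Proxmox (becomes a ZFS snapshot on a zfspool storage)
qm snapshot 100 before-update
# List the VM's snapshots
qm listsnapshot 100
# Roll back — qm handles the VM state for you
qm rollback 100 before-update
```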
Automated daily snapshots
#!/bin/bash
# /usr/local/bin/zfs-auto-snapshot.sh
# Runs daily, keeps the 7 most recent snapshots
POOL="rpool_vm"
DATE=$(date +%Y%m%d-%H%M)
KEEP=7
# Create a snapshot for every zvol in the pool
for zvol in $(zfs list -H -o name -t volume | grep "^${POOL}/vm-"); do
zfs snapshot "${zvol}@auto-${DATE}"
echo "Snapshot: ${zvol}@auto-${DATE}"
done
# Delete old snapshots, keeping the $KEEP most recent per zvol
for zvol in $(zfs list -H -o name -t volume | grep "^${POOL}/vm-"); do
snapshots=$(zfs list -H -o name -t snapshot | grep "^${zvol}@auto-" | sort)
count=$(echo "$snapshots" | wc -l)
if [ "$count" -gt "$KEEP" ]; then
echo "$snapshots" | head -n $((count - KEEP)) | xargs -I{} zfs destroy {}
fi
done
# Make the script executable, then schedule it: run at 2 AM every day
chmod +x /usr/local/bin/zfs-auto-snapshot.sh
crontab -e
# Add the following line:
0 2 * * * /usr/local/bin/zfs-auto-snapshot.sh >> /var/log/zfs-snapshot.log 2>&1
Checksums and scrubbing — the data integrity protection layer
ZFS automatically verifies checksums on every block read. But for data that’s rarely accessed (cold data), periodic scrubbing is what scans the entire pool:
# Run a scrub to check the entire pool
zpool scrub rpool_vm
# Monitor progress
zpool status rpool_vm
# Sample output when scrub completes
# scan: scrub repaired 0B in 00:04:21 with 0 errors on Sun Mar 15 03:00:02 2026
Zero errors means the pool is clean. When errors are detected, ZFS with a mirror or RAIDZ automatically repairs from the healthy copy and logs it — you just read the log instead of debugging a crash. With LVM-thin, that scenario ends with a VM that won’t boot.
# Auto-scrub every Sunday at 3 AM
# Add to crontab (use the full path — cron's default PATH doesn't include /usr/sbin):
0 3 * * 0 /usr/sbin/zpool scrub rpool_vm
# Check actual compression ratio
zfs get compressratio rpool_vm
# Monitor I/O in real time
zpool iostat rpool_vm 2
Real-world results after switching to ZFS
After migrating my homelab from LVM-thin to a ZFS mirror with lz4 compression, I’m seeing compression ratios of around 1.4x–1.8x depending on the VM — VMs handling lots of text and logs benefit the most. Snapshots are instantaneous instead of taking several minutes. What I value most: the weekly scrub consistently reports 0 errors, and storage is self-monitoring data integrity — no more crossing my fingers the way I did with LVM-thin.
With enough RAM on the server and important data on the line, this is an upgrade I couldn’t afford not to make. No additional hardware cost — just the time to set it up properly once.

