The Ultimate Fix for I/O Lag: Converting VMware Thin to Thick Provisioning – ITFROMZERO

Real-World Story: When VMs Struggle to Breathe Because of the Hard Drive

In 2022, while running an 8-node ESXi cluster for an ERP system, I encountered a tricky case. A SQL Server suddenly started lagging badly, with I/O latency spiking to 50ms when it usually stayed under 5ms. A careful check showed that the storage system (SAN) itself was idle. The culprit turned out to be the Thin Provisioning disk format of that VM.

The issue with Thin disks is that they only consume space on the SAN when there is actual data. It sounds cost-effective, but the saving has a price at write time. Every time the VM writes to a block that has not been allocated yet, the Hypervisor has to “work hard”: find a free block on the datastore, update the allocation metadata, and zero the block before the write can complete. This first-write latency “throttles” the performance of applications that write new data continuously.
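To see how much of a thin disk is really allocated, you can compare its provisioned size with the blocks it occupies on the datastore. The snippet below is a minimal sketch for the ESXi shell; the function name `disk_usage_report` is my own, and it assumes `ls`, `du`, and `awk` behave as they do in a standard BusyBox environment.

```shell
# Hypothetical helper: compare provisioned size with the blocks actually allocated.
# Pass the data file (for example myvm-flat.vmdk), not the small descriptor file.
disk_usage_report() {
    flat="$1"
    # ls -l shows the provisioned (logical) size in bytes in column 5
    provisioned_kb=$(( $(ls -l "$flat" | awk '{print $5}') / 1024 ))
    # du -k shows the space really allocated on the datastore, in KB
    allocated_kb=$(du -k "$flat" | awk '{print $1}')
    echo "provisioned_kb=$provisioned_kb allocated_kb=$allocated_kb"
}
```

Calling `disk_usage_report myvm-flat.vmdk` inside the VM folder prints both figures on one line. On a thin disk the allocated value is usually much smaller than the provisioned one; on a thick disk the two should be roughly equal.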

Here is how I permanently resolved this situation by switching between disk formats to optimize I/O performance.

Quick Comparison of 3 Disk Types in VMware

To choose the right ‘medicine’ for the disease, you need to understand the characteristics of each type:

Thin Provisioning: Most space-efficient, pay as you go. However, it causes about 5-10% CPU overhead due to dynamic allocation. Suitable for Lab machines or light Web Servers.
Thick Provisioning Lazy Zeroed: Space is pre-allocated but not zeroed up front. Each block is zeroed only when the VM writes to it for the first time, so first writes still pay a small penalty. Performance is slightly more stable than Thin.
Thick Provisioning Eager Zeroed: This is the #1 choice for Databases. It allocates the entire capacity and wipes blocks clean from the start. Although initialization takes time, it offers extremely stable I/O performance with no allocation latency.
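Before converting anything, it helps to confirm which format a disk currently uses. A thin disk's descriptor file (the small .vmdk text file, not the -flat.vmdk data file) normally carries a `ddb.thinProvisioned = "1"` entry. The sketch below checks for that line; the function name `vmdk_is_thin` is hypothetical, and the check cannot tell Lazy Zeroed from Eager Zeroed, since the descriptor does not record that difference.

```shell
# Hypothetical helper: succeeds (exit 0) if the descriptor marks the disk as thin.
# $1 is the descriptor file, e.g. myvm.vmdk (not myvm-flat.vmdk).
vmdk_is_thin() {
    grep -q '^ddb\.thinProvisioned *= *"1"' "$1"
}
```

A typical use is `if vmdk_is_thin myvm.vmdk; then echo "thin"; else echo "thick"; fi`.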

Converting from Thin to Thick (Boosting Performance)

When you see a VM showing signs of I/O ‘exhaustion,’ consider switching to Thick immediately. There are two methods I usually apply depending on the infrastructure conditions.

Method 1: Using the Inflate Feature (Simplest)

If you’re hesitant to use the command line, this is your lifesaver. Small note: You must power off the VM to perform this.

Open vSphere Client, find the VM in need of ‘emergency care.’
Access the Datastore Browser, navigate to the correct folder containing the VM files.
Right-click the .vmdk file (usually the largest file).
Select Inflate and wait for the system to run.

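Once completed, the disk is fully allocated, eliminating latency caused by virtual disk expansion. Note that VMware documents the equivalent vmkfstools --inflatedisk operation as producing an Eager Zeroed disk rather than Lazy Zeroed, so expect the inflate to take as long as a full zeroing pass.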

Method 2: Using the vmkfstools Command (Advanced)

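I prefer this method because it lets me pick the target format explicitly, in this case Eager Zeroed for maximum performance, and it leaves the original disk untouched as a fallback. First, power off the VM, then enable SSH on the ESXi host and log in via PuTTY or a terminal.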

Move into the folder containing the VM:

cd /vmfs/volumes/DATASTORE_ID/VM_FOLDER/

Proceed to clone the old disk to a new format:

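# Clone to Thick Eager Zeroed. -i takes the descriptor (original.vmdk), not original-flat.vmdk.
# Zeroing the whole disk up front is slow on large disks, so plan for the wait.
vmkfstools -i original.vmdk -d eagerzeroedthick new_thick_disk.vmdk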

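Finally, just go to Edit Settings of the VM, remove the old disk (without deleting its files from the datastore), add new_thick_disk.vmdk as an existing hard disk, and power the VM back on. Keep the old disk until the VM has booted and been checked; after that, the machine will run smooth as silk.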
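If you run the clone step often, a small guard around it avoids the two classic mistakes: a mistyped source path and overwriting an existing disk. This is only a sketch; the function name `clone_to_eager` and the `DRY_RUN` variable are my own conventions, not part of vmkfstools. By default it only prints the command it would run.

```shell
# Hypothetical wrapper around the vmkfstools clone shown above.
# DRY_RUN defaults to 1, which prints the command instead of running it.
clone_to_eager() {
    src="$1"
    dst="$2"
    if [ ! -f "$src" ]; then
        echo "source descriptor not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "destination already exists: $dst" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "vmkfstools -i $src -d eagerzeroedthick $dst"
    else
        vmkfstools -i "$src" -d eagerzeroedthick "$dst"
    fi
}
```

Run `clone_to_eager original.vmdk new_thick_disk.vmdk` first to review the command, then repeat it as `DRY_RUN=0 clone_to_eager original.vmdk new_thick_disk.vmdk` once the VM is powered off.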

Converting from Thick to Thin (When Storage Calls for Help)

In practice, sometimes Storage reports red (exceeding the 90% threshold), and I am forced to reclaim space from less important VMs.

Storage vMotion: The Non-Disruptive Solution

If the system has vCenter and the appropriate license, you can do this without shutting down the machine. The process is very ‘chill’:

Right-click the VM, select Migrate.
Select Change storage only.
In the disk format section, choose Thin Provision.
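Select another Datastore with enough free space. A different datastore is the safe choice: depending on the vSphere version, migrating onto the same datastore may leave the disk format unchanged.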

VMware will automatically ‘squeeze out the water’ and only keep the portion that actually contains data.

Hard-Earned Lessons After Years of Operation

Through many troubleshooting incidents on large Clusters, I’ve drawn 3 valuable lessons:

Don’t let things ‘break’ because of Overprovisioning: Thin Provisioning is very deceptive. You create 10 VMs of 500GB on a 1TB disk and it seems fine, but once they actually fill up and the datastore runs out of free space, every VM that needs a new block will ‘freeze’ until space is freed. Always set warning Alarms at the 85% level.
Double-check backups: Manipulating .vmdk files always carries risks. No matter how confident you are, ensure the latest backup is still working fine.
Avoid peak hours: Creating an Eager Zeroed disk consumes a lot of Storage I/O. Don’t be foolish enough to do this at 9 AM on a Monday unless you want to receive a ‘basket’ of complaint tickets from users.
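The overprovisioning lesson comes down to simple arithmetic, which is easy to script. The sketch below uses two hypothetical helper functions, `overcommit_pct` and `usage_alarm`, with integer arithmetic on sizes in GB; the 85% default mirrors the alarm level suggested above, and 1TB is rounded to 1000GB for readability.

```shell
# Hypothetical helper: space promised to VMs as a percentage of real capacity.
overcommit_pct() {
    provisioned_gb="$1"
    capacity_gb="$2"
    echo $(( $provisioned_gb * 100 / $capacity_gb ))
}

# Hypothetical helper: succeeds (exit 0) when used space reaches the threshold.
# The threshold is a percentage and defaults to 85.
usage_alarm() {
    used_gb="$1"
    capacity_gb="$2"
    threshold="${3:-85}"
    [ $(( $used_gb * 100 / $capacity_gb )) -ge "$threshold" ]
}

# The example from the text: 10 VMs of 500GB on a 1TB (1000GB) datastore.
overcommit_pct 5000 1000   # prints 500
```

A result of 500 means the datastore has promised five times what it can hold. `usage_alarm 870 1000` succeeds because 87% is over the 85% line, while `usage_alarm 840 1000` does not.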

Conclusion

Optimizing I/O is not just about buying better hard drives. Sometimes just a small change from Thin to Thick Eager Zeroed is enough for the system to run much smoother. I hope these insights help you feel more confident in managing your virtualization infrastructure.