Storage Management on VMware ESXi: Don’t Wait for the “Red Disk” Warning to Panic – ITFROMZERO

The 3 AM Nightmare: When the Disk Turns Red

System Admins and DevOps engineers are no strangers to the scenario of being fast asleep only to be jolted awake by a Zabbix alert: “Disk Space Low” on a Production cluster. That feeling is… painful. With VMware ESXi, storage management isn’t as simple as just plugging in a new drive. It involves a chain of operations that runs from the physical layer, through the virtualization layer (Datastore/VMDK), all the way down to the OS partitions.

When I first started out, my hands would shake when resizing a 2TB Database Server disk. My biggest fear was accidentally losing data or crashing a client’s virtual machine. Later, after working with various platforms like Proxmox and Hyper-V, I still find VMware’s storage management to be very robust. As long as you understand the fundamentals, everything remains under control.

Core Concepts: Datastore and Virtual Disk (VMDK)

Before we start running commands, let’s clarify a few concepts to avoid confusion while using the vSphere Client.

1. What is a Datastore?

Think of a Datastore as a massive warehouse. It abstracts the complexity of the underlying hardware, whether you’re using local drives, RAID arrays, or SAN/NAS. For block storage, VMware uses VMFS (Virtual Machine File System), a clustered file system whose on-disk locking allows multiple hosts to read and write to the same volume simultaneously without conflicts. NFS-backed datastores rely on the NAS device’s own file system instead.

2. Virtual Disk (VMDK) – The VM’s “Slice of the Pie”

This is the virtual disk file seen by the guest OS. There are three main formats you should know:

Thin Provisioning: Efficient and flexible; it only consumes space as data is written, so the initial VMDK file is very small. The catch is over-commitment: if you allocate 500GB but the Datastore only has 100GB left, the VM will stall the moment its writes hit that physical limit (ESXi typically pauses it until space is freed).
Thick Provision Lazy Zeroed: Reserves the full capacity on the Datastore immediately, but each block is zeroed only the first time the VM writes to it. Creation is fast, but first-write performance is slightly lower.
Thick Provision Eager Zeroed: The best first-write performance but also the most time-consuming to create, because every block is reserved and zeroed at creation time. For write-heavy applications like SQL Server (over 5000 IOPS), this is the usual recommendation, and disks shared between clustered VMs generally require it.
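
To make the thin-provisioning risk concrete, here is a minimal sketch that adds up the provisioned sizes of your thin disks and compares them with the real Datastore capacity. The function name overcommit_pct and the sample numbers are illustrative assumptions, not something vSphere provides.

```shell
#!/bin/sh
# Hypothetical helper: report how far a datastore is over-committed.
# Arguments: datastore capacity in GB, then the provisioned size (GB) of each thin disk.
overcommit_pct() {
    capacity_gb=$1
    shift
    total=0
    for size_gb in "$@"; do
        total=$((total + size_gb))
    done
    # Integer percentage of provisioned space relative to real capacity
    echo $((total * 100 / capacity_gb))
}

# Example: a 1000 GB datastore hosting three thin disks of 500 GB each
overcommit_pct 1000 500 500 500   # prints 150
```

Anything above 100 means the VMs have been promised more space than physically exists, so Datastore free space needs closer monitoring.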

Hands-on: Upgrading Capacity from Top to Bottom

Assume you’ve just added a 1TB SSD to your server and want to increase the capacity for an Ubuntu Web Server. The process consists of three distinct steps, worked through from the physical layer down to the guest OS.

Step 1: Expanding the Datastore (Physical Layer)

After plugging in the drive, navigate to Storage > Datastores > Increase capacity. You will see two options:

Add extent: Combines a new drive (new LUN) into the existing Datastore. This is risky because the Datastore now spans both devices: if either one fails, the Datastore and the VMs stored on it can become inaccessible.
Expand existing extent: If you’ve just resized a hardware RAID array, choose this to consume the newly added free space.

To verify via CLI, SSH into your ESXi host and use these commands:

# List storage devices
esxcli storage core device list

# Check Datastore capacity and free space (-h prints human-readable sizes)
vmkfstools -P -h /vmfs/volumes/DATASTORE_NAME

Step 2: Expanding the Virtual Disk (VMDK)

Once the warehouse is expanded, we allocate more “meat” to the VM. Go to Edit Settings, find the Hard Disk section, and enter the new total size, for example going from 100GB to 250GB.

Warning: If the VM has active Snapshots, the capacity adjustment field will be grayed out. You must delete all snapshots before resizing. Do not try to force it; a corrupted VMDK file is often unrecoverable.
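
A quick way to spot leftover snapshots from the ESXi shell is to look for delta descriptors such as WebSrv-000001.vmdk in the VM folder. The sketch below is a heuristic under that naming assumption; the function name has_snapshot_disks and the folder path are illustrative, and the snapshot manager in the vSphere Client remains the authoritative source.

```shell
#!/bin/sh
# Returns 0 (true) if the VM directory contains snapshot delta descriptors
# such as WebSrv-000001.vmdk; returns 1 otherwise.
has_snapshot_disks() {
    vm_dir=$1
    for f in "$vm_dir"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk; do
        # If the glob matched nothing, $f is the literal pattern and does not exist
        [ -e "$f" ] && return 0
    done
    return 1
}

# Hypothetical path; adjust to your datastore and VM folder
if has_snapshot_disks /vmfs/volumes/DATA_01/WebSrv; then
    echo "Snapshots present: delete or consolidate them before resizing"
else
    echo "No snapshot delta disks found"
fi
```

Note that a VM whose own name ends in a dash and six digits would trigger a false positive, so treat the result as a hint rather than proof.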

If you want to use the CLI like a pro:

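# Grow the VMDK to a total of 250GB (-X takes the final size, not the increment).
# Point at the descriptor file (WebSrv.vmdk), not WebSrv-flat.vmdk, and power the VM off first.
vmkfstools -X 250G /vmfs/volumes/DATA_01/WebSrv/WebSrv.vmdk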
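
Because the size passed to vmkfstools -X is the new total and the operation is meant for growing a disk, a small guard can catch typos before anything touches the VMDK. This is a sketch only: build_extend_cmd is a hypothetical helper that prints the command instead of running it, and the sizes are passed in by hand.

```shell
#!/bin/sh
# Print the vmkfstools command for growing a disk, refusing shrink requests.
# Arguments: current size in GB, requested size in GB, path to the descriptor .vmdk
build_extend_cmd() {
    current_gb=$1
    new_gb=$2
    vmdk=$3
    if [ "$new_gb" -le "$current_gb" ]; then
        echo "refusing: new size (${new_gb}G) must exceed current size (${current_gb}G)" >&2
        return 1
    fi
    echo "vmkfstools -X ${new_gb}G $vmdk"
}

# Dry run: review the printed command before executing it on the host
build_extend_cmd 100 250 /vmfs/volumes/DATA_01/WebSrv/WebSrv.vmdk
```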

Step 3: Getting the OS to Recognize the New Capacity

The VM now sees a larger disk, but the internal partitions remain unchanged. On Windows, simply use diskmgmt.msc and select Extend Volume. For Linux (LVM), execute these “magic” commands:

# 1. Rescan the disk so the kernel sees the new capacity (run as root)
echo 1 > /sys/class/block/sda/device/rescan

# 2. Grow the partition that holds the Physical Volume (assuming sda3; growpart ships in cloud-guest-utils)
growpart /dev/sda 3

# 3. Resize the Physical Volume to fill the enlarged partition
pvresize /dev/sda3

# 4. Expand the Logical Volume (LV) using 100% of the free space
lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv

# 5. Finalize by resizing the file system
# If using EXT4: resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
# If using XFS: xfs_growfs /
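
The final file system step depends on what the root volume is formatted with. A minimal sketch of that decision, assuming you pass the type in yourself; grow_fs_cmd is a hypothetical helper that only prints the command to run:

```shell
#!/bin/sh
# Print the command that grows a filesystem of the given type.
# Arguments: filesystem type, block device (LV path), mount point
grow_fs_cmd() {
    fstype=$1
    device=$2
    mountpoint=$3
    case "$fstype" in
        ext2|ext3|ext4) echo "resize2fs $device" ;;      # ext* is grown via the device
        xfs)            echo "xfs_growfs $mountpoint" ;; # XFS is grown via the mount point
        *)              echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

# On a real system the type could come from: findmnt -n -o FSTYPE /
grow_fs_cmd ext4 /dev/mapper/ubuntu--vg-ubuntu--lv /
grow_fs_cmd xfs  /dev/mapper/ubuntu--vg-ubuntu--lv /
```

Printing instead of executing keeps the sketch safe to try anywhere; on the actual server you would run the printed command as root.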

Hard-won Lessons from the Field

After many “painful” experiences, I’ve established a few non-negotiable rules:

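1. The 80/20 Rule: Never exceed 80% of your Datastore’s capacity. When it hits 100%, ESXi pauses any running VM that needs to grow a thin disk or a snapshot, and powered-off VMs may refuse to start because there is no room left for their swap files. At that point, you won’t even be able to boot them to delete files or perform maintenance.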

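2. The UNMAP Trick: With Thin Provisioning, deleting files within the VM doesn’t immediately shrink the Datastore usage, and blocks freed on the Datastore are not always handed back to a thin-provisioned array. VMFS6 datastores (ESXi 6.5 and above) reclaim freed blocks automatically in the background; on VMFS5 you trigger the reclaim manually with this command: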

esxcli storage vmfs unmap -l MyDatastore

3. Don’t Over-complicate: Many people over-hype Thick Eager Zeroed. In reality, for standard web servers, Thin Provisioning saves a significant amount of storage cost with negligible performance loss.
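
Rule 1 is the easiest of the three to automate. A minimal sketch, assuming you already have the capacity and used space in GB (for example from your monitoring system); the function name check_datastore and the sample numbers are illustrative:

```shell
#!/bin/sh
# Classify datastore usage against a warning threshold (default 80%).
# Arguments: capacity in GB, used space in GB, optional threshold percentage
check_datastore() {
    capacity_gb=$1
    used_gb=$2
    threshold=${3:-80}
    pct=$((used_gb * 100 / capacity_gb))
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARN ${pct}%"
    else
        echo "OK ${pct}%"
    fi
}

check_datastore 1000 850   # prints: WARN 85%
check_datastore 1000 600   # prints: OK 60%
```

The optional third argument lets you tighten the threshold for datastores full of thin disks, where usage can climb quickly.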

Conclusion

Storage management is a balancing act between budget and performance. Don’t be too rigid. Base your disk format choices on the actual application load. Most importantly, have a solid monitoring system in place so warnings reach you well before they turn into a 3 AM emergency.

I hope these insights help you feel more confident when “wielding commands” on VMware. If you encounter any tricky disk issues, feel free to leave a comment below!