VMware ESXi Graceful Shutdown on Power Failure: Proper Configuration to Prevent VM Corruption

VMware tutorial - IT technology blog

I manage a VMware cluster with 8 ESXi hosts at work, and the worst incident happened the very first time the UPS ran out of battery before I had configured graceful shutdown — 3 database VMs ended up with corrupted files, and recovery took 6 hours. Ever since, configuring graceful shutdown is the first thing I do after every fresh ESXi installation, without exception.

Do It in 5 Minutes: Enable Graceful Shutdown on ESXi

Just 2 steps to get basic graceful shutdown working. The advanced part (UPS integration) can come later.

Step 1: Verify VMware Tools on All VMs

Without VMware Tools, ESXi cannot send the shutdown command to the guest OS — it will cut power immediately. SSH into the ESXi host and run:

# Check VMware Tools status for every registered VM (powered-off VMs included)
vim-cmd vmsvc/getallvms | awk 'NR>1 && $1 ~ /^[0-9]+$/ {print $1, $2}' |
while read -r vmid name; do
  # get.guest prints a line such as: toolsStatus = "toolsOk",
  status=$(vim-cmd vmsvc/get.guest "$vmid" </dev/null 2>/dev/null \
    | awk -F'"' '/toolsStatus =/ {print $2}')
  # Note: VM names containing spaces are truncated to their first word
  echo "VM $vmid ($name): ${status:-unknown}"
done

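The expected output is toolsOk for every powered-on VM (powered-off VMs report toolsNotRunning, which is normal; toolsOld means Tools is running but outdated, and Guest Shutdown still works). Any powered-on VM showing toolsNotInstalled or toolsNotRunning will be force-powered-off during a power failure, so install or restart VMware Tools on those VMs before proceeding.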

Step 2: Configure Autostart/Shutdown in ESXi

Access the ESXi embedded host client at https://<esxi-ip>/ui (or the vSphere Client if you have vCenter):

  1. Select Host → Manage → System → Autostart
  2. Click Edit Settings
  3. Set Enabled → Yes
  4. Set Stop action → Guest Shutdown (important: do NOT select Power Off)
  5. Set Stop delay: 120 seconds (wait time for each VM to complete shutdown)
  6. Click OK

Then go to the Virtual Machines tab, add each VM to the autostart list, and set the priority order. Done: graceful shutdown will now run whenever the host is shut down, whether by an admin or by UPS management software acting on a power failure.
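If you prefer the ESXi shell, the same settings can be scripted with `vim-cmd hostsvc/autostartmanager`. This is a sketch under assumptions: the argument order shown for `update_autostartentry` (VMID, start action, start delay, start order, stop action, stop delay, wait-for-heartbeat) is the commonly documented one but may differ between ESXi releases, the 120-second start delay is arbitrary, and the function names plus the `VIMCMD` override (set it to `echo` for a dry run) are hypothetical.

```shell
#!/bin/sh
# Sketch: CLI equivalent of the Host Client Autostart settings.
# ASSUMPTION: update_autostartentry takes
#   VMID StartAction StartDelay StartOrder StopAction StopDelay WaitForHeartbeat
# Run `vim-cmd hostsvc/autostartmanager` on your host to confirm before use.
VIMCMD="${VIMCMD:-vim-cmd}"   # hypothetical override: VIMCMD=echo prints instead of running

# enable_autostart: equivalent of "Enabled → Yes"
enable_autostart() {
  $VIMCMD hostsvc/autostartmanager/enable_autostart true
}

# set_vm_autostop VMID ORDER STOP_DELAY_SECONDS
# Adds or updates one VM's entry with Guest Shutdown as its stop action
set_vm_autostop() {
  $VIMCMD hostsvc/autostartmanager/update_autostartentry \
    "$1" "PowerOn" "120" "$2" "guestShutdown" "$3" "systemDefault"
}
```

For example, `enable_autostart` followed by `set_vm_autostop 12 1 300` would give the VM with ID 12 (a placeholder; real IDs come from `vim-cmd vmsvc/getallvms`) order 1 and a 300-second guest-shutdown window.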

Understanding Graceful vs. Force Shutdown

ESXi has two ways to shut down a VM, and the difference isn’t just technical — it determines whether your data survives or gets corrupted:

  • Power Off: Cuts power immediately, like yanking the cable. The file system has no time to flush buffers, in-progress database transactions are lost, and Windows NTFS may require chkdsk on next boot.
  • Guest Shutdown: Sends a shutdown signal to the OS through VMware Tools and waits for the OS to clean up and shut down properly. Safe for the file system and databases, as long as the guest finishes within the configured stop delay.
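Both behaviours can also be triggered per VM from the ESXi shell: `vim-cmd vmsvc/power.shutdown` requests a guest shutdown through Tools, and `vim-cmd vmsvc/power.off` is the hard power-off. A minimal sketch, assuming you want Power Off only as a fallback once a timeout expires; `graceful_stop`, `is_powered_on` and the `VIMCMD` override are hypothetical names, not part of ESXi.

```shell
#!/bin/sh
# Sketch: Guest Shutdown first, Power Off only as a last resort, for one VM.
VIMCMD="${VIMCMD:-vim-cmd}"   # hypothetical override for dry runs off-host

# is_powered_on VMID: succeeds while power.getstate still reports "Powered on"
is_powered_on() {
  $VIMCMD vmsvc/power.getstate "$1" 2>/dev/null | grep -q "Powered on"
}

# graceful_stop VMID TIMEOUT_SECONDS
graceful_stop() {
  local vmid="$1" timeout="$2" waited=0
  # Guest Shutdown via VMware Tools; the request fails if Tools is not running,
  # in which case the VM simply stays on until the timeout below
  $VIMCMD vmsvc/power.shutdown "$vmid" >/dev/null 2>&1
  while is_powered_on "$vmid" && [ "$waited" -lt "$timeout" ]; do
    sleep 5
    waited=$((waited + 5))
  done
  if is_powered_on "$vmid"; then
    # Equivalent of "Power Off": only reached if the guest did not stop in time
    echo "VM $vmid: no clean shutdown after ${timeout}s, powering off"
    $VIMCMD vmsvc/power.off "$vmid" >/dev/null 2>&1
    return 1
  fi
  echo "VM $vmid: shut down cleanly"
}
```

Usage: `graceful_stop 12 180` waits up to 180 seconds for VM 12 and returns non-zero if it had to cut power.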

The time I experienced an unexpected Power Off, the MySQL VM needed 20 minutes of InnoDB recovery on restart — and that was actually the lucky case. There have been times when corruption couldn’t self-repair and a full restore from backup was the only option.

VM Shutdown Order Matters More Than You Think

Most production stacks have a dependency chain — app servers need the database, message queues need running consumers. Shut down in the wrong order and you’ll have severed connection pools and an ugly error log the next morning. The order I use:

  1. App servers, Web servers (shut down first)
  2. Cache servers (Redis, Memcached)
  3. Message queue (RabbitMQ, Kafka)
  4. Database servers: MySQL, PostgreSQL (shut down near the end)
  5. Infrastructure VMs such as AD, DNS (shut down last, if present)

In ESXi Autostart, a lower order number means the VM starts earlier and shuts down later. Give the VMs that must stop last the lowest numbers: Order 1 for AD/DNS if you run them on the host, otherwise for your database VMs.
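Because the numbering is inverted for shutdown, a quick sanity check helps before relying on it. The sketch below assumes a hypothetical input of "ORDER NAME" lines that you write by hand from your Autostart list; `shutdown_sequence` is an illustrative name, and it simply prints the names highest order first, which is the order ESXi stops them in.

```shell
#!/bin/sh
# shutdown_sequence: read "ORDER NAME" lines on stdin (hypothetical format) and
# print VM names in stop order: highest order number first.
# Ties are broken by name only to keep the output deterministic.
shutdown_sequence() {
  sort -k1,1nr -k2,2 | awk '{ print $2 }'
}
```

For example, `printf '1 db01\n2 rabbitmq\n3 redis\n4 web01\n' | shutdown_sequence` lists web01, redis, rabbitmq and finally db01, matching the order above.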

Advanced: Integrating UPS for Automatic Shutdown Triggering

Configuring graceful shutdown in ESXi only solves how to shut down. The remaining piece is who tells ESXi to shut down when power fails — that’s the job of UPS management software.

Option 1: APC PowerChute Network Shutdown

If you’re using an APC UPS with a Network Management Card (NMC), PowerChute Network Shutdown (PCNS) is the most stable option and needs the least configuration. Install PCNS on a Linux VM within the cluster:

# Download PCNS from the APC/Schneider Electric website
# Extract and run the installer
tar -xzf PowerChute-Network-Shutdown-*.tar.gz
cd PowerChute-Network-Shutdown/
sudo ./install.sh

# After installation, access the web UI at https://localhost:6547
# Configure: Events → Add ESXi Host → enter IP + credentials
# Threshold: initiate shutdown when battery < 50% or runtime remaining < 5 minutes

PCNS communicates directly with the VMware API, shuts down VMs in the exact Autostart order you configured, then powers off the host. No additional scripting required.

Option 2: NUT + PowerCLI Script (for Non-APC UPS)

2 of my 8 hosts use CyberPower UPS units — no PCNS available. The solution is NUT (Network UPS Tools) combined with a PowerShell script:

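# Install NUT on a Linux server connected to the UPS via USB
sudo apt install nut nut-client

# /etc/nut/nut.conf (driver, upsd and upsmon all run on this one machine)
MODE=standalone

# /etc/nut/ups.conf
[cyberpower]
  driver = usbhid-ups
  port = auto
  desc = "CyberPower 1500VA"

# /etc/nut/upsd.users (the login that upsmon.conf refers to)
[admin]
  password = yourpassword
  upsmon master

# /etc/nut/upsmon.conf (add these lines; NUT 2.8+ also accepts "primary" for "master")
MONITOR cyberpower@localhost 1 admin yourpassword master
SHUTDOWNCMD "/usr/local/bin/shutdown-vmware.sh"
MINSUPPLIES 1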

#!/bin/bash
# /usr/local/bin/shutdown-vmware.sh
# NUT calls this script when the UPS battery drops to a critical threshold.
# It holds the ESXi password, so keep it root-only: sudo chmod 700 <this file>

ESXI_HOST="192.168.1.100"
ESXI_USER="root"
ESXI_PASS="your-password"
LOG=/var/log/ups-shutdown.log

echo "$(date '+%Y-%m-%d %H:%M:%S') - UPS battery low, initiating graceful shutdown" >> "$LOG"

# Call the PowerCLI script to shut down VMs in order; keep its output for post-mortems
pwsh -NonInteractive -File /usr/local/bin/vmware-shutdown.ps1 \
  -Server "$ESXI_HOST" -User "$ESXI_USER" -Password "$ESXI_PASS" >> "$LOG" 2>&1
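
# /usr/local/bin/vmware-shutdown.ps1
param(
  [string]$Server,
  [string]$User,
  [string]$Password,
  # Seconds to wait for each guest before forcing power off.
  # Worst case the loop takes (number of VMs x PerVmTimeout): keep it below UPS runtime.
  [int]$PerVmTimeout = 180
)

# ESXi usually presents a self-signed certificate; accept it for this session only
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Scope Session -Confirm:$false | Out-Null
Connect-VIServer -Server $Server -User $User -Password $Password -Force | Out-Null

# Shut down VMs by name order (or custom sort by tag/notes)
$vms = Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" } | Sort-Object Name -Descending

foreach ($vm in $vms) {
  # guestToolsRunning also covers VMs whose Tools are merely outdated (toolsOld)
  if ($vm.ExtensionData.Guest.ToolsRunningStatus -eq "guestToolsRunning") {
    Write-Host "Graceful shutdown: $($vm.Name)"
    Shutdown-VMGuest -VM $vm -Confirm:$false | Out-Null
    # Shutdown-VMGuest returns immediately, so wait here to keep the order meaningful
    $elapsed = 0
    while ((Get-VM -Id $vm.Id).PowerState -eq "PoweredOn" -and $elapsed -lt $PerVmTimeout) {
      Start-Sleep -Seconds 5; $elapsed += 5
    }
    if ((Get-VM -Id $vm.Id).PowerState -eq "PoweredOn") {
      Write-Host "Timed out after $PerVmTimeout s, forcing power off: $($vm.Name)"
      Stop-VM -VM $vm -Confirm:$false | Out-Null
    }
  } else {
    Write-Host "Force stop (no tools): $($vm.Name)"
    Stop-VM -VM $vm -Confirm:$false | Out-Null
  }
}

# Power off the host itself (-Force: the host is not in maintenance mode)
Stop-VMHost -VMHost (Get-VMHost) -Force -Confirm:$false | Out-Null
Disconnect-VIServer -Confirm:$false -ErrorAction SilentlyContinue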
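upsmon fires SHUTDOWNCMD when the UPS reports it is on battery and low on battery (OB plus LB). To see how close you are, or to mirror the PCNS thresholds above (charge below 50% or runtime below 5 minutes), a hypothetical helper can parse `upsc` output. `ups.status`, `battery.charge` and `battery.runtime` (in seconds) are standard NUT variables, though not every driver reports all of them; the function name and thresholds are assumptions for illustration.

```shell
#!/bin/sh
# ups_should_shutdown (hypothetical helper): read `upsc <ups>` output on stdin and
# succeed when the UPS is on battery (OB) AND charge < 50 % or runtime < 300 s.
# Variables the driver does not report are ignored.
ups_should_shutdown() {
  awk -F': ' '
    $1 == "ups.status"      { status = $2 }
    $1 == "battery.charge"  { charge = $2 + 0; have_charge = 1 }
    $1 == "battery.runtime" { runtime = $2 + 0; have_runtime = 1 }
    END {
      on_battery = (status ~ /(^| )OB( |$)/)
      low = (have_charge && charge < 50) || (have_runtime && runtime < 300)
      if (on_battery && low) exit 0
      exit 1
    }'
}
```

Usage: `upsc cyberpower@localhost | ups_should_shutdown && echo "thresholds reached"`. For an end-to-end drill, `upsmon -c fsd` forces upsmon into its shutdown sequence, which runs SHUTDOWNCMD for real, so save it for a maintenance window.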

Practical Tips After 6 Months in Production

Test Periodically — Don’t Wait for a Real Outage to Find Out It’s Broken

I schedule tests on the last evening of each month. Use these commands from the ESXi shell to check the current shutdown order and exercise the stop logic without powering down the host itself (the VMs really do shut down):

# View the current autostart/autostop order configuration
vim-cmd hostsvc/autostartmanager/get_autostartseq

# Manually trigger autostop for testing (shuts down VMs in configured order)
# Use during a maintenance window, not during peak hours
vim-cmd hostsvc/autostartmanager/autostop
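After an autostop test it helps to confirm that nothing was left running. A minimal sketch built on `vim-cmd vmsvc/getallvms` and `vim-cmd vmsvc/power.getstate`; the function name and the `VIMCMD` override (for dry runs off-host) are hypothetical, and VM names containing spaces are truncated to their first word.

```shell
#!/bin/sh
VIMCMD="${VIMCMD:-vim-cmd}"   # hypothetical override for dry runs off-host

# list_powered_on: print "VMID NAME" for every registered VM that is still on
list_powered_on() {
  $VIMCMD vmsvc/getallvms 2>/dev/null \
    | awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1, $2 }' \
    | while read -r vmid name; do
        # </dev/null keeps the inner call from consuming the loop's input
        if $VIMCMD vmsvc/power.getstate "$vmid" </dev/null 2>/dev/null \
            | grep -q "Powered on"; then
          echo "$vmid $name"
        fi
      done
}
```

Run it a few minutes after `autostop`; empty output means every VM stopped, and any line printed points at a VM whose Tools or stop delay needs attention.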

Set Timeouts to Match Your Workload

Windows VMs typically need 60–90 seconds. Linux VMs are faster, around 15–30 seconds. Database VMs under heavy write load may need 3–5 minutes to fully flush transaction logs. I set the default stop delay to 180 seconds, with the production PostgreSQL VM specifically set to 300 seconds in Autostart VM settings.
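These delays add up. ESXi stops VMs one at a time, moving to the next VM as soon as the current one is off or its stop delay expires, so the worst case is the sum of all stop delays, and that sum has to fit in the runtime the UPS still has when shutdown starts. A small sketch of that arithmetic; `check_stop_budget` is a hypothetical name, and you should leave extra headroom for the host's own shutdown.

```shell
#!/bin/sh
# check_stop_budget RUNTIME_SECONDS: read one stop delay (seconds) per VM on stdin,
# print the worst-case total and fail if it exceeds the available UPS runtime
check_stop_budget() {
  awk -v runtime="$1" '
    { total += $1 }
    END {
      printf "worst case: %d s of %d s available\n", total, runtime
      exit (total > runtime)
    }'
}
```

For example, three VMs at 180 seconds plus a PostgreSQL VM at 300 seconds: `printf '180\n180\n180\n300\n' | check_stop_budget 900` prints `worst case: 840 s of 900 s available` and succeeds, while the same list against 600 seconds fails.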

Periodically Monitor VMware Tools Health

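Tools can crash after an OS update or kernel upgrade without anyone noticing, and by the time a real power failure hits, it may have been down since yesterday. The following check runs nightly, scheduled by cron on the management VM; because vim-cmd exists only on ESXi, the script itself must execute in the host’s shell (for example over SSH):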

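#!/bin/sh
# Runs in the ESXi shell (busybox sh; vim-cmd exists only on the host).
# Warns about powered-on VMs whose VMware Tools are not in the toolsOk state.
vim-cmd vmsvc/getallvms | awk 'NR>1 && $1 ~ /^[0-9]+$/ {print $1, $2}' |
while read -r vmid name; do
  # Skip powered-off VMs: Tools cannot be running there anyway
  vim-cmd vmsvc/power.getstate "$vmid" </dev/null | grep -q "Powered on" || continue
  status=$(vim-cmd vmsvc/get.guest "$vmid" </dev/null 2>/dev/null \
    | awk -F'"' '/toolsStatus =/ {print $2}')
  if [ "$status" != "toolsOk" ]; then
    echo "WARNING: $name ($vmid): ${status:-unknown}"
  fi
done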

With vSAN or Shared Storage, Order Matters Even More

vSAN requires an extra step: dismounting the datastore before shutting down the host to avoid split-brain. For NFS/iSCSI, unmounting the datastore after all VMs have shut down is sufficient. This is where PCNS handles things better than a manual script — it knows the correct order for each storage type.

After fully deploying this setup, my cluster has survived 3 real power outages without a single corrupted VM. The configuration took 2–3 hours — in exchange for zero data loss and no awkward conversations with clients about why their database is broken.
