Mastering VMware vSphere Logs: Troubleshooting Systems with vmware.log, hostd.log, and vpxd.log – ITFROMZERO

When vCenter Turns Red: Don’t Guess, Check the Logs

The most “heart-stopping” sight for any system administrator is opening their laptop on a Monday morning to find vCenter glowing red. A series of virtual machines (VMs) are disconnected, or worse, an entire ESXi Host in a cluster shows a “Not Responding” status. I’m currently running an 8-node ESXi cluster with nearly 150 VMs, and honestly, the Web Client (GUI) usually just gives vague messages like “An error occurred.”

This is exactly when you need to SSH into the system to check the log files—where every tiny process is recorded in detail. Logs aren’t just meaningless strings of text. They are living evidence that helps you pinpoint exactly why a VM won’t start or why vCenter suddenly refuses to connect to a new host.

Three “Critical” Log Files You Must Know by Heart

Although vSphere generates dozens of log files, in my hands-on experience roughly 90% of common errors can be traced through these three files:

vmware.log: The individual log for each VM, located right in the directory containing the .vmx file on the Datastore. If a virtual machine freezes or shuts down unexpectedly, this is the first place you should look.
hostd.log: Located at /var/log/hostd.log on ESXi. This file records the activity of hostd, the host management agent, including commands sent from vCenter. If the host has resource issues that affect management operations, check here.
vpxd.log: Located on the vCenter Server Appliance (VCSA). It records all communication between vCenter and ESXi hosts.
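
As a rough sketch of how the per-VM logs can be located from the ESXi shell, the helper below lists every vmware.log under a datastore root and optionally filters by a VM folder name. The function name find_vm_logs is hypothetical, and the default root /vmfs/volumes is taken from the path used later in this article; scanning all datastores can be slow, so pointing it at a single datastore is usually kinder.

```shell
#!/bin/sh
# find_vm_logs [ROOT] [FILTER]
# Hypothetical helper: list the live vmware.log of each VM under ROOT.
# FILTER is an optional case-insensitive pattern matched against the full path.
find_vm_logs() {
    root="${1:-/vmfs/volumes}"
    filter="${2:-.}"            # "." matches every non-empty path
    # -name 'vmware.log' skips rotated copies such as vmware-1.log
    find "$root" -type f -name 'vmware.log' 2>/dev/null | grep -i -- "$filter"
}
```

For example, find_vm_logs /vmfs/volumes/STORAGE_NAME web01 would print only the log paths whose folder name contains web01.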

Log Access and Filtering Skills Like a Pro

Never open a 2GB log file in an editor such as vi or nano. Loading the whole file can hang your session. Instead, stream the file with the “holy trinity” of commands: tail, grep, and less (keep cat for small files only).

1. Enable SSH on ESXi

For security reasons, SSH is disabled by default. Start the SSH service from the Services section of the ESXi host (or from the host’s settings in vCenter) before connecting with a client such as Termius or PuTTY.

2. Monitor Logs in Real-Time with tail -f

Suppose you just clicked Power On for a VM and it immediately reports an error. Run this command to see the latest log entries as they appear:

tail -f /var/log/hostd.log
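
On a busy host, the live stream scrolls too fast to read, so it helps to narrow it to one VM or keyword. The sketch below is one way to do that; recent_matches is a hypothetical name, and the default of 200 lines is an arbitrary assumption. It looks at the tail of the file rather than following it, so it returns immediately.

```shell
#!/bin/sh
# recent_matches FILE PATTERN [LINES]
# Hypothetical helper: print lines matching PATTERN (case-insensitive)
# from the last LINES lines of FILE (default 200).
recent_matches() {
    file="$1"
    pattern="$2"
    lines="${3:-200}"
    tail -n "$lines" "$file" | grep -i -- "$pattern"
}

# Live variant, left as a comment because it runs until Ctrl+C:
#   tail -f /var/log/hostd.log | grep -i "web01"
```

Calling recent_matches /var/log/hostd.log web01 right after a failed Power On shows only the recent entries that mention that VM name.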

3. Filter Keywords with grep

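To quickly find error entries among thousands of lines of text, filter with grep and pipe the result into less for easy scrolling. Swap "error" for a more specific keyword (a datastore name, for example) when you are chasing a storage problem: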

grep -i "error" /var/log/hostd.log | less
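
When the grep filter returns thousands of hits, a per-hour count often shows when the trouble started. The sketch below assumes each log line begins with an ISO-8601 timestamp such as 2024-05-06T08:15:02.123Z, so the first 13 characters identify the date and hour; errors_per_hour is a hypothetical name.

```shell
#!/bin/sh
# errors_per_hour FILE
# Hypothetical helper: count lines containing "error" (any case), grouped by hour.
# Assumes each line starts with an ISO-8601 timestamp (YYYY-MM-DDTHH:MM:SS...).
errors_per_hour() {
    # Characters 1-13 of the timestamp are "YYYY-MM-DDTHH"
    grep -i 'error' "$1" | cut -c1-13 | sort | uniq -c
}
```

A sudden jump in one hour's count is a good place to start reading the full entries.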

Real-World Troubleshooting Scenarios

Case 1: VM Shuts Down Unexpectedly (Inspecting vmware.log)

Navigate directly to the VM’s folder on the Datastore: /vmfs/volumes/STORAGE_NAME/VM_NAME/. Once, while troubleshooting a VM that kept crashing, I found this line:

[msg.monitorEvent.halt] The CPU has been disabled by the guest operating system.

This message points away from VMware: the guest OS itself halted the CPU, most likely because of a Kernel Panic or a Blue Screen of Death (BSOD). Knowing that saves time you would otherwise spend investigating the virtualization layer.
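
A VM that keeps crashing usually leaves the same message in its rotated logs too. As a minimal sketch, the helper below searches the live and rotated vmware logs in one VM folder for the halt message quoted above. The name find_guest_halts is hypothetical, and the vmware*.log glob assumes rotated copies follow the vmware-N.log naming.

```shell
#!/bin/sh
# find_guest_halts VM_DIR
# Hypothetical helper: search vmware.log and its rotated copies in VM_DIR
# for the guest-initiated CPU halt message.
find_guest_halts() {
    vm_dir="$1"
    # -H prefixes each hit with its file name so rotated logs stay distinguishable
    grep -H 'msg\.monitorEvent\.halt' "$vm_dir"/vmware*.log 2>/dev/null
}
```

Running find_guest_halts /vmfs/volumes/STORAGE_NAME/VM_NAME shows which log files, and therefore which crashes, contain the message.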

Case 2: ESXi Host Disconnected (Inspecting hostd.log)

When a host reports “Not Responding,” I usually check if the management service is running out of memory. A classic error is:

VmkVigor: Resource mapping failed: Out of memory

The quickest solution is to restart the management agents to free up RAM:

/etc/init.d/hostd restart
/etc/init.d/vpxa restart
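
Before restarting anything, it is worth confirming that the out-of-memory message really appears in the recent log. The sketch below only reads the log and prints a suggestion; it does not restart services itself. The name suggest_restart and the 5000-line window are assumptions for illustration.

```shell
#!/bin/sh
# suggest_restart [LOG] [LINES]
# Hypothetical helper: count "Out of memory" entries in the last LINES lines
# of LOG and print the restart commands only when at least one is found.
suggest_restart() {
    log="${1:-/var/log/hostd.log}"
    lines="${2:-5000}"
    # grep -c prints 0 when nothing matches
    hits=$(tail -n "$lines" "$log" | grep -c 'Out of memory')
    if [ "$hits" -gt 0 ]; then
        echo "$hits out-of-memory entries found; consider running:"
        echo "  /etc/init.d/hostd restart"
        echo "  /etc/init.d/vpxa restart"
    else
        echo "no out-of-memory entries in the last $lines lines"
    fi
}
```

Keeping the restart as a printed suggestion leaves the final decision with the administrator.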

Case 3: Errors During VM Migration or Cloning (Inspecting vpxd.log)

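This file is located at /var/log/vmware/vpxd/ on vCenter. If vMotion or a clone task fails, vpxd.log usually shows which step broke. The cause is not always on the hosts: for example, if vCenter loses the connection to its own database (the message below is a PostgreSQL error), you’ll see: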

[VpxdDb] Failed to connect to database: [P0001] ERROR: database is starting up
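
To judge whether a failed task overlaps with a database outage, it helps to know when the connection errors started and stopped. The sketch below prints the number of matching lines plus the first and last timestamps. It assumes the timestamp is the first whitespace-separated field of each line, and db_failure_window is a hypothetical name.

```shell
#!/bin/sh
# db_failure_window FILE
# Hypothetical helper: print "<count> <first-timestamp> <last-timestamp>"
# for database connection failures in FILE, or "0" when there are none.
# Assumes the timestamp is the first whitespace-separated field of each line.
db_failure_window() {
    grep 'Failed to connect to database' "$1" |
        awk '{ if (NR == 1) first = $1; last = $1 }
             END { if (NR) print NR, first, last; else print 0 }'
}
```

If the failed vMotion falls inside the printed window, the database is the more likely culprit than the hosts.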

Small but Mighty Tips for Reading Logs

After years on the front lines, these tips have saved me hours of work:

Pay Attention to the Timestamp: VMware logs use UTC. If you match them with local events, remember to adjust for your time zone (e.g., UTC+7 for Vietnam).
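Read Compressed Files with zgrep: ESXi compresses rotated logs into .gz files. You don’t need to extract them; just use zgrep to search directly. The rotated hostd files typically sit in /var/run/log/ with names such as hostd.0.gz (run ls /var/run/log/ to confirm the names on your build):
```
zgrep -i "failed" /var/run/log/hostd.0.gz
```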
Leverage Error Codes: If you see an error code like 0x..., don’t guess. Copy and paste it into the VMware Knowledge Base (KB). 99% of the time, someone else has had the same issue, and a guide is already available.
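
The compressed-log tip extends naturally to searching the current log and all of its rotated copies in one pass. The following is a minimal sketch under two assumptions: the rotated files sit in the same directory as the live log, and they share its base name (hostd.log next to hostd.0.gz, for instance). The name search_logs is hypothetical.

```shell
#!/bin/sh
# search_logs DIR BASE PATTERN
# Hypothetical helper: case-insensitive search of every file named BASE.* in DIR,
# decompressing .gz files on the fly. Each hit is prefixed with its file name.
search_logs() {
    dir="$1"
    base="$2"
    pattern="$3"
    for f in "$dir/$base".*; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "$f" in
            *.gz) gzip -dc "$f" ;;       # stream compressed logs without extracting
            *)    cat "$f" ;;
        esac | grep -i -- "$pattern" | sed "s|^|${f##*/}: |"
    done
}
```

For example, search_logs /var/run/log hostd failed would cover the live hostd log and its compressed history, assuming that directory layout matches your host.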

Conclusion

At first, looking at a black terminal screen with thousands of lines of text scrolling by can be daunting. But once you become familiar with the heartbeat of vmware.log, hostd.log, or vpxd.log, you’ll feel much more confident. Instead of guessing, you’ll make decisions based on solid technical evidence.

Try SSHing into your host today to see what your system is “saying.” Good luck with your troubleshooting!