Three ESXi Monitoring Approaches — and Why I Start with Native Alarms
When I took over my first ESXi cluster, I skipped the Alarms section entirely, figuring it was all automated and needed no extra setup. That assumption cost me a Monday-morning phone call from my manager instead of an email alert: the datastore had been full since the night before, and 3 VMs couldn't write any data. I have never treated the Alarms section as optional since.
There are three common ways to monitor CPU/RAM/Datastore on ESXi and get timely alerts:
Approach 1: vSphere Native Alarms (built-in vCenter)
Built directly into vCenter Server — no additional software required. Define your conditions, set your actions (send email, SNMP trap, run script), done. Setup takes 15 minutes.
Approach 2: Third-party Monitoring Stack (Zabbix / Prometheus)
Zabbix ships with a VMware template out of the box; Prometheus uses vmware_exporter to pull metrics. The dashboards are far more visual and alerting rules are more flexible. The catch: you need a dedicated monitoring server — and you’ll need to maintain it alongside your existing ESXi infrastructure.
Approach 3: PowerCLI Script + Windows Task Scheduler
PowerCLI periodically pulls metrics, compares against thresholds, and sends alerts — email, Slack, Teams, whatever you want. Fully customizable. The cost: you write it from scratch and maintain it every time VMware changes their API.
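Here is a minimal PowerShell sketch of the compare-against-thresholds core of this approach. It assumes the metrics have already been collected (for example with Get-Stat) into a hashtable of percentages. The function name Get-ThresholdLevel, the hashtable layout, and the sample values are hypothetical, and the thresholds simply mirror the ones used for the alarms later in this article.

```powershell
# Hypothetical helper: classify a metric value against warning/critical thresholds ("Is Above").
function Get-ThresholdLevel {
    param(
        [Parameter(Mandatory)][double]$Value,
        [Parameter(Mandatory)][double]$Warning,
        [Parameter(Mandatory)][double]$Critical
    )
    if ($Value -gt $Critical) { return 'Critical' }
    if ($Value -gt $Warning)  { return 'Warning' }
    return 'OK'
}

# Assumed thresholds in percent (same values as the vSphere alarms configured below).
$thresholds = @{
    CPU       = @{ Warning = 80; Critical = 90 }
    RAM       = @{ Warning = 85; Critical = 95 }
    Datastore = @{ Warning = 75; Critical = 85 }
}

# Hypothetical sample; a real script would fill this from Get-Stat / Get-Datastore.
$metrics = @{ CPU = 92.5; RAM = 70; Datastore = 78 }

# Keep only the metrics that crossed a threshold.
$alerts = foreach ($name in $metrics.Keys) {
    $t = $thresholds[$name]
    $level = Get-ThresholdLevel -Value $metrics[$name] -Warning $t.Warning -Critical $t.Critical
    if ($level -ne 'OK') {
        [PSCustomObject]@{ Metric = $name; Value = $metrics[$name]; Level = $level }
    }
}
$alerts | Sort-Object Metric | Format-Table -AutoSize
```

A real script would replace the sample hashtable with live data, send each non-OK row by email or to a Slack/Teams incoming webhook (for example with Invoke-RestMethod), and run on a schedule via Task Scheduler.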
Real-World Comparison: Which Approach Fits Which Environment?
- vSphere Native Alarms: Done in 15 minutes, no additional servers or software needed. Ideal for teams with fewer than 20 hosts that don’t have a dedicated monitoring stack. Downside: plain-text notification emails, no dashboard, no long-term metric storage.
- Zabbix / Prometheus: Visual dashboards, 6–12 months of metric retention for trend analysis, highly customizable alerting rules. Worth the investment for large production environments, but budget 2–3 days for initial setup, plus ongoing time to maintain the monitoring server.
- PowerCLI Script: Want Slack alerts? Want fully custom report formats? This is your path. Trade-off: you need PowerShell skills and have to update your scripts whenever VMware changes their API.
When I migrated from VMware to Proxmox for my personal lab, I noticed some interesting differences. Proxmox's built-in alerting is fairly basic, nowhere near as granular as vSphere Alarms. In exchange, Proxmox can push metrics to an external InfluxDB or Graphite server out of the box, which makes wiring it into Grafana much easier. Every platform has its own trade-offs.
Practical advice: For new teams and mid-sized environments, start with vSphere Native Alarms. It works immediately, and you won’t need to justify a dedicated Zabbix server to your manager. Scale up to a specialized monitoring stack later when your environment demands it.
Setting Up vSphere Alarms in Practice — From SMTP Configuration to Your First Alert Email
Step 1: Configure SMTP for vCenter
Alarms can’t send email if vCenter doesn’t know which mail server to use. This is the most commonly skipped step — and the number one reason alarms trigger while your inbox stays silent.
Log into vSphere Client → click vCenter Server in the inventory → Configure tab → Settings → General → Edit button. Scroll down to the Mail section:
- SMTP Server: your mail server address (e.g., smtp.gmail.com or your internal Exchange IP)
- SMTP Port: 587 for TLS, 465 for SSL, 25 for unencrypted relay
- Sender account: the outgoing email address, e.g., [email protected]
Gmail requires an extra step: enable 2FA first, then go to myaccount.google.com → Security → App Passwords to generate a dedicated password for vCenter. Internal Exchange is much simpler — just enter the mail server IP and relay port, no additional authentication needed.
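If you prefer to script this step, PowerCLI can write the same Mail fields as vCenter advanced settings. This is a minimal sketch that assumes an existing Connect-VIServer session and that your vCenter uses the mail.smtp.server, mail.smtp.port and mail.sender advanced options. The function names Set-VCenterMail and Get-SmtpPort are hypothetical, and the sketch does not handle SMTP authentication such as a Gmail App Password.

```powershell
# Hypothetical helper: map the connection types listed above to their usual ports.
function Get-SmtpPort {
    param([Parameter(Mandatory)][ValidateSet('TLS', 'SSL', 'Relay')][string]$Mode)
    switch ($Mode) {
        'TLS'   { 587 }
        'SSL'   { 465 }
        'Relay' { 25 }
    }
}

# Sketch: write vCenter's mail server, port and sender through advanced settings.
# Assumes a Connect-VIServer session (uses $global:DefaultVIServer).
function Set-VCenterMail {
    param(
        [Parameter(Mandatory)][string]$SmtpServer,
        [Parameter(Mandatory)][ValidateSet('TLS', 'SSL', 'Relay')][string]$Mode,
        [Parameter(Mandatory)][string]$Sender
    )
    $vc = $global:DefaultVIServer
    $values = @{
        'mail.smtp.server' = $SmtpServer
        'mail.smtp.port'   = (Get-SmtpPort -Mode $Mode)
        'mail.sender'      = $Sender
    }
    foreach ($name in $values.Keys) {
        $setting = Get-AdvancedSetting -Entity $vc -Name $name
        if ($setting) {
            # Update the existing option in place.
            $setting | Set-AdvancedSetting -Value $values[$name] -Confirm:$false
        } else {
            # Create the option if this vCenter does not have it yet.
            New-AdvancedSetting -Entity $vc -Name $name -Value $values[$name] -Confirm:$false
        }
    }
}
```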
Step 2: Create an Alarm for ESXi Host CPU Usage
In vSphere Client, switch to the Hosts and Clusters tab. Right-click an ESXi host (or click the Datacenter/Cluster to apply the alarm to all hosts within it) → Alarms → New Alarm Definition.
General tab:
- Alarm name: ESXi CPU Usage High
- Monitor: select Host
- Monitor for: select Specific conditions or state
Triggers tab → click Add:
- Trigger Type: Host CPU Usage (%)
- Operator: Is Above
- Warning (Yellow): 80
- Critical (Red): 90
- Tolerance: 5 (prevents flapping when usage oscillates around the threshold)
- Condition length: 5 minutes (only triggers when the condition persists, not on a 10-second CPU spike)
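To see why Tolerance and Condition length matter, here is a simplified model of both. It is not vCenter's exact algorithm. It assumes one sample per minute, raises the alarm only after the value stays above the threshold for the full condition length, and clears it only once the value drops below threshold minus tolerance. The function name Get-AlarmTimeline is hypothetical.

```powershell
# Simplified, hypothetical model of condition length + tolerance (one sample per minute).
# Emits one boolean per sample: $true while the alarm is active.
function Get-AlarmTimeline {
    param(
        [Parameter(Mandatory)][double[]]$Samples,
        [double]$Threshold = 80,
        [double]$Tolerance = 5,
        [int]$ConditionMinutes = 5
    )
    $active = $false
    $streak = 0   # consecutive samples above the threshold
    foreach ($value in $Samples) {
        if ($value -gt $Threshold) { $streak++ } else { $streak = 0 }
        if (-not $active -and $streak -ge $ConditionMinutes) {
            $active = $true          # condition persisted long enough: raise
        } elseif ($active -and $value -lt ($Threshold - $Tolerance)) {
            $active = $false         # dropped below the tolerance band: clear
        }
        $active
    }
}
```

With the samples 85,85,85,85,85,79,81,79,74, the alarm fires on the fifth reading and stays active through the 79% and 81% readings, clearing only at 74%. A lone 95% spike in an otherwise idle series never fires at all.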
Actions tab → click Add:
- Action: Send a notification email
- To: [email protected] (use a shared mailbox, not a personal email)
- Repeat this action: check this if you want a reminder every 30 minutes while the alarm condition persists
Click OK to save.
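The same CPU alarm can also be created from PowerCLI through the vSphere API's AlarmManager.CreateAlarm method. This is a minimal sketch that assumes an existing Connect-VIServer session; the function names New-HostCpuAlarm and ConvertTo-AlarmThreshold are hypothetical. In the API, metric thresholds and the tolerance range are expressed in hundredths of a percent, and the trigger intervals (Condition length) in seconds. The sketch creates only the trigger; the email action still has to be added (for example with New-AlarmAction).

```powershell
# Hypothetical helper: vSphere metric alarm values are in hundredths of a percent.
function ConvertTo-AlarmThreshold {
    param([Parameter(Mandatory)][double]$Percent)
    [int]($Percent * 100)
}

# Sketch: create a host CPU usage alarm on a host, cluster or datacenter.
function New-HostCpuAlarm {
    param(
        [Parameter(Mandatory)]$Entity,       # e.g. output of Get-Cluster or Get-VMHost
        [double]$Warning = 80,
        [double]$Critical = 90,
        [double]$TolerancePercent = 5,
        [int]$ConditionSeconds = 300
    )
    # Look up the cpu.usage.average counter ID instead of hard-coding it.
    $perfMgr = Get-View (Get-View ServiceInstance).Content.PerfManager
    $counter = $perfMgr.PerfCounter | Where-Object {
        $_.GroupInfo.Key -eq 'cpu' -and $_.NameInfo.Key -eq 'usage' -and $_.RollupType -eq 'average'
    } | Select-Object -First 1
    if (-not $counter) { throw 'cpu.usage.average counter not found' }

    $metric = New-Object VMware.Vim.PerfMetricId
    $metric.CounterId = $counter.Key
    $metric.Instance = ''                    # aggregate across all CPUs

    $expr = New-Object VMware.Vim.MetricAlarmExpression
    $expr.Type = 'HostSystem'
    $expr.Metric = $metric
    $expr.Operator = 'isAbove'
    $expr.Yellow = ConvertTo-AlarmThreshold -Percent $Warning
    $expr.Red = ConvertTo-AlarmThreshold -Percent $Critical
    $expr.YellowInterval = $ConditionSeconds # condition length
    $expr.RedInterval = $ConditionSeconds

    $or = New-Object VMware.Vim.OrAlarmExpression
    $or.Expression = @($expr)

    $spec = New-Object VMware.Vim.AlarmSpec
    $spec.Name = 'ESXi CPU Usage High'
    $spec.Description = 'Host CPU usage above warning/critical thresholds'
    $spec.Enabled = $true
    $spec.Expression = $or
    $spec.Setting = New-Object VMware.Vim.AlarmSetting
    $spec.Setting.ToleranceRange = ConvertTo-AlarmThreshold -Percent $TolerancePercent
    $spec.Setting.ReportingFrequency = 0

    $alarmMgr = Get-View AlarmManager
    $alarmMgr.CreateAlarm($Entity.ExtensionData.MoRef, $spec)
}
```

For example, New-HostCpuAlarm -Entity (Get-Cluster 'Prod') would define the alarm once for every host in that cluster ('Prod' is a placeholder). CreateAlarm fails if an alarm with the same name already exists, so rerunning the function is not idempotent.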
Step 3: Create an Alarm for RAM Usage
The process mirrors the CPU alarm — just swap the trigger type and thresholds:
- Alarm name: ESXi Memory Usage High
- Trigger Type: Host Memory Usage (%)
- Warning: 85% | Critical: 95%
RAM thresholds are typically set higher than CPU. ESXi reclaims memory with balloon drivers and memory swapping, so a host running at 88% RAM is much less critical than one at 88% sustained CPU. Do keep an eye on balloon memory in the vSphere performance charts, though: if ballooning is consistently high, the host is under real memory pressure, and 85% is a reasonable warning threshold.
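To check ballooning from a script rather than the performance charts, here is a minimal sketch. It assumes an existing Connect-VIServer session and reads the mem.vmmemctl.average counter (balloon memory, in KB) via Get-Stat. vSphere does not define "consistently high"; the helper Test-ConsistentlyHigh encodes one assumed definition (at least 80% of samples at or above a threshold you choose), and both function names are hypothetical.

```powershell
# Hypothetical helper: "consistently high" = at least MinFraction of samples >= Threshold.
function Test-ConsistentlyHigh {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][double[]]$Values,
        [Parameter(Mandatory)][double]$Threshold,
        [double]$MinFraction = 0.8
    )
    if ($Values.Count -eq 0) { return $false }
    $high = @($Values | Where-Object { $_ -ge $Threshold }).Count
    return ($high / $Values.Count) -ge $MinFraction
}

# Sketch: recent real-time balloon samples (KB) for one host.
# Assumes a Connect-VIServer session.
function Get-HostBalloonKB {
    param([Parameter(Mandatory)][string]$HostName, [int]$MaxSamples = 30)
    $vmhost = Get-VMHost -Name $HostName
    Get-Stat -Entity $vmhost -Stat 'mem.vmmemctl.average' -Realtime -MaxSamples $MaxSamples |
        ForEach-Object { [double]$_.Value }
}
```

For example, Test-ConsistentlyHigh -Values (Get-HostBalloonKB 'esxi01.company.com') -Threshold 1048576 would flag a host whose balloon stayed at or above 1 GB (1,048,576 KB) in most of the recent real-time samples. Pick a threshold that fits your hosts.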
Step 4: Create an Alarm for Datastore
Key difference: datastore alarms must target the Datastore object type, not the Host. In vSphere Client, switch to the Storage tab → right-click the Datastore (or the Datacenter, to cover every datastore in it) → Alarms → New Alarm Definition:
- Alarm name: Datastore Disk Usage High
- Monitor: Datastore
- Trigger Type: Datastore Disk Usage (%)
- Warning: 75% | Critical: 85%
A full datastore is the worst case because nothing heals it automatically: VMs that need to write new blocks (thin-provisioned disks, snapshot delta files) are paused by ESXi and stay stuck until someone frees up space. That's why these thresholds are set lower than CPU/RAM. An early warning at 75% gives you enough time to clean up or expand storage before hitting the critical 85% mark.
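To see how close each datastore is to those thresholds, you can compute used space from the CapacityGB and FreeSpaceGB properties that Get-Datastore returns. This is a minimal sketch that assumes an existing Connect-VIServer session. The function names are hypothetical, and the 75/85 defaults are the thresholds above.

```powershell
# Hypothetical helper: percentage of a datastore that is in use.
function Get-DatastoreUsagePercent {
    param(
        [Parameter(Mandatory)][double]$CapacityGB,
        [Parameter(Mandatory)][double]$FreeSpaceGB
    )
    if ($CapacityGB -le 0) { throw 'CapacityGB must be positive' }
    [math]::Round((1 - $FreeSpaceGB / $CapacityGB) * 100, 1)
}

# Sketch: list all datastores, fullest first, with their level against the thresholds.
# Assumes a Connect-VIServer session.
function Get-DatastoreUsageReport {
    param([double]$Warning = 75, [double]$Critical = 85)
    Get-Datastore | ForEach-Object {
        $pct = Get-DatastoreUsagePercent -CapacityGB $_.CapacityGB -FreeSpaceGB $_.FreeSpaceGB
        $level = if ($pct -gt $Critical) { 'Critical' } elseif ($pct -gt $Warning) { 'Warning' } else { 'OK' }
        [PSCustomObject]@{ Datastore = $_.Name; UsedPercent = $pct; Level = $level }
    } | Sort-Object UsedPercent -Descending
}
```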
Step 5: Test Before Going Live
Don’t wait for a real incident to find out if your alarms work. The simplest test: temporarily lower the threshold to trigger the alarm immediately.
Open the alarm you just created → Edit → set the Warning threshold to 10% → wait 5–10 minutes (the 5-minute condition length has to pass first) → check your email. The subject should look something like [vCenter Alarm] ESXi CPU Usage High on esxi01 - Warning. If it arrives, both SMTP and the alarm are working correctly. Then restore the real threshold.
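Lowering and restoring the threshold can also be scripted, which helps when you test several alarms. This is a minimal sketch that assumes an existing Connect-VIServer session and an alarm built from metric triggers, such as the CPU alarm from Step 2. The function name Set-AlarmWarningThreshold is hypothetical. It edits the alarm's existing definition (AlarmInfo, which extends AlarmSpec) and saves it with the Alarm object's ReconfigureAlarm method.

```powershell
# Sketch: change the Warning (yellow) threshold of an existing metric alarm in place.
# Assumes a Connect-VIServer session.
function Set-AlarmWarningThreshold {
    param(
        [Parameter(Mandatory)][string]$AlarmName,
        [Parameter(Mandatory)][double]$WarningPercent
    )
    $alarm = Get-AlarmDefinition -Name $AlarmName
    if (@($alarm).Count -ne 1) { throw "Expected exactly one alarm named '$AlarmName'" }
    $info = $alarm.ExtensionData.Info            # AlarmInfo derives from AlarmSpec
    # A UI-created alarm usually wraps its triggers in an OrAlarmExpression.
    $expressions = if ($info.Expression.GetType().Name -eq 'OrAlarmExpression') {
        $info.Expression.Expression
    } else {
        @($info.Expression)
    }
    foreach ($expr in $expressions) {
        if ($expr.GetType().Name -eq 'MetricAlarmExpression') {
            $expr.Yellow = [int]($WarningPercent * 100)   # API uses hundredths of a percent
        }
    }
    $alarm.ExtensionData.ReconfigureAlarm($info)
}
```

Run Set-AlarmWarningThreshold -AlarmName 'ESXi CPU Usage High' -WarningPercent 10, wait for the email, then run it again with -WarningPercent 80.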
PowerCLI is also useful for pulling a list of active alarms — handy when managing multiple hosts:
# Connect to vCenter (Get-Credential prompts for the user and password instead of hard-coding them)
Connect-VIServer -Server vcenter.company.com -Credential (Get-Credential)
# List all enabled alarm definitions
Get-AlarmDefinition | Where-Object { $_.Enabled -eq $true } | Select-Object Name, Entity, Enabled | Format-Table -AutoSize
# List alarms currently in Warning (yellow) or Critical (red) state on every host
foreach ($h in Get-VMHost) {
    foreach ($t in $h.ExtensionData.TriggeredAlarmState) {
        [PSCustomObject]@{ Host = $h.Name; Alarm = (Get-View $t.Alarm).Info.Name; Status = $t.OverallStatus; Time = $t.Time }
    }
}
# List triggered alarms on a specific host
$vmhost = Get-VMHost "esxi01.company.com"
$alarmMgr = Get-View AlarmManager
$alarmMgr.GetAlarmState($vmhost.ExtensionData.MoRef) | Where-Object { $_.OverallStatus -ne "green" }
Step 6: Fine-Tune to Avoid Alert Fatigue
I once received 200+ emails in a single day from a CPU alarm set at 70% on a production server that routinely ran at 75–80%. My inbox flooded with alerts and I started marking them read without looking. That’s the worst failure mode — not missing monitoring, but having so many alerts that nobody pays attention anymore.
After 1–2 weeks of real-world data, revisit your configuration:
- Adjust thresholds based on your baseline — if the server typically runs at 75% CPU, set warning = 90% rather than 80%
- Route alerts to a shared mailbox or Slack/Teams channel, not directly to personal email
- Configure a “green” action to receive an email when the issue resolves — important for knowing when an alarm has cleared
- Managing more than 5 hosts: consider creating alarms at the Cluster or Datacenter level — configure once, applies to all hosts within
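For the baseline review, a small helper can turn a week of samples into a suggested Warning threshold. This is a hypothetical rule of thumb, not a VMware recommendation: take the 95th percentile of the samples, add a margin, and clamp the result between 80 and 95. The function names are hypothetical. The commented Get-Stat line shows one way to collect real samples, assuming an existing Connect-VIServer session.

```powershell
# Hypothetical helper: nearest-rank percentile of a set of samples.
function Get-Percentile {
    param(
        [Parameter(Mandatory)][double[]]$Values,
        [ValidateRange(1, 100)][double]$Percentile = 95
    )
    $sorted = @($Values | Sort-Object)
    $rank = [math]::Ceiling($Percentile / 100 * $sorted.Count)
    $sorted[$rank - 1]
}

# Hypothetical rule of thumb: warning = p95 + margin, kept between 80 and 95.
function Get-SuggestedWarning {
    param([Parameter(Mandatory)][double[]]$Values, [double]$Margin = 10)
    $p95 = Get-Percentile -Values $Values -Percentile 95
    [math]::Min(95.0, [math]::Max(80.0, $p95 + $Margin))
}

# One way to gather a week of aggregate CPU samples (needs a vCenter session):
# $samples = Get-Stat -Entity (Get-VMHost 'esxi01.company.com') -Stat cpu.usage.average -Start (Get-Date).AddDays(-7) |
#     Where-Object { -not $_.Instance } | ForEach-Object { $_.Value }
```

With samples from a host that routinely runs at 70–80% CPU, the 95th percentile is 80 and the suggested Warning is 90, in line with the advice above.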
Pre-Go-Live Checklist
- SMTP configured, at least 1 test email received
- CPU alarm: Warning 80%, Critical 90%, Tolerance 5%, Condition length 5 minutes
- RAM alarm: Warning 85%, Critical 95%
- Datastore alarm: Warning 75%, Critical 85% (on each datastore, or once at Datacenter level to cover all of them)
- Alert recipients are a shared mailbox or team channel
- “Green” action configured to notify when the issue resolves
- Thresholds reviewed against real-world baseline after 1 week of observation
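Some of these checklist items can be verified from PowerCLI. This is a minimal sketch that assumes an existing Connect-VIServer session and the alarm names used in this article. Test-AlarmChecklist and Add-ResolvedNotification are hypothetical names. The first function reports whether each alarm exists, is enabled, emails someone, and has a trigger ending in Green. The second adds a Yellow → Green trigger to an alarm's email actions with New-AlarmActionTrigger.

```powershell
# Sketch: report the checklist items PowerCLI can see. Assumes a Connect-VIServer session.
function Test-AlarmChecklist {
    param([string[]]$AlarmNames = @('ESXi CPU Usage High', 'ESXi Memory Usage High', 'Datastore Disk Usage High'))
    foreach ($name in $AlarmNames) {
        $def = Get-AlarmDefinition -Name $name -ErrorAction SilentlyContinue
        if (-not $def) {
            [PSCustomObject]@{ Alarm = $name; Exists = $false; Enabled = $false; EmailTo = $null; GreenNotice = $false }
            continue
        }
        $emails = @(Get-AlarmAction -AlarmDefinition $def -ActionType SendEmail)
        $triggers = @($emails | ForEach-Object { Get-AlarmActionTrigger -AlarmAction $_ })
        [PSCustomObject]@{
            Alarm       = $name
            Exists      = $true
            Enabled     = $def.Enabled
            EmailTo     = (($emails | ForEach-Object { $_.To }) | Select-Object -Unique) -join ', '
            GreenNotice = [bool]($triggers | Where-Object { $_.EndStatus -eq 'Green' })
        }
    }
}

# Sketch: add a "resolved" (Yellow -> Green) trigger to every email action of one alarm.
function Add-ResolvedNotification {
    param([Parameter(Mandatory)][string]$AlarmName)
    Get-AlarmDefinition -Name $AlarmName |
        Get-AlarmAction -ActionType SendEmail |
        New-AlarmActionTrigger -StartStatus Yellow -EndStatus Green
}
```

Run Test-AlarmChecklist first, then add the resolved notification only where GreenNotice is False.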

