Configuring vSphere Alarms and Email Notifications: Automated SysAdmin Alerts When ESXi CPU, RAM, and Datastore Exceed Thresholds – ITFROMZERO

Three ESXi Monitoring Approaches — and Why I Start with Native Alarms

When I took over my first ESXi cluster, I skipped the Alarms section entirely, assuming alerting was automatic and needed no extra setup. That assumption cost me a Monday morning phone call from my manager instead of an email alert: the datastore had been full since the night before, and 3 VMs couldn’t write any data. After that, I never treated the Alarms section as optional again.

There are three common ways to monitor CPU/RAM/Datastore on ESXi and get timely alerts:

Approach 1: vSphere Native Alarms (built-in vCenter)

Built directly into vCenter Server — no additional software required. Define your conditions, set your actions (send email, SNMP trap, run script), done. Setup takes 15 minutes.

Approach 2: Third-party Monitoring Stack (Zabbix / Prometheus)

Zabbix ships with a VMware template out of the box; Prometheus uses vmware_exporter to pull metrics. The dashboards are far more visual and alerting rules are more flexible. The catch: you need a dedicated monitoring server — and you’ll need to maintain it alongside your existing ESXi infrastructure.

Approach 3: PowerCLI Script + Windows Task Scheduler

PowerCLI periodically pulls metrics, compares against thresholds, and sends alerts — email, Slack, Teams, whatever you want. Fully customizable. The cost: you write it from scratch and maintain it every time VMware changes their API.
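
To give a feel for what such a script involves, here is a minimal sketch of the "pull and compare" part. The function names `Get-AlertLevel` and `Get-HostCpuAlert` and the 80/90 defaults are placeholders chosen for this example; the host query assumes PowerCLI is installed and `Connect-VIServer` has already been run. Delivering the alert (email, Slack, Teams) is deliberately left out.

```powershell
# Classify a usage percentage against warning/critical thresholds
# (pure logic, no vCenter connection needed)
function Get-AlertLevel {
    param(
        [Parameter(Mandatory)][double]$Value,
        [double]$Warning = 80,
        [double]$Critical = 90
    )
    if ($Value -ge $Critical) { return 'Critical' }
    if ($Value -ge $Warning)  { return 'Warning' }
    return 'OK'
}

# Hypothetical wrapper: requires PowerCLI and an active Connect-VIServer session
function Get-HostCpuAlert {
    Get-VMHost | ForEach-Object {
        # CpuUsageMhz / CpuTotalMhz are point-in-time values on the VMHost object
        $cpuPct = [math]::Round(100 * $_.CpuUsageMhz / $_.CpuTotalMhz, 1)
        [pscustomobject]@{
            Host   = $_.Name
            CpuPct = $cpuPct
            Level  = Get-AlertLevel -Value $cpuPct
        }
    }
}
```

A scheduled task could run `Get-HostCpuAlert | Where-Object Level -ne 'OK'` and pass the result to whatever notification channel you use. Note that this looks at a single sample, so unlike a native alarm it has no "condition length" and will react to short spikes.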

Real-World Comparison: Which Approach Fits Which Environment?

vSphere Native Alarms: Done in 15 minutes, no additional servers or software needed. Ideal for teams with fewer than 20 hosts that don’t have a dedicated monitoring stack. Downside: plain-text notification emails, no dashboard, no long-term metric storage.
Zabbix / Prometheus: Visual dashboards, 6–12 months of metric retention for trend analysis, highly customizable alerting rules. Worth the investment for large production environments, but budget 2–3 days for initial setup plus ongoing maintenance time.
PowerCLI Script: Want Slack alerts? Want fully custom report formats? This is your path. Trade-off: you need PowerShell skills and have to update your scripts whenever VMware changes their API.

When I migrated from VMware to Proxmox for my personal lab, I noticed some interesting differences — Proxmox has a built-in alert system that’s fairly basic, nowhere near as granular as vSphere Alarms. The trade-off is that Proxmox integrates much better with Grafana/Prometheus right out of the box. Every platform has its own trade-offs.

Practical advice: For new teams and mid-sized environments, start with vSphere Native Alarms. It works immediately, and you won’t need to justify a dedicated Zabbix server to your manager. Scale up to a specialized monitoring stack later when your environment demands it.

Setting Up vSphere Alarms in Practice — From SMTP Configuration to Your First Alert Email

Step 1: Configure SMTP for vCenter

Alarms can’t send email if vCenter doesn’t know which mail server to use. This is the most commonly skipped step — and the number one reason alarms trigger while your inbox stays silent.

Log into vSphere Client → click vCenter Server in the inventory → Configure tab → Settings → General → Edit button. Scroll down to the Mail section:

SMTP Server: your mail server address (e.g., smtp.gmail.com or your internal Exchange IP)
SMTP Port: 587 for STARTTLS, 465 for implicit TLS (SSL), 25 for unencrypted relay
Sender account: the outgoing email address, e.g., [email protected]

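Gmail requires an extra step: enable 2FA first, then go to myaccount.google.com → Security → App Passwords to generate a dedicated password for vCenter. Authenticated SMTP also depends on your vCenter version; older releases can only relay without authentication. Internal Exchange is much simpler: enter the mail server IP and relay port, with no additional authentication, provided the server is configured to accept relay from the vCenter address.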
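
The same mail settings can also be applied from PowerCLI, which is handy when you manage several vCenter instances. This is a sketch under assumptions: it presumes an existing `Connect-VIServer` session, an anonymous internal relay on port 25, and the placeholder names `smtp.company.com` and `vcenter-alerts@company.com`, which are not from a real environment.

```powershell
# Assumes you are already connected: Connect-VIServer -Server vcenter.company.com
$vc = $global:DefaultVIServer

# These advanced settings back the Mail section of the vCenter general settings
Get-AdvancedSetting -Entity $vc -Name 'mail.smtp.server' |
    Set-AdvancedSetting -Value 'smtp.company.com' -Confirm:$false      # placeholder relay host
Get-AdvancedSetting -Entity $vc -Name 'mail.smtp.port' |
    Set-AdvancedSetting -Value 25 -Confirm:$false
Get-AdvancedSetting -Entity $vc -Name 'mail.sender' |
    Set-AdvancedSetting -Value 'vcenter-alerts@company.com' -Confirm:$false  # placeholder sender

# Show the resulting values
Get-AdvancedSetting -Entity $vc -Name 'mail.*' | Select-Object Name, Value

# Basic reachability check from this workstation (Windows only); it does not
# prove that vCenter itself can reach the relay
Test-NetConnection -ComputerName 'smtp.company.com' -Port 25
```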

Step 2: Create an Alarm for ESXi Host CPU Usage

In vSphere Client, switch to the Hosts and Clusters tab. Right-click an ESXi host (or click the Datacenter/Cluster to apply the alarm to all hosts within it) → Alarms → New Alarm Definition.

General tab:

Alarm name: ESXi CPU Usage High
Monitor: select Host
Monitor for: select Specific conditions or state

Triggers tab → click Add:

Trigger Type: Host CPU Usage (%)
Operator: Is Above
Warning (Yellow): 80
Critical (Red): 90
Tolerance: 5 (prevents flapping when usage oscillates around the threshold)
Condition length: 5 minutes (only triggers when the condition persists — not a 10-second CPU spike)

Actions tab → click Add:

Action: Send a notification email
To: [email protected] (use a shared mailbox, not a personal email)
Repeat this action: check this if you want a reminder every 30 minutes while the alarm condition persists

Click OK to save.
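
If you would rather script the definition than click through the wizard, the vSphere API exposes the same fields. The following is a sketch, not a drop-in script: it assumes an active `Connect-VIServer` session, a cluster named `Prod-Cluster` (a made-up name), and it creates the alarm without any action, so the email action still has to be added afterwards.

```powershell
# Assumes Connect-VIServer has already been run
$si       = Get-View ServiceInstance
$perfMgr  = Get-View $si.Content.PerfManager
$alarmMgr = Get-View $si.Content.AlarmManager

# Counter IDs are not fixed, so look up cpu.usage.average by name
$counter = $perfMgr.PerfCounter | Where-Object {
    $_.GroupInfo.Key -eq 'cpu' -and $_.NameInfo.Key -eq 'usage' -and $_.RollupType -eq 'average'
} | Select-Object -First 1

$metric = New-Object VMware.Vim.PerfMetricId
$metric.CounterId = $counter.Key
$metric.Instance  = ''          # empty instance = aggregate over all CPUs

$expr = New-Object VMware.Vim.MetricAlarmExpression
$expr.Metric         = $metric
$expr.Type           = 'HostSystem'
$expr.Operator       = 'isAbove'
$expr.Yellow         = 8000     # 80%, expressed in hundredths of a percent
$expr.YellowInterval = 300      # condition length in seconds (5 minutes)
$expr.Red            = 9000     # 90%
$expr.RedInterval    = 300

$spec = New-Object VMware.Vim.AlarmSpec
$spec.Name        = 'ESXi CPU Usage High'
$spec.Description = 'Host CPU usage above 80% (warning) or 90% (critical) for 5 minutes'
$spec.Enabled     = $true
$spec.Expression  = $expr
$spec.Setting     = New-Object VMware.Vim.AlarmSetting
$spec.Setting.ToleranceRange     = 500   # 5%, in hundredths of a percent
$spec.Setting.ReportingFrequency = 0     # no throttling of state changes

# Defining the alarm on the cluster applies it to every host inside it
$cluster = Get-Cluster -Name 'Prod-Cluster'   # hypothetical cluster name
$alarmMgr.CreateAlarm($cluster.ExtensionData.MoRef, $spec)
```

Try it against a test cluster first and compare the result with an alarm created in the UI; field behaviour can differ between vSphere versions.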

Step 3: Create an Alarm for RAM Usage

The process mirrors the CPU alarm — just swap the trigger type and thresholds:

Alarm name: ESXi Memory Usage High
Trigger Type: Host Memory Usage (%)
Warning: 85% | Critical: 95%

RAM thresholds are typically set higher than CPU. ESXi can reclaim memory through ballooning and swapping, so a host running at 88% RAM is much less critical than one at 88% sustained CPU. Still, keep an eye on balloon memory in the vSphere performance charts: if it’s consistently high, the host is already under memory pressure, which is why 85% remains a reasonable warning threshold.
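
Balloon activity can be read from PowerCLI as well as from the charts. A minimal sketch, assuming an active `Connect-VIServer` session and an example host name; `mem.vmmemctl.average` is the standard balloon counter and reports kilobytes:

```powershell
# Requires PowerCLI and an active Connect-VIServer session
$vmhost = Get-VMHost -Name 'esxi01.company.com'   # example host name

# Last 15 realtime samples (20 seconds each, so roughly 5 minutes)
Get-Stat -Entity $vmhost -Stat 'mem.vmmemctl.average' -Realtime -MaxSamples 15 |
    Select-Object Timestamp,
        @{ Name = 'BalloonMB'; Expression = { [math]::Round($_.Value / 1KB, 1) } }
```

Values that stay at 0 mean no ballooning; a sustained non-zero figure is the signal to look at memory sizing before the usage alarm fires.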

Step 4: Create an Alarm for Datastore

Key difference: datastore alarms must be created on the Datastore object, not the Host. In vSphere Client, switch to the Storage tab → right-click the Datastore → Alarms → New Alarm Definition:

Alarm name: Datastore Disk Usage High
Monitor: Datastore
Trigger Type: Datastore Disk Usage (%)
Warning: 75% | Critical: 85%

A full datastore is the worst case because there is no self-healing mechanism: VMs that need more space for writes (thin-provisioned disks, snapshots, swap) stall and typically stay paused until someone frees space. That’s why the thresholds are set lower than for CPU/RAM. An early warning at 75% gives you enough time to clean up or expand storage before hitting the critical 85% mark.
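
Before choosing thresholds it helps to see where each datastore stands today. The sketch below separates the arithmetic from the vCenter query; `Get-UsedPercent` and `Get-DatastoreUsageReport` are names invented for this example, and the report function assumes an active `Connect-VIServer` session.

```powershell
# Percentage of capacity in use, rounded to one decimal
# (pure arithmetic, no vCenter connection needed)
function Get-UsedPercent {
    param(
        [Parameter(Mandatory)][double]$CapacityGB,
        [Parameter(Mandatory)][double]$FreeSpaceGB
    )
    if ($CapacityGB -le 0) { throw 'CapacityGB must be greater than zero.' }
    [math]::Round(100 * ($CapacityGB - $FreeSpaceGB) / $CapacityGB, 1)
}

# Hypothetical report: requires PowerCLI and an active Connect-VIServer session
function Get-DatastoreUsageReport {
    param([double]$Warning = 75)
    Get-Datastore | ForEach-Object {
        [pscustomobject]@{
            Datastore = $_.Name
            UsedPct   = Get-UsedPercent -CapacityGB $_.CapacityGB -FreeSpaceGB $_.FreeSpaceGB
        }
    } | Where-Object { $_.UsedPct -ge $Warning } | Sort-Object UsedPct -Descending
}
```

Running `Get-DatastoreUsageReport` lists every datastore already at or above the 75% warning level, which tells you whether the new alarm will fire the moment you save it.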

Step 5: Test Before Going Live

Don’t wait for a real incident to find out if your alarms work. The simplest test: temporarily lower the threshold to trigger the alarm immediately.

Open the alarm you just created → Edit → set the Warning threshold to 10% → wait 5–10 minutes → check your email. The subject should look something like: [vCenter Alarm] ESXi CPU Usage High on esxi01 - Warning (the exact wording may differ). If it arrives, both SMTP and the alarm are working correctly. Then restore the real threshold.

PowerCLI is also useful for pulling a list of active alarms — handy when managing multiple hosts:

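# Connect to vCenter (prompts for credentials instead of exposing the password on the command line)
Connect-VIServer -Server vcenter.company.com -Credential (Get-Credential)

# List all enabled alarm definitions
Get-AlarmDefinition | Where-Object { $_.Enabled -eq $true } | Select-Object Name, Entity, Enabled | Format-Table -AutoSize

# List alarms currently in Warning or Critical state on every host
# (TriggeredAlarmState only contains alarms that are yellow or red)
foreach ($vmhost in Get-VMHost) {
    foreach ($state in $vmhost.ExtensionData.TriggeredAlarmState) {
        [pscustomobject]@{
            Host   = $vmhost.Name
            Alarm  = (Get-View $state.Alarm).Info.Name
            Status = $state.OverallStatus
            Since  = $state.Time
        }
    }
}

# List the state of every alarm on a specific host that is not green
$vmhost = Get-VMHost "esxi01.company.com"
$alarmMgr = Get-View AlarmManager
$alarmMgr.GetAlarmState($vmhost.ExtensionData.MoRef) | Where-Object { $_.OverallStatus -ne "green" }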

Step 6: Fine-Tune to Avoid Alert Fatigue

I once received 200+ emails in a single day from a CPU alarm set at 70% on a production server that routinely ran at 75–80%. My inbox flooded with alerts and I started marking them read without looking. That’s the worst failure mode — not missing monitoring, but having so many alerts that nobody pays attention anymore.

After 1–2 weeks of real-world data, revisit your configuration:

Adjust thresholds based on your baseline: if the server typically runs at 75% CPU, set the warning at 90% rather than 80%, and raise the critical threshold above it
Route alerts to a shared mailbox or Slack/Teams channel, not directly to personal email
Configure a “green” action to receive an email when the issue resolves — important for knowing when an alarm has cleared
Managing more than 5 hosts: consider creating alarms at the Cluster or Datacenter level — configure once, applies to all hosts within
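
The "green" notification from the list above can be added with PowerCLI's alarm cmdlets. This sketch assumes an active `Connect-VIServer` session and that the `ESXi CPU Usage High` alarm already has an email action from the earlier step; if a trigger for a given transition already exists, the corresponding call may return an error.

```powershell
# Requires PowerCLI and an active Connect-VIServer session
$alarm  = Get-AlarmDefinition -Name 'ESXi CPU Usage High'

# Reuse the existing email action rather than creating a second one
$action = Get-AlarmAction -AlarmDefinition $alarm -ActionType 'SendEmail' |
    Select-Object -First 1

# Send mail when the alarm steps back down, not only when it escalates
New-AlarmActionTrigger -AlarmAction $action -StartStatus 'Red'    -EndStatus 'Yellow'
New-AlarmActionTrigger -AlarmAction $action -StartStatus 'Yellow' -EndStatus 'Green'

# Review which state transitions now send an email
Get-AlarmActionTrigger -AlarmAction $action
```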

Pre-Go-Live Checklist

SMTP configured, at least 1 test email received
CPU alarm: Warning 80%, Critical 90%, Tolerance 5%, Condition length 5 minutes
RAM alarm: Warning 85%, Critical 95%
Datastore alarm: Warning 75%, Critical 85% (create on each datastore, or once at the Datacenter level to cover all of them)
Alert recipients are a shared mailbox or team channel
“Green” action configured to notify when the issue resolves
Thresholds reviewed against real-world baseline after 1 week of observation
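
For the baseline review, a percentile is usually more telling than an average, because it shows what "normally busy" looks like without being dominated by a few spikes. The helper below is a sketch using the nearest-rank method; `Get-Percentile` is a name made up for this example, and the commented usage assumes an active `Connect-VIServer` session and an example host name.

```powershell
# Nearest-rank percentile of a set of samples
# (pure logic, no vCenter connection needed)
function Get-Percentile {
    param(
        [Parameter(Mandatory)][double[]]$Values,
        [ValidateRange(1, 100)][int]$Percentile = 95
    )
    $sorted = @($Values | Sort-Object)
    # Integer arithmetic first to avoid floating-point surprises in the rank
    $rank = [int][math]::Ceiling($Percentile * $sorted.Count / 100)
    $sorted[$rank - 1]
}

# Example usage against one week of host CPU history (requires PowerCLI):
# $samples = Get-Stat -Entity (Get-VMHost 'esxi01.company.com') `
#                     -Stat 'cpu.usage.average' -Start (Get-Date).AddDays(-7)
# Get-Percentile -Values $samples.Value -Percentile 95
```

If the 95th percentile sits at or above your warning threshold, the alarm will fire during ordinary load, so raise the threshold or investigate the host before going live.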