How do you know when your server is having issues?
Back in the day, I’d find out the server was slow the hard way — users would message me to complain first. Then I’d SSH in and run top, df -h, free -m one by one. Three or four servers was manageable — but once you hit ten-plus, you don’t even know where to start.
After setting up Prometheus + Grafana, everything changed. Open the dashboard and it’s all right there: when the CPU spiked, current RAM usage, how much disk space is left — all on a single screen. No more SSHing into individual servers.
This article goes straight to a practical installation on Ubuntu 22.04. Node Exporter for collecting metrics, Grafana for visualization.
How it works — the 30-second overview
Prometheus is a time-series database and scraper rolled into one. Every 15 seconds, it makes HTTP calls to exporters, pulls in metrics, and stores them in its own storage backend. You query with PromQL — Prometheus’s own query language, which you can pick up and use in a few hours.
Node Exporter runs on each server you want to monitor and exposes a /metrics endpoint with 700+ metrics: per-core CPU, RAM, disk I/O, network traffic, open file descriptors, and more.
Grafana acts as the frontend. It connects to Prometheus, queries the data, and renders it into dashboards. Alerts can be sent via email, Slack, or a Telegram webhook.
The data flow:
Node Exporter (port 9100) ← Prometheus scrapes every 15s → Stored in TSDB → Grafana queries → Dashboard
Installing Node Exporter on the servers you want to monitor
Create a dedicated system user — running the exporter as root is bad practice:
# Create system user
sudo useradd --no-create-home --shell /bin/false node_exporter
# Download Node Exporter (check the latest release at github.com/prometheus/node_exporter)
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
Next, create a systemd service so Node Exporter starts automatically on reboot:
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
# Verify it's running
curl http://localhost:9100/metrics | head -20
If the output contains lines like node_cpu_seconds_total{...}, Node Exporter is running correctly.
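Each line in that output follows the Prometheus text exposition format: a metric name, optional {label="value"} pairs, and a numeric value. A minimal Python sketch of how one of those lines breaks down (simplified — it ignores HELP/TYPE comments, timestamps, and escaped quotes, which the full spec allows):

```python
import re

# Matches the simple single-line case of the exposition format, e.g.
#   node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metric_line(line: str):
    """Split one metrics line into (name, labels dict, float value)."""
    m = LINE_RE.match(line.strip())
    if not m:
        raise ValueError(f"unparseable line: {line!r}")
    labels = {}
    if m.group("labels"):
        for pair in m.group("labels").split(","):
            key, _, val = pair.partition("=")
            labels[key.strip()] = val.strip().strip('"')
    return m.group("name"), labels, float(m.group("value"))

name, labels, value = parse_metric_line(
    'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67')
print(name, labels, value)
```

This is exactly what Prometheus does on every scrape — which is why a plain curl is all you need to debug an exporter.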
Installing Prometheus on the monitoring server
I keep things separate: one dedicated server runs Prometheus + Grafana, while all other servers only have Node Exporter installed. It makes backups easier and keeps monitoring isolated from production workloads.
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar xvf prometheus-2.52.0.linux-amd64.tar.gz
sudo cp prometheus-2.52.0.linux-amd64/{prometheus,promtool} /usr/local/bin/
sudo cp -r prometheus-2.52.0.linux-amd64/{consoles,console_libraries} /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
Configuring scrape targets
This is the most important part — declaring which servers Prometheus should collect metrics from:
sudo nano /etc/prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'node_exporter'
static_configs:
- targets:
- '192.168.1.10:9100' # web server
- '192.168.1.11:9100' # db server
- '192.168.1.12:9100' # app server
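Prometheus ships with promtool, which validates this file before you restart (promtool check config /etc/prometheus/prometheus.yml) — worth running after every edit. The targets themselves must be plain host:port pairs; as an illustration of what that check amounts to (a sketch, not a replacement for promtool), in Python:

```python
def valid_target(target: str) -> bool:
    """Check that a scrape target looks like host:port with a sane port."""
    host, sep, port = target.rpartition(":")
    if not sep or not host:
        return False
    return port.isdigit() and 1 <= int(port) <= 65535

# The three targets from the config above
targets = ["192.168.1.10:9100", "192.168.1.11:9100", "192.168.1.12:9100"]
for t in targets:
    print(t, "ok" if valid_target(t) else "INVALID")
```

A missing port (or a stray scheme like http://) is the most common reason a target silently never appears on the /targets page.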
Create a systemd service for Prometheus:
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--storage.tsdb.retention.time=30d \
--web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
# Check whether targets are up
curl http://localhost:9090/api/v1/targets | python3 -m json.tool | grep health
Navigate to http://<monitoring-server>:9090/targets to check the status — targets shown in green are being scraped successfully. Quick note: 30-day retention across 3 servers consumes roughly 2–4 GB of disk; adjust --storage.tsdb.retention.time if disk space is tight.
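That 2–4 GB figure is just arithmetic: samples per day × retention × bytes per sample. A back-of-the-envelope sketch — the series count per node and the ~2 bytes/sample compression ratio are rough assumptions (node_exporter typically exposes one to a few thousand series per host, and Prometheus compresses samples to roughly 1–2 bytes each), so treat the result as an order-of-magnitude estimate:

```python
def estimate_tsdb_gb(targets: int,
                     series_per_target: int = 2000,   # assumption: typical node_exporter host
                     scrape_interval_s: int = 15,
                     retention_days: int = 30,
                     bytes_per_sample: float = 2.0):  # assumption: ~1-2 bytes after compression
    """Rough TSDB disk estimate: samples/day * retention * bytes/sample."""
    samples_per_day = targets * series_per_target * (86400 / scrape_interval_s)
    total_bytes = samples_per_day * retention_days * bytes_per_sample
    return total_bytes / 1e9

print(f"{estimate_tsdb_gb(3):.1f} GB")  # 3 targets with the defaults above
```

Doubling the scrape interval or halving retention cuts the footprint proportionally, which is the easiest lever when disk is tight.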
Installing Grafana
Grafana has an official APT repository, which is cleaner than downloading files manually:
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana
sudo systemctl enable --now grafana-server
The default port is 3000. Go to http://<server-ip>:3000, log in with admin/admin, and change your password immediately on first login.
Connecting Grafana to Prometheus
- Go to Connections → Data Sources → Add data source
- Select Prometheus
- URL: http://localhost:9090 (if on the same server) or the IP address of your Prometheus server
- Click Save & Test — you’re done when you see “Successfully queried the Prometheus API”
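If you rebuild servers often, the same data source can be provisioned from a file instead of clicking through the UI — Grafana reads YAML from /etc/grafana/provisioning/datasources/ at startup. A minimal example (adjust url to wherever your Prometheus lives):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```

Restart grafana-server after dropping the file in place and the data source appears exactly as if you had created it by hand.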
Importing a dashboard in 2 minutes
No need to build from scratch. Grafana has a community dashboard library at grafana.com/grafana/dashboards — ID 1860 (Node Exporter Full) is the most downloaded dashboard with millions of installs, covering nearly everything you need:
- Dashboards → Import
- Enter ID 1860 and click Load
- Select the Prometheus data source you just created
- Import — CPU, RAM, disk, and network panels appear immediately, including per-core and per-disk breakdowns
Commonly used PromQL queries
When you need to build custom panels, these are the queries I use almost every day:
# CPU usage % (5-minute average)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# RAM usage %
100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)
# Disk usage by mount point
100 - ((node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) * 100)
# Inbound network traffic (bytes/s)
rate(node_network_receive_bytes_total{device!="lo"}[5m])
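The rate() in the first query does what you would otherwise do by hand: take two readings of the idle-CPU counter and divide the delta by the elapsed time. A sketch of that arithmetic in Python, with illustrative numbers rather than live data:

```python
def cpu_usage_percent(idle_prev: float, idle_now: float,
                      elapsed_s: float, cores: int) -> float:
    """Mirror: 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

    node_cpu_seconds_total is a per-core counter of seconds spent in each
    mode, so the average idle fraction is delta / elapsed / cores.
    """
    idle_fraction = (idle_now - idle_prev) / elapsed_s / cores
    return round(100 * (1 - idle_fraction), 2)

# Two readings 300s apart on a 4-core box: the summed idle counter grew
# by 960s => 960 / 300 / 4 = 0.8 idle fraction => 20% busy
print(cpu_usage_percent(10_000.0, 10_960.0, 300.0, 4))  # 20.0
```

Seeing it spelled out also explains why counters are preferred over gauges: a missed scrape just widens the delta instead of losing data.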
Setting up an alert when CPU exceeds a threshold
Grafana can create alerts directly from a panel — no Alertmanager needed for simple cases. Example: alert when CPU stays above 80% for 5 consecutive minutes:
- Open the CPU Usage panel → Edit
- Go to the Alert tab → New alert rule
- Condition: WHEN avg() OF query IS ABOVE 80
- For: 5m — wait 5 minutes before triggering, to avoid false positives from brief CPU spikes
- Notification: select a contact point (email, Slack, Telegram webhook)
Wrapping up
This is the exact setup I’m running in production. From start to a fully functional dashboard takes about 30–45 minutes if you’re comfortable with Linux.
The best thing about Prometheus: data is stored as time-series, so when an incident happens, I can rewind and see exactly what RAM looked like at 2:37 AM last night, or pinpoint when the CPU started climbing. No guesswork, no blind spots.
Want to take it further? Alertmanager handles more complex alerting scenarios — grouping, silencing, and routing by team. If you’re running Docker, add cAdvisor to track per-container resource usage.

