Why I Needed Blackbox Exporter
My monitoring stack — Prometheus + Grafana watching 15 servers — has caught incidents before users reported them more times than I can count. But there was a dangerous blind spot: I knew CPU/RAM/disk were fine, yet had no idea whether the website was actually responding, whether DNS was resolving correctly, or how many days were left on an SSL certificate before it blew up.
Blackbox Exporter fills exactly that gap. Instead of installing an agent on each server (white-box monitoring), it probes from the outside — exactly the way a real user types a URL into a browser. Blackbox probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, then exposes the results as metrics for Prometheus to scrape.
A real situation I ran into: an SSL certificate on a subdomain expired without anyone noticing, and by the time browsers started throwing errors it was already too late. After setting up Blackbox with a “14 days remaining” alert, I got a Telegram notification early enough to renew it in time — no more midnight firefighting.
Installing Blackbox Exporter
I install it directly on the server already running Prometheus — no Docker needed for this straightforward setup.
Download and Install the Binary
# Download the latest release (check github.com/prometheus/blackbox_exporter)
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xvf blackbox_exporter-0.25.0.linux-amd64.tar.gz
cd blackbox_exporter-0.25.0.linux-amd64
# Copy binary and config
sudo cp blackbox_exporter /usr/local/bin/
sudo mkdir -p /etc/blackbox_exporter
sudo cp blackbox.yml /etc/blackbox_exporter/
Create a systemd Service
sudo tee /etc/systemd/system/blackbox_exporter.service <<'EOF'
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
--config.file=/etc/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now blackbox_exporter
sudo systemctl status blackbox_exporter
Once done, open http://<server-ip>:9115 to confirm the service is running. You’ll see a simple web interface listing all configured modules.
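Besides the landing page, the exporter exposes a couple of endpoints that are handy for a quick sanity check from the shell:

```shell
# Show the configuration the exporter actually loaded
curl -s http://localhost:9115/config

# The exporter's own metrics (build info, config reload status, etc.)
curl -s http://localhost:9115/metrics | head -n 20
```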
Configuring Probe Modules
The file /etc/blackbox_exporter/blackbox.yml defines “modules” — each module is a different type of check. Here’s the config I’m actually running in production:
modules:
  # HTTP 2xx check — used for regular websites
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 301, 302]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"

  # HTTPS check + SSL certificate validation
  http_2xx_tls:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      follow_redirects: true
      tls_config:
        insecure_skip_verify: false  # Enforce SSL verification

  # TCP port open check (MySQL, Redis, custom services)
  tcp_connect:
    prober: tcp
    timeout: 5s

  # DNS resolution check
  dns_check:
    prober: dns
    timeout: 5s
    dns:
      query_name: "google.com"  # Domain to query
      query_type: "A"           # A, AAAA, MX, CNAME...
      valid_rcodes:
        - NOERROR
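After editing this file I validate it before touching the service; recent Blackbox Exporter releases ship a --config.check flag for exactly this, and the running process reloads its config on SIGHUP:

```shell
# Parse and validate blackbox.yml without starting a listener
blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml --config.check

# Apply the new config to the running exporter without a restart
sudo systemctl kill -s HUP blackbox_exporter
```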
Configuring Prometheus Scraping
This is where Blackbox Exporter differs from typical exporters. Rather than scraping directly, Prometheus uses a relabeling mechanism to pass the target URL at scrape time — meaning a single Blackbox instance can probe dozens of different URLs without running multiple processes:
# Add to prometheus.yml
scrape_configs:
  # HTTP/HTTPS website checks
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx_tls]
    static_configs:
      - targets:
          - https://itfromzero.com
          - https://example.com
          - https://api.yourdomain.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115  # Blackbox Exporter address

  # TCP port checks
  - job_name: 'blackbox_tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - db-server:3306     # MySQL
          - cache-server:6379  # Redis
          - app-server:8080    # App port
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # DNS checks
  - job_name: 'blackbox_dns'
    metrics_path: /probe
    params:
      module: [dns_check]
    static_configs:
      - targets:
          - 8.8.8.8  # DNS server to check
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
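To make the relabeling concrete: for each target, Prometheus ends up requesting a URL of this shape from the exporter (a sketch using one of the targets from the config):

```shell
target="https://itfromzero.com"
module="http_2xx_tls"
# __address__ is rewritten to localhost:9115, and the original address
# travels as the ?target= query parameter:
echo "http://localhost:9115/probe?target=${target}&module=${module}"
```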
Reload Prometheus after editing the config. Note that the HTTP reload endpoint only works if Prometheus was started with --web.enable-lifecycle; otherwise restart the service (sudo systemctl restart prometheus):
curl -X POST http://localhost:9090/-/reload
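Whenever I edit prometheus.yml, I also run it through promtool (bundled with Prometheus) — a syntax error means the reload silently keeps the old config:

```shell
# Exits non-zero and prints the offending line if the YAML is invalid
promtool check config /etc/prometheus/prometheus.yml
```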
Verifying Results and Setting Up Alerts
Quick Test with curl Before Waiting for Prometheus
No need to wait for the next Prometheus scrape cycle — call the Blackbox endpoint directly to check immediately:
# Check HTTP
curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" | grep probe_success
# Expected result: probe_success 1
# How many seconds until SSL certificate expires
curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" | grep ssl_earliest_cert_expiry
# probe_ssl_earliest_cert_expiry 1.7XXXXXXXe+09 ← Unix timestamp when cert expires
# Check TCP
curl -s "http://localhost:9115/probe?target=db-server:3306&module=tcp_connect" | grep probe_success
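That expiry metric is a Unix timestamp, which isn't very readable at 2 AM. A small shell sketch to turn it into days remaining (assumes awk; the target and module match the config above):

```shell
# Extract the expiry timestamp from the probe output
expiry=$(curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" \
  | awk '/^probe_ssl_earliest_cert_expiry/ {print $2}')

# Convert "seconds since epoch" into whole days from now
# (awk copes with the scientific notation the exporter emits)
awk -v e="$expiry" -v n="$(date +%s)" 'BEGIN {printf "%d days until expiry\n", (e - n) / 86400}'
```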
Alerting Rules for Prometheus
Alert rules are what transform Blackbox from a “nice to look at” tool into a real monitoring system. I keep them in a separate file for easier management:
# /etc/prometheus/rules/blackbox.yml
groups:
  - name: blackbox_alerts
    rules:
      # Website down
      - alert: WebsiteDown
        expr: probe_success{job="blackbox_http"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Website DOWN: {{ $labels.instance }}"
          description: "{{ $labels.instance }} has not responded for more than 2 minutes."

      # Slow HTTP response time
      - alert: SlowResponseTime
        expr: probe_duration_seconds{job="blackbox_http"} > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response: {{ $labels.instance }}"
          description: "Response time {{ $value | humanizeDuration }} exceeds 3 seconds."

      # SSL Certificate expiring soon (14 days)
      - alert: SSLCertExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL expiring soon: {{ $labels.instance }}"
          description: "Certificate expires in {{ $value | humanize }} days. Renew now!"

      # SSL Certificate already expired
      - alert: SSLCertExpired
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "SSL EXPIRED: {{ $labels.instance }}"

      # TCP port closed
      - alert: TCPPortDown
        expr: probe_success{job="blackbox_tcp"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Port DOWN: {{ $labels.instance }}"
          description: "Cannot connect to {{ $labels.instance }}"
Add the rule file to prometheus.yml:
rule_files:
  - "/etc/prometheus/rules/*.yml"
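promtool can lint the rule file too — worth running before a reload, since a rules syntax error otherwise only shows up in Prometheus's logs:

```shell
# Reports how many rules were found, or the exact parse error
promtool check rules /etc/prometheus/rules/blackbox.yml
```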
Grafana Dashboard
Import dashboard ID 7587 from Grafana.com — no need to build from scratch. It displays everything you need: probe status, response time, SSL days remaining, and HTTP status codes. I use it as my main overview screen.
A few PromQL queries for building custom panels if you need more control:
# Days remaining on SSL certificate
(probe_ssl_earliest_cert_expiry{job="blackbox_http"} - time()) / 86400
# Uptime % over the last 24h
avg_over_time(probe_success{job="blackbox_http"}[24h]) * 100
# HTTP status code
probe_http_status_code{job="blackbox_http"}
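The same queries also work outside Grafana via Prometheus's HTTP API — useful in scripts or cron jobs (jq is assumed here for formatting):

```shell
# 24h uptime percentage per probed target
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time(probe_success{job="blackbox_http"}[24h]) * 100' \
  | jq -r '.data.result[] | "\(.metric.instance): \(.value[1])%"'
```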
Practical Lessons Learned
- Firewall: The server running Blackbox must have outbound internet access (ports 80, 443, 53). If it sits in a private network without NAT or a proxy, HTTP probes to external targets will fail — and you’ll mistakenly think the website is down.
- DNS module: The default config uses google.com as the query_name. If you want to check an internal DNS server, change this to an internal domain — querying google.com through your internal DNS doesn't tell you much.
- Timeout vs. scrape interval: Timeouts must be shorter than the scrape interval. My Prometheus scrapes every 30s — if an HTTP timeout were 30s or more, probes would overlap and the metrics would be unreliable. I use 10s for HTTP and 5s for TCP/DNS, which works well.
- TLS insecure: Never set insecure_skip_verify: true in production. The whole point of Blackbox is to catch SSL issues — skipping verification defeats the purpose entirely.
- Alert routing: I route severity: critical to Telegram via Alertmanager and warning to email. A critical alert at 2 AM that only goes to email means you won't find out until morning — way too late to do anything about it.
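For that severity split, the Alertmanager route looks roughly like this (a sketch; the receiver names telegram-oncall and email-team are placeholders for receivers defined elsewhere in alertmanager.yml):

```yaml
route:
  receiver: email-team            # default: warnings and anything unmatched
  routes:
    - matchers:
        - severity = "critical"
      receiver: telegram-oncall   # critical pages go to Telegram
```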
