Why I Needed Blackbox Exporter
My monitoring stack — Prometheus + Grafana watching 15 servers — has caught incidents before users reported them more times than I can count. But there was a dangerous blind spot: I knew CPU/RAM/disk were fine, yet had no idea whether the website was actually responding, whether DNS was resolving correctly, or how many days were left on an SSL certificate before it blew up.
Blackbox Exporter fills exactly that gap. Instead of installing an agent on each server (white-box monitoring), it probes from the outside — exactly the way a real user types a URL into a browser. Blackbox probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP, then exposes the results as metrics for Prometheus to scrape.
A real situation I ran into: an SSL certificate on a subdomain expired without anyone noticing, and by the time browsers started throwing errors it was already too late. After setting up Blackbox with a “14 days remaining” alert, I got a Telegram notification early enough to renew it in time — no more midnight firefighting.
Installing Blackbox Exporter
I install it directly on the server already running Prometheus — no Docker needed for this straightforward setup.
Download and Install the Binary
# Download the latest release (check github.com/prometheus/blackbox_exporter)
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz
tar xvf blackbox_exporter-0.25.0.linux-amd64.tar.gz
cd blackbox_exporter-0.25.0.linux-amd64
# Copy binary and config
sudo cp blackbox_exporter /usr/local/bin/
sudo mkdir -p /etc/blackbox_exporter
sudo cp blackbox.yml /etc/blackbox_exporter/
Create a systemd Service
sudo tee /etc/systemd/system/blackbox_exporter.service <<'EOF'
[Unit]
Description=Prometheus Blackbox Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/blackbox_exporter \
--config.file=/etc/blackbox_exporter/blackbox.yml \
--web.listen-address=:9115
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now blackbox_exporter
sudo systemctl status blackbox_exporter
Once done, open http://<server-ip>:9115 to confirm the service is running. You’ll see a simple web interface listing all configured modules.
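Besides the landing page, the exporter exposes a couple of endpoints that are handy for a quick sanity check from the shell:

```shell
# Show the configuration the exporter actually loaded
curl -s http://localhost:9115/config

# The exporter's own metrics (build info, config reload status, etc.)
curl -s http://localhost:9115/metrics | head -n 20
```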
Configuring Probe Modules
The file /etc/blackbox_exporter/blackbox.yml defines “modules” — each module is a different type of check. Here’s the config I’m actually running in production:
modules:
  # HTTP 2xx check — used for regular websites
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 301, 302]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"

  # HTTPS check + SSL certificate validation
  http_2xx_tls:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      follow_redirects: true
      tls_config:
        insecure_skip_verify: false  # Enforce SSL verification

  # TCP port open check (MySQL, Redis, custom services)
  tcp_connect:
    prober: tcp
    timeout: 5s

  # DNS resolution check
  dns_check:
    prober: dns
    timeout: 5s
    dns:
      query_name: "google.com"  # Domain to query
      query_type: "A"           # A, AAAA, MX, CNAME...
      valid_rcodes:
        - NOERROR
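After editing this file I validate it before touching the service; recent Blackbox Exporter releases ship a --config.check flag for exactly this, and the running process reloads its config on SIGHUP:

```shell
# Parse and validate blackbox.yml without starting a listener
blackbox_exporter --config.file=/etc/blackbox_exporter/blackbox.yml --config.check

# Apply the new config to the running exporter without a restart
sudo systemctl kill -s HUP blackbox_exporter
```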
Configuring Prometheus Scraping
This is where Blackbox Exporter differs from typical exporters. Rather than scraping directly, Prometheus uses a relabeling mechanism to pass the target URL at scrape time — meaning a single Blackbox instance can probe dozens of different URLs without running multiple processes:
# Add to prometheus.yml
scrape_configs:
  # HTTP/HTTPS website checks
  - job_name: 'blackbox_http'
    metrics_path: /probe
    params:
      module: [http_2xx_tls]
    static_configs:
      - targets:
          - https://itfromzero.com
          - https://example.com
          - https://api.yourdomain.com/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115  # Blackbox Exporter address

  # TCP port checks
  - job_name: 'blackbox_tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - db-server:3306     # MySQL
          - cache-server:6379  # Redis
          - app-server:8080    # App port
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  # DNS checks
  - job_name: 'blackbox_dns'
    metrics_path: /probe
    params:
      module: [dns_check]
    static_configs:
      - targets:
          - 8.8.8.8  # DNS server to check
          - 1.1.1.1
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
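To make the relabeling concrete: for each target, Prometheus ends up requesting a URL of this shape from the exporter (a sketch using one of the targets from the config):

```shell
target="https://itfromzero.com"
module="http_2xx_tls"
# __address__ is rewritten to localhost:9115, and the original address
# travels as the ?target= query parameter:
echo "http://localhost:9115/probe?target=${target}&module=${module}"
```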
Reload Prometheus after editing the config. Note that the HTTP reload endpoint only works if Prometheus was started with --web.enable-lifecycle; otherwise restart the service (sudo systemctl restart prometheus):
curl -X POST http://localhost:9090/-/reload
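Whenever I edit prometheus.yml, I also run it through promtool (bundled with Prometheus) — a syntax error means the reload silently keeps the old config:

```shell
# Exits non-zero and prints the offending line if the YAML is invalid
promtool check config /etc/prometheus/prometheus.yml
```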
Verifying Results and Setting Up Alerts
Quick Test with curl Before Waiting for Prometheus
No need to wait for the next Prometheus scrape cycle — call the Blackbox endpoint directly to check immediately:
# Check HTTP
curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" | grep probe_success
# Expected result: probe_success 1
# How many seconds until SSL certificate expires
curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" | grep ssl_earliest_cert_expiry
# probe_ssl_earliest_cert_expiry 1.7XXXXXXXe+09 ← Unix timestamp when cert expires
# Check TCP
curl -s "http://localhost:9115/probe?target=db-server:3306&module=tcp_connect" | grep probe_success
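That expiry metric is a Unix timestamp, which isn't very readable at 2 AM. A small shell sketch to turn it into days remaining (assumes awk; the target and module match the config above):

```shell
# Extract the expiry timestamp from the probe output
expiry=$(curl -s "http://localhost:9115/probe?target=https://itfromzero.com&module=http_2xx_tls" \
  | awk '/^probe_ssl_earliest_cert_expiry/ {print $2}')

# Convert "seconds since epoch" into whole days from now
# (awk copes with the scientific notation the exporter emits)
awk -v e="$expiry" -v n="$(date +%s)" 'BEGIN {printf "%d days until expiry\n", (e - n) / 86400}'
```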
Alerting Rules for Prometheus
Alert rules are what transform Blackbox from a “nice to look at” tool into a real monitoring system. I keep them in a separate file for easier management:
# /etc/prometheus/rules/blackbox.yml
groups:
  - name: blackbox_alerts
    rules:
      # Website down
      - alert: WebsiteDown
        expr: probe_success{job="blackbox_http"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Website DOWN: {{ $labels.instance }}"
          description: "{{ $labels.instance }} has not responded for more than 2 minutes."

      # Slow HTTP response time
      - alert: SlowResponseTime
        expr: probe_duration_seconds{job="blackbox_http"} > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response: {{ $labels.instance }}"
          description: "Response time {{ $value | humanizeDuration }} exceeds 3 seconds."

      # SSL Certificate expiring soon (14 days)
      - alert: SSLCertExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 14
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL expiring soon: {{ $labels.instance }}"
          description: "Certificate expires in {{ $value | humanize }} days. Renew now!"

      # SSL Certificate already expired
      - alert: SSLCertExpired
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "SSL EXPIRED: {{ $labels.instance }}"

      # TCP port closed
      - alert: TCPPortDown
        expr: probe_success{job="blackbox_tcp"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Port DOWN: {{ $labels.instance }}"
          description: "Cannot connect to {{ $labels.instance }}"
Add the rule file to prometheus.yml:
rule_files:
  - "/etc/prometheus/rules/*.yml"
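promtool can lint the rule file too — worth running before a reload, since a rules syntax error otherwise only shows up in Prometheus's logs:

```shell
# Reports how many rules were found, or the exact parse error
promtool check rules /etc/prometheus/rules/blackbox.yml
```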
Grafana Dashboard
Import dashboard ID 7587 from Grafana.com — no need to build from scratch. It displays everything you need: probe status, response time, SSL days remaining, and HTTP status codes. I use it as my main overview screen.
A few PromQL queries for building custom panels if you need more control:
# Days remaining on SSL certificate
(probe_ssl_earliest_cert_expiry{job="blackbox_http"} - time()) / 86400
# Uptime % over the last 24h
avg_over_time(probe_success{job="blackbox_http"}[24h]) * 100
# HTTP status code
probe_http_status_code{job="blackbox_http"}
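The same queries also work outside Grafana via Prometheus's HTTP API — useful in scripts or cron jobs (jq is assumed here for formatting):

```shell
# 24h uptime percentage per probed target
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time(probe_success{job="blackbox_http"}[24h]) * 100' \
  | jq -r '.data.result[] | "\(.metric.instance): \(.value[1])%"'
```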
Practical Lessons Learned
- Firewall: The server running Blackbox must have outbound internet access (ports 80, 443, 53). If it sits in a private network without NAT or a proxy, HTTP probes to external targets will fail — and you’ll mistakenly think the website is down.
- DNS module: The default config uses google.com as the query_name. If you want to check an internal DNS server, change this to an internal domain — querying google.com through your internal DNS doesn't tell you much.
- Timeout vs. scrape interval: Timeouts must be shorter than the scrape interval. My Prometheus scrapes every 30s — if an HTTP timeout were 30s or more, probes would overlap and the metrics would be unreliable. I use 10s for HTTP and 5s for TCP/DNS, which works well.
- TLS insecure: Never set insecure_skip_verify: true in production. The whole point of Blackbox is to catch SSL issues — skipping verification defeats the purpose entirely.
- Alert routing: I route severity: critical to Telegram via Alertmanager and warning to email. A critical alert at 2 AM that only goes to email means you won't find out until morning — way too late to do anything about it.
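For that severity split, the Alertmanager route looks roughly like this (a sketch; the receiver names telegram-oncall and email-team are placeholders for receivers defined elsewhere in alertmanager.yml):

```yaml
route:
  receiver: email-team            # default: warnings and anything unmatched
  routes:
    - matchers:
        - severity = "critical"
      receiver: telegram-oncall   # critical pages go to Telegram
```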
