24/7 Internet Bandwidth Monitoring: Holding Your ISP Accountable with Prometheus and Grafana

Monitoring tutorial - IT technology blog

300Mbps Plan But Zoom Meetings Still Lag?

This scenario is all too familiar for sysadmins: an unstable connection during online meetings or Docker images downloading at a measly few hundred KB/s. You call the hotline, and the technician reports that “the signal is fine.” At this point, a manual Speedtest in a browser isn’t convincing enough since they can blame your Wi-Fi or personal devices.

To stop the guesswork, we need an automated monitoring system running 24/7 to provide historical data for accountability. Instead of clicking “Go” manually, I’ll use Speedtest Exporter to automate testing, let Prometheus scrape and store the results, and visualize them in Grafana.

Deployment Model: The Three Pillars

Before typing commands, let’s take a quick look at how these components work together:

  • Speedtest Exporter: Acts as the client, periodically running tests (ping, download, upload) and exposing the results as standard Prometheus metrics.
  • Prometheus: The time-series database. It periodically scrapes metrics from the Exporter and stores them with a timestamp.
  • Grafana: The final visualization layer. It queries data from Prometheus to draw charts, helping you clearly see network lag trends during peak hours or at night.

Tools like Netdata or htop are great for viewing real-time traffic, but they only show how much bandwidth you are using right now, not the maximum throughput your ISP actually delivers. That’s why Speedtest Exporter is essential.

Real-world Deployment with Docker Compose

Using Docker is the cleanest way to avoid Python library conflicts on your OS. Below is a docker-compose.yml file I’ve optimized for homelabs or small office environments:

version: '3.8'

services:
  speedtest-exporter:
    image: miguelndecarvalho/speedtest-exporter
    container_name: speedtest-exporter
    restart: unless-stopped
    ports:
      - "9798:9798"
    environment:
      - SPEEDTEST_INTERVAL=3600 # Run test every 60 minutes

  prometheus:
    image: prom/prometheus
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana
    container_name: grafana
    ports:
      - "3000:3000"
    restart: unless-stopped

One parameter to note: I’ve set SPEEDTEST_INTERVAL=3600 (once per hour). Don’t be tempted to test every 1-5 minutes. On a fast line, a single test can transfer up to 500 MB of data. Testing too frequently eats into your own bandwidth, and your ISP might flag the unusual traffic or the Speedtest server could block your IP.

Configuring Prometheus to Scrape Data

Next, create a prometheus.yml file in the same directory to point Prometheus to the Exporter:

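global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'speedtest'
    # The exporter may run a fresh test on every scrape (check the image's
    # README), and a test outlasts the default 10s scrape timeout, so
    # scrape hourly and give each scrape up to a minute.
    scrape_interval: 1h
    scrape_timeout: 1m
    static_configs:
      - targets: ['speedtest-exporter:9798']

Run docker-compose up -d to start. Open http://localhost:9090 and check the Targets page under Status to confirm the speedtest job is UP. With an hourly scrape, the first sample can take up to an hour to appear, and a plain instant query comes back empty five minutes after each sample. To see the most recent result, query last_over_time(speedtest_download_bits_per_second[2h]).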

Creating Grafana Charts and Solving ‘Alert Fatigue’

In the Grafana interface, add Prometheus as a Data Source with the URL http://prometheus:9090. The fastest way to get charts is to import dashboard ID 13680. You will immediately have professional-looking Download, Upload, and Ping charts.
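If you would rather not click through the UI after every container rebuild, the same data source can be declared in a Grafana provisioning file. This is a minimal sketch, not part of the setup above: the file name datasources.yml and the data source name are assumptions, while the URL is the one given above.

```yaml
# datasources.yml (hypothetical file name)
# Grafana reads provisioning files from /etc/grafana/provisioning/datasources/ at startup.
apiVersion: 1

datasources:
  - name: Prometheus            # display name in Grafana (assumed)
    type: prometheus
    access: proxy               # Grafana's backend talks to Prometheus, not the browser
    url: http://prometheus:9090 # service name from docker-compose.yml
    isDefault: true
```

To use it, mount the file into the grafana service, for example ./datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml, and restart the container.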

Hard-won Lessons on Alerting

Alert fatigue (notification overload) is the biggest headache. Initially, I set a Telegram alert for when Download dropped below 50Mbps. The result was my phone blowing up with notifications every time someone at home watched Netflix in 4K or downloaded a game on Steam. Simply put, the test was competing with that traffic for bandwidth, so it measured what was left over rather than what the ISP delivered.

Practical Solution:
1. Never alert based on a single result. Use the avg_over_time function in Prometheus to calculate the average over 3-6 hours. If the average for the whole period is low, then it’s truly an ISP issue.
2. Lock in the nearest Server ID (e.g., a Viettel or FPT server in Hanoi/HCMC). This prevents the Exporter from automatically selecting a server as far away as Singapore, which causes high ping and unstable results.
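
Point 1 can be sketched as a Prometheus alerting rule. Treat this as a starting point under stated assumptions rather than a tested configuration: the file name alerts.yml, the group and alert names, the 6h window, and the for: duration are all illustrative choices. The 50 Mbps threshold and the metric name are the ones used earlier in this post.

```yaml
# alerts.yml (hypothetical file name)
groups:
  - name: speedtest             # assumed group name
    rules:
      - alert: SustainedLowDownload   # assumed alert name
        # Average of all download samples in the last 6 hours, in bits per second.
        # 50e6 bits/s = 50 Mbps. A single slow test barely moves this average.
        expr: avg_over_time(speedtest_download_bits_per_second[6h]) < 50e6
        # Require the condition to hold for a further hour before firing (assumed value).
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Average download speed below 50 Mbps over the last 6 hours"
```

To load it, mount the file into the Prometheus container and list its path under rule_files in prometheus.yml. Delivering the notification to Telegram additionally requires Alertmanager; alternatively, the same expression can be pasted into a Grafana alert rule. If the exporter records a failed test as 0, those samples will pull the average down, so check how failures are reported before trusting the alert.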

Optimization Tips for More Accurate Measurements

If you run the Exporter on a Raspberry Pi via Wi-Fi, the results only reflect your home Wi-Fi speed. For the most accurate monitoring, remember:

  • Always plug a LAN cable directly from the measuring device into the main Router.
  • Use the miguelndecarvalho image as it supports both ARM (for Pi) and x86 (Server/PC).
  • Ensure the device CPU isn’t overloaded during measurement, as high-speed data processing consumes significant resources.

Conclusion

24/7 Internet monitoring provides you with ironclad evidence when dealing with ISPs and helps detect internal network issues early. I hope these configuration tips and ways to avoid alert fatigue help your system run effectively. If you run into errors while writing Prometheus queries, feel free to leave a comment below!
