VictoriaMetrics Installation Guide: A Prometheus Alternative Saving 90% Disk Space

Monitoring tutorial - IT technology blog

When Prometheus Becomes a Resource “Nightmare”

If you’ve ever stayed up all night because Prometheus hit an OOM (Out of Memory) error during peak hours, you know that feeling of helplessness. Prometheus is the gold standard in monitoring, but as the scale grows, it reveals some critical limitations.

In a previous project, I managed a monitoring cluster for 200 microservices. Everything was fine until the number of active time series exceeded 1 million. Grafana started lagging, and dashboards took more than 15 seconds to load. Things came to a head when Prometheus was repeatedly OOM-killed because RAM could no longer hold the index for that many series.

Storage (retention) was equally problematic. Storing a year of data required several terabytes of disk. Backing up Prometheus data felt like a gamble due to its complex on-disk structure. A single disk failure once cost me 3 months of data: a truly expensive lesson.

Why is Prometheus So Resource-Intensive?

To optimize, we need to understand the root causes of Prometheus TSDB’s issues:

  • High Cardinality: This is the silent killer. When you attach labels with constantly changing values like user_id, the number of time series explodes. Prometheus is forced to keep the index of all these series in RAM to ensure query speed.
  • Data Writing Structure: Merging and compressing old data blocks consumes a significant amount of CPU and I/O. When a system has thousands of targets, disk I/O will constantly be in the red zone.
  • Horizontal Scaling Difficulties: The original Prometheus is designed to run as a single instance. To handle more data, your only option is to upgrade the server (vertical scaling), and a server with 128GB of RAM is not cheap.
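If a high-cardinality label has already leaked into your metrics, you can strip it at scrape time with relabeling. Below is a minimal sketch using Prometheus-style metric_relabel_configs (the job name, target, and the user_id label are illustrative examples); vmagent, introduced later in this guide, accepts the same syntax:

```yaml
scrape_configs:
  - job_name: 'api-service'            # hypothetical job
    static_configs:
      - targets: ['192.168.1.20:8080'] # hypothetical target
    metric_relabel_configs:
      # Drop the high-cardinality user_id label from every scraped metric.
      # Caution: series that differ only by this label will be merged.
      - regex: user_id
        action: labeldrop
```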

VictoriaMetrics: The Perfect Alternative

After struggling with Thanos and Cortex and finding them too complex, I switched to VictoriaMetrics (VM). The results far exceeded expectations:

  • Superior Data Compression: VM typically uses 7-10x less disk space. A dataset that occupies 1TB on Prometheus usually shrinks to about 100GB after migrating to VM.
  • Smart RAM Management: Instead of holding everything in RAM, VM uses an efficient caching mechanism. RAM consumption is typically only about one-fifth of Prometheus's for the same workload.
  • Ultra-fast Deployment: The entire system is encapsulated in a single binary. No need to install dozens of auxiliary components.
  • Backward Compatibility: VM supports PromQL and MetricsQL. You can swap the URL in Grafana, and everything works immediately without modifying dashboards.
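As a quick illustration of that compatibility, a standard PromQL query runs unchanged against VM, while MetricsQL adds small conveniences on top (the metric name below is illustrative):

```promql
# Standard PromQL: works identically against VictoriaMetrics
sum(rate(http_requests_total[5m])) by (job)

# MetricsQL extension: the lookbehind window may be omitted;
# VM then applies a sensible default based on the query step
sum(rate(http_requests_total)) by (job)
```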

VictoriaMetrics Installation Guide (Single-node)

Below is how to deploy the Single-node version using Docker Compose. On a single decent server, this setup comfortably handles ingestion rates of hundreds of thousands to around a million samples per second.

Step 1: Set up Docker Compose

Create a docker-compose.yml file with the following optimized configuration:

version: '3.8'
services:
  victoriametrics:
    container_name: victoriametrics
    image: victoriametrics/victoria-metrics:v1.94.0
    ports:
      - "8428:8428"
    volumes:
      - vmdata:/storage
    command:
      - "--storageDataPath=/storage"
      - "--retentionPeriod=12" # 12-month retention
    restart: always

  vmagent:
    container_name: vmagent
    image: victoriametrics/vmagent:v1.94.0
    depends_on:
      - victoriametrics
    ports:
      - "8429:8429"
    volumes:
      - vmagentdata:/vmagentdata
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - "--promscrape.config=/etc/prometheus/prometheus.yml"
      - "--remoteWrite.url=http://victoriametrics:8428/api/v1/write"
    restart: always

volumes:
  vmdata:
  vmagentdata:

Step 2: Configure the Scraper

Even when using VM, we still use the standard Prometheus config file to declare targets. Create a prometheus.yml file:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'vmagent'
    static_configs:
      - targets: ['localhost:8429']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.1.10:9100']

Step 3: Activate the System

Run the following command to start the entire stack:

docker-compose up -d

At this point, vmagent will collect data and push it to victoriametrics via the remote write protocol. The scraping load is now completely decoupled.
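To verify the pipeline end to end, you can hit VM's Prometheus-compatible HTTP API directly (adjust host and port if you changed the compose file):

```shell
# Check that VictoriaMetrics is healthy
curl -s http://localhost:8428/health

# Confirm targets are being scraped: each `up` result should be 1
curl -s 'http://localhost:8428/api/v1/query?query=up'
```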

Connecting Grafana and Optimization

The transition is extremely simple. In Grafana, create a Prometheus data source, set the URL to http://<your-ip>:8428, and click Save & Test. Chart loading will be noticeably faster, especially for long-range queries.

Real-world Advice: When Should You Switch?

Don’t rush to tear everything down if your system is running fine. Prometheus remains an excellent choice for small clusters with short-term retention (under 30 days).

However, consider VictoriaMetrics immediately if you encounter these cases:

  • Servers constantly alert for low RAM or are frequently OOM killed.
  • Cloud storage costs (EBS/S3) are rising too high due to metrics volume.
  • You need to store data for years for compliance or audit reports.

Pro-tip: You can run both in parallel. Use Prometheus for real-time Alerting and push data to VM as a long-term storage vault. This hybrid approach lets you leverage the strengths of both platforms.
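The hybrid setup requires just one addition to your existing Prometheus configuration: a remote_write section pointing at VM (the hostname below assumes the compose service name from this guide):

```yaml
# prometheus.yml: Prometheus keeps scraping and alerting as before,
# while duplicating every sample to VictoriaMetrics for long-term storage
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```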

In summary, VictoriaMetrics is a worthwhile upgrade for any DevOps Engineer struggling with monitoring challenges. Good luck with your system optimization!
