How to Install and Configure OpenObserve: A Next-Generation Logs, Metrics, and Traces Solution to Replace ELK Stack with Up to 140x Lower Storage Costs

Monitoring tutorial - IT technology blog

The Real Problem: ELK Stack Is Draining Your Team’s Budget

Last year, our team hit a serious pain point with our monitoring stack. We were running the ELK Stack (Elasticsearch + Logstash + Kibana) to collect logs from around 20 servers, and storage costs climbed every month with no end in sight. Elasticsearch stores data in Lucene indexes, and in our experience that meant disk usage ran 3–5x higher than the raw log size.

And don’t even get me started on RAM. In our experience, Elasticsearch needs at least 4GB of heap to run reliably in production, so a 3-node cluster means 12GB of heap for Elasticsearch alone, before counting Logstash and Kibana. VPS bills grew every month while the team’s budget stayed flat.
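To make the RAM math concrete, here is a quick sketch. It takes the 4GB-per-node heap figure above and applies Elastic's general guidance of giving the JVM heap no more than about half of a node's RAM, leaving the rest for the OS file cache. The function name and the 50% default are assumptions for illustration, not measurements from our cluster.

```python
def elk_memory_gb(nodes: int, heap_gb_per_node: float, heap_fraction: float = 0.5) -> tuple:
    """Return (total heap, total RAM to provision) for an Elasticsearch cluster.

    heap_fraction is the assumed share of each node's RAM given to the JVM heap;
    Elastic's guidance is to keep the heap at or below 50% of RAM.
    """
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in (0, 1]")
    total_heap = nodes * heap_gb_per_node
    # Whatever is not heap is left to the OS page cache that Lucene relies on.
    total_ram = total_heap / heap_fraction
    return total_heap, total_ram


heap, ram = elk_memory_gb(nodes=3, heap_gb_per_node=4)
print(f"heap: {heap:.0f} GB, RAM to provision: {ram:.0f} GB")
```

Under the 50% assumption this prints 12GB of heap and 24GB of RAM to provision across the three nodes, still before Logstash and Kibana.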

Why ELK Is So Expensive for Observability

Elasticsearch was originally designed as a search engine. Repurposing it for observability drags in a lot of unnecessary overhead:

  • Lucene’s inverted index is optimized for full-text search; for time-series logs that are mostly filtered by time range and a few fields, much of that indexing work is dead weight
  • Schema-on-write means you should define mappings upfront; relying on dynamic mapping can cause a mapping explosion when log formats are inconsistent
  • By default every shard gets one replica, and high availability needs at least 2 nodes to hold those replicas, so in an HA cluster your data is stored twice on disk
  • JVM overhead: JIT warm-up and garbage-collection pauses become a constant headache under heavy queries

Elasticsearch is powerful, but it was built to be a search engine, not a log store. Using it for observability is like hiring a 10-ton truck to deliver a single box: it gets the job done, but the operating costs are completely out of proportion.
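A back-of-the-envelope estimate shows how per-copy index overhead and replication multiply. This is a sketch: elk_disk_gb is a hypothetical helper, index_expansion is whatever bytes-on-disk per raw byte you measure on your own cluster, and replicas=1 matches Elasticsearch's default of one replica per primary shard.

```python
def elk_disk_gb(raw_gb: float, index_expansion: float, replicas: int = 1) -> float:
    """Estimate on-disk size for raw_gb of logs stored in Elasticsearch.

    index_expansion: assumed bytes on disk per raw byte for ONE copy of the data.
    replicas: replica copies per primary shard (Elasticsearch defaults to 1).
    """
    if raw_gb < 0 or index_expansion <= 0 or replicas < 0:
        raise ValueError("invalid sizing inputs")
    copies = 1 + replicas  # the primary plus each replica
    return raw_gb * index_expansion * copies


# Hypothetical inputs: 100GB raw, 1.5x per-copy overhead, default single replica.
print(elk_disk_gb(100, 1.5))
```

With those hypothetical inputs the estimate is 300GB, already 3x the raw size, which is how modest per-copy overhead compounds into the kind of multiple described above.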

Alternatives We Evaluated

Before settling on a solution, we benchmarked several alternatives:

  • Grafana Loki: Much lighter, since it indexes only labels (metadata) rather than the full log content. But LogQL has a learning curve, and in our tests searches that had to scan log content slowed down noticeably once data reached tens of gigabytes.
  • Graylog: Still relies on Elasticsearch (or OpenSearch) for storage and mainly adds a nicer UI on top, so the storage problem remains completely unsolved.
  • VictoriaMetrics: Excellent for metrics, but at the time, native log support wasn’t mature enough for production use.
  • OpenObserve: Handles logs, metrics, and traces in a single binary — with storage costs up to 140x lower than ELK according to their official benchmarks.

OpenObserve — What We’re Actually Running in Production

OpenObserve (formerly ZincObserve) is written in Rust and stores data in Parquet format with high compression. The key differences from ELK come down to a few things:

  • A single binary under 10MB — no JVM, no Elasticsearch required
  • S3-compatible storage support — store logs on Cloudflare R2 or MinIO at a fraction of the cost of traditional block storage
  • Built-in UI for querying logs with SQL, metrics with PromQL, and traces via the OpenTelemetry standard
  • Real-world RAM footprint of only around 100–150MB when idle — measured directly on our production server
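Much of the storage gap comes from storing logs column by column and compressing each column. The sketch below is not Parquet, which layers encodings such as dictionary and run-length encoding on top of codecs like Zstd or Snappy; it only uses zlib from the standard library on hypothetical synthetic access-log records to show why grouping similar values together tends to compress well. The record fields and helper names are assumptions for illustration.

```python
import json
import zlib


def make_records(n: int) -> list:
    """Generate synthetic access-log records (hypothetical data for illustration)."""
    hosts = ["web-01", "web-02", "web-03"]
    statuses = [200, 200, 200, 304, 404, 500]
    return [
        {
            "_timestamp": 1_700_000_000_000_000 + i * 250_000,  # microseconds, 4 events/s
            "host": hosts[i % len(hosts)],
            "status": statuses[i % len(statuses)],
            "path": f"/api/items/{i % 50}",
        }
        for i in range(n)
    ]


def row_layout(records: list) -> bytes:
    # One JSON document per line, the way a log file or row store holds it.
    return "\n".join(json.dumps(r) for r in records).encode()


def column_layout(records: list) -> bytes:
    # Group each field's values together, the way a columnar format does.
    columns = {key: [r[key] for r in records] for key in records[0]}
    return json.dumps(columns).encode()


def compressed_sizes(records: list) -> dict:
    row = row_layout(records)
    col = column_layout(records)
    return {
        "raw_rows": len(row),
        "zlib_rows": len(zlib.compress(row, 9)),
        "zlib_columns": len(zlib.compress(col, 9)),
    }


print(compressed_sizes(make_records(10_000)))
```

Swapping in a sample of your own logs is a quick way to see how much structure they have; real Parquet files also store column types and statistics, so treat these numbers as a rough intuition rather than a prediction.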

Installing OpenObserve with Docker

The fastest way to test on a new server:

# Create the data directory
mkdir -p /opt/openobserve/data

# Run OpenObserve
docker run -d \
  --name openobserve \
  --restart unless-stopped \
  -p 5080:5080 \
  -e ZO_ROOT_USER_EMAIL=root@example.com \
  -e ZO_ROOT_USER_PASSWORD='StrongPass123!' \
  -e ZO_DATA_DIR=/data \
  -v /opt/openobserve/data:/data \
  public.ecr.aws/zinclabs/openobserve:latest

Once it’s running, open your browser at http://your-server-ip:5080 and log in with the credentials you just set.
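If you would rather script the "is it up?" check than open a browser, the sketch below polls a health endpoint. It assumes your OpenObserve build exposes /healthz and answers with a JSON body like {"status": "ok"} (the endpoint commonly used for container health probes); confirm this against your version before wiring it into monitoring. The function names are hypothetical.

```python
import json
import sys
import urllib.request


def is_healthy(status: int, body: bytes) -> bool:
    """Interpret a /healthz response; assumes a JSON body like {"status": "ok"}."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError/HTTPError).
        return False


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "http://localhost:5080"
    print("healthy" if check_health(base) else "not healthy")
```

Run it as python3 check_openobserve.py http://your-server-ip:5080 (the file name is just an example); a non-zero-effort wrapper around it makes a handy post-deploy smoke test.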

Docker Compose for Production Environments

services:
  openobserve:
    image: public.ecr.aws/zinclabs/openobserve:latest
    container_name: openobserve
    restart: unless-stopped
    ports:
      - "5080:5080"
    environment:
      ZO_ROOT_USER_EMAIL: "root@example.com"
      ZO_ROOT_USER_PASSWORD: "ChangeThisToSomethingStrong!"
      ZO_DATA_DIR: /data
      ZO_TELEMETRY: "false"
    volumes:
      - openobserve_data:/data

volumes:
  openobserve_data:

Start it with:

docker compose up -d
# Check startup logs
docker compose logs -f openobserve

Shipping Logs to OpenObserve with Fluent Bit

Fluent Bit is the lightest log collection agent I’ve ever used: it runs in around 5MB of RAM, compared to Logstash, which demands 500MB or more. Install it on Ubuntu/Debian:

curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
systemctl enable fluent-bit --now

Create a config file to forward logs to OpenObserve:

# /etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Flush         5
    Log_Level     info

[INPUT]
    Name          tail
    Path          /var/log/syslog
    Tag           server.syslog
    Read_from_Head False

[INPUT]
    Name          tail
    Path          /var/log/nginx/access.log
    Tag           nginx.access
    Read_from_Head False

[OUTPUT]
    Name          http
    Match         *
    Host          your-openobserve-server
    Port          5080
    URI           /api/default/server_logs/_json
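    Format        json
    # Write the event time as _timestamp (ISO 8601) for OpenObserve
    Json_date_key _timestamp
    Json_date_format iso8601
    Http_User     root@example.com
    Http_Passwd   ChangeThisToSomethingStrong!
    compress      gzip
    tls           Off

Then restart the service and watch its output:

systemctl restart fluent-bit
# Verify logs are being shipped
journalctl -u fluent-bit -f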
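When nothing shows up in the UI, it helps to rule out Fluent Bit by posting a record to OpenObserve yourself. The sketch below builds the same kind of request the [OUTPUT] section sends: a JSON array of records to /api/default/server_logs/_json, gzip-compressed, with Basic auth. The helper name is hypothetical and the credentials are placeholders; use the ones from your own setup.

```python
import base64
import gzip
import json
import urllib.request


def build_ingest_request(base_url: str, org: str, stream: str, records: list,
                         user: str, password: str) -> urllib.request.Request:
    """Build a gzip-compressed POST to OpenObserve's JSON ingestion endpoint.

    Mirrors the Fluent Bit output above: same URI pattern, Basic auth, gzip body.
    """
    body = gzip.compress(json.dumps(records).encode("utf-8"))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/{org}/{stream}/_json",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
        },
    )


req = build_ingest_request(
    "http://your-openobserve-server:5080", "default", "server_logs",
    [{"log": "ERROR test message sent by hand"}],
    "root@example.com", "ChangeThisToSomethingStrong!",
)
print(req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(req); if that succeeds and the record appears in the server_logs stream, the problem is on the Fluent Bit side.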

Querying Logs with SQL in the UI

No need to learn LogQL or KQL — OpenObserve uses SQL syntax, so anyone familiar with relational databases can hit the ground running. Go to the Logs menu, select your stream, and run:

-- Find the latest ERRORs (set the time range picker to the past hour)
SELECT * FROM "server_logs"
WHERE log LIKE '%ERROR%'
ORDER BY _timestamp DESC
LIMIT 100

-- Count errors per hour to spot trends
SELECT
  histogram(_timestamp, '1 hour') AS hour,
  count(*) AS error_count
FROM "server_logs"
WHERE log LIKE '%ERROR%'
GROUP BY 1
ORDER BY 1 DESC
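In the UI, "the past hour" comes from the time range picker rather than from the SQL itself. If you run the same queries from a script, our understanding is that OpenObserve's search API (POST /api/{org}/_search) takes the SQL together with start_time and end_time in microseconds; treat that body shape as an assumption and check the API docs for your version. The helper below is hypothetical and only builds the payload.

```python
import json
import time
from typing import Optional

ONE_HOUR_US = 3_600 * 1_000_000  # OpenObserve timestamps are in microseconds


def search_payload(sql: str, lookback_us: int = ONE_HOUR_US,
                   now_us: Optional[int] = None, size: int = 100) -> dict:
    """Build a search body whose time range plays the role of the UI time picker."""
    end = now_us if now_us is not None else time.time_ns() // 1_000
    return {
        "query": {
            "sql": sql,
            "start_time": end - lookback_us,
            "end_time": end,
            "from": 0,
            "size": size,
        }
    }


payload = search_payload(
    "SELECT * FROM \"server_logs\" WHERE log LIKE '%ERROR%' ORDER BY _timestamp DESC"
)
print(json.dumps(payload, indent=2))
```

POST the payload to /api/default/_search with the same Basic auth credentials Fluent Bit uses; widening lookback_us is the scripted equivalent of picking a longer range in the UI.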

Configuring Alerts — Avoiding the Alert Fatigue Trap

This is where I spent the most time when first setting things up. Early on I set thresholds way too low — we were getting pings every 2–3 minutes until the whole DevOps team muted the chat group. Hard lesson learned: thresholds must be based on actual baselines from at least 1–2 weeks of historical logs, not gut feelings.
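One way to turn "1–2 weeks of historical logs" into an actual number is sketched below: collect the error count per alert window over that period (for example, from a count query like the one above, bucketed to your alert window), take a high percentile, and add headroom. The 99th percentile and 1.5x headroom are assumptions for illustration, not OpenObserve defaults or a rule from our setup.

```python
import math
import statistics


def baseline_threshold(window_counts: list, percentile: int = 99,
                       headroom: float = 1.5) -> int:
    """Suggest an alert threshold from historical error counts per evaluation window.

    window_counts: error counts per window (e.g. per 5 minutes) over 1-2 weeks.
    percentile and headroom are assumed tuning knobs, not OpenObserve settings.
    """
    if len(window_counts) < 2:
        raise ValueError("need at least two historical windows")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # 99 cut points splitting the data into 100 groups; index p-1 is the p-th percentile.
    cut_points = statistics.quantiles(window_counts, n=100, method="inclusive")
    return math.ceil(cut_points[percentile - 1] * headroom)
```

For instance, if your history were a perfectly flat 10 errors per window, baseline_threshold([10] * 50) would suggest 15. Real baselines are spikier, which is exactly why a percentile beats a gut-feeling number.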

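In OpenObserve, go to Alerts > Create Alert. A sample configuration (field names vary between OpenObserve releases, so match them to the alert form or API docs for your version):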

{
  "name": "High Error Rate",
  "stream_name": "server_logs",
  "query": "SELECT count(*) as error_count FROM server_logs WHERE log LIKE '%ERROR%'",
  "condition": {
    "column": "error_count",
    "operator": ">",
    "value": 50
  },
  "duration": 5,
  "frequency": 1,
  "time_between_alerts": 30
}

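The two most important parameters are time_between_alerts and duration, both in minutes. Set time_between_alerts to at least 30 to prevent notification spam: once an alert fires, repeats are suppressed for that long. duration: 5 sets the window each evaluation looks back over, so the threshold of 50 is compared against errors counted in the last 5 minutes, and frequency: 1 runs that check every minute. Sizing the window and threshold from your baseline is what keeps brief, harmless bursts from causing false positives.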
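To see why the silence period matters so much, the sketch below replays a hypothetical incident minute by minute. It assumes duration is the look-back window of each evaluation, frequency is the evaluation interval, and time_between_alerts is a silence period after a notification, all in minutes; that reading of the fields is our assumption, so confirm it against your OpenObserve version. simulate_alerts is a hypothetical helper, not part of OpenObserve.

```python
def simulate_alerts(per_minute_errors: list, threshold: int = 50, duration: int = 5,
                    frequency: int = 1, time_between_alerts: int = 30) -> list:
    """Return the minutes at which a notification would be sent.

    Assumed semantics: every `frequency` minutes, count errors over the last
    `duration` minutes; if the count exceeds `threshold`, notify unless the
    previous notification was less than `time_between_alerts` minutes ago.
    """
    fired = []
    last_fired = None
    # Start at the first minute where a full look-back window is available.
    for minute in range(duration - 1, len(per_minute_errors), frequency):
        window = per_minute_errors[minute - duration + 1 : minute + 1]
        if sum(window) > threshold:
            if last_fired is None or minute - last_fired >= time_between_alerts:
                fired.append(minute)
                last_fired = minute
    return fired
```

With a steady 20 errors per minute for an hour, simulate_alerts([20] * 60) sends two notifications (at minutes 4 and 34); set time_between_alerts=0 and the same incident sends 56.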

Alerts can be delivered via Slack, webhook, or email. Configure destinations under Alerts > Destinations.

Real-World Results After 3 Months on OpenObserve

Same log volume from the same 20 servers — here’s what we measured:

  • Disk usage: Down from 180GB/month (ELK) to 13GB/month (OpenObserve) — nearly 14x reduction
  • RAM consumption: Down from 12GB (3-node ELK cluster) to 256MB (OpenObserve single node)
  • VPS cost: Down from $80/month to $12/month for the same workload
  • Setup time: 10 minutes with Docker, versus half a day to set up ELK properly
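The reduction factors behind those measurements can be checked with a few lines of arithmetic. This sketch reads 12GB as 12 × 1024MB; reduction is just a hypothetical helper name.

```python
def reduction(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


measurements = {
    "disk (GB/month)": (180, 13),
    "RAM (MB)": (12 * 1024, 256),
    "VPS cost (USD/month)": (80, 12),
}
for name, (before, after) in measurements.items():
    print(f"{name}: {reduction(before, after):.1f}x")
```

That works out to about 13.8x for disk, 48x for RAM, and 6.7x for cost. The disk figure is our measured result on this workload, well below the up-to-140x benchmark number, so it is worth measuring your own logs before budgeting.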

OpenObserve isn’t a drop-in replacement for Elasticsearch in every scenario. If you need complex full-text search for application data, Elasticsearch is still the right tool. But for observability — collecting, storing, and analyzing infrastructure logs, metrics, and traces — this is something I wish I’d found two years earlier.
