Why I Switched from ELK to Grafana Loki
Managing dozens of containers and having to ssh into each machine to run tail -f is a nightmare. I once maintained a Prometheus cluster for 15 servers. When a logic error occurred, metrics showed a CPU spike but gave no clue which specific line of code was “dying.” That’s when I realized: Metrics tell you the system is “sick,” but Logs show you where the “disease” is.
ELK Stack (Elasticsearch, Logstash, Kibana) is usually the first name that comes to mind. However, ELK is extremely resource-heavy. Running a stable Elasticsearch cluster typically costs at least 8-16GB of RAM—a luxury for small startups or staging environments. Loki emerged as a lifesaver with its “Prometheus for logs” philosophy. Instead of indexing the full content, Loki only indexes labels. As a result, storage requirements can be up to 10 times lower than Elasticsearch.
Quick Deployment with Docker Compose
To run the Loki-Promtail-Grafana trio, Docker Compose is the cleanest choice. The architecture is simple:
- Loki: Acts as the database for storage and query processing.
- Promtail: The agent that collects logs from physical files and pushes them to Loki.
- Grafana: The visualization interface where you turn dry log lines into insights.
Create the project directory and prepare the configuration files:
```shell
mkdir loki-stack && cd loki-stack
touch docker-compose.yaml promtail-config.yaml
```
Content of the docker-compose.yaml file (using the latest Loki 3.0 version):

```yaml
version: "3.8"
services:
  loki:
    image: grafana/loki:3.0.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:3.0.0
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```
Configuring Promtail: Don’t Lose Logs on Restart
The promtail-config.yaml file below tells the agent where to “scrape” logs. A critical note on positions: this file records the last offset read in each log file, so Promtail resumes where it left off instead of re-pushing old lines to Loki. Keep in mind that /tmp/positions.yaml lives inside the container, so mount it on a volume if you need that guarantee to survive container restarts.
```yaml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: production-server-01
          __path__: /var/log/*.log
```
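Since the positions file above sits in the container’s ephemeral /tmp, one way to make it persist is a named volume in docker-compose.yaml. A sketch (the volume name promtail-positions is arbitrary):

```yaml
  promtail:
    image: grafana/promtail:3.0.0
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - promtail-positions:/tmp   # keeps positions.yaml across container restarts
    command: -config.file=/etc/promtail/config.yml

volumes:
  promtail-positions:
```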
Warning about High Cardinality
A classic mistake I made was putting user_id or client_ip into labels. Every unique label combination becomes a separate log stream in Loki, so this spawns millions of tiny streams and bloats the index until queries grind to a halt. Pro tip: only use labels for static information like env or app_name, and use LogQL filters to search for dynamic data later.
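To see why this explodes, remember that the number of streams grows multiplicatively with label cardinality. A quick back-of-the-envelope sketch (the label counts here are made-up illustrative numbers):

```python
from math import prod

def stream_count(label_cardinalities):
    """Upper bound on Loki streams: one per unique label combination."""
    return prod(label_cardinalities.values())

# Safe: a handful of static labels.
safe = {"env": 3, "app_name": 20}
# Dangerous: a user_id label with 100k distinct values multiplies everything.
risky = {"env": 3, "app_name": 20, "user_id": 100_000}

print(stream_count(safe))   # 60
print(stream_count(risky))  # 6000000
```

Sixty streams is trivial; six million is how you freeze a cluster.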
Unlocking the Power of LogQL on Grafana
After running docker-compose up -d, access localhost:3000. Add Loki as a Data Source with the URL http://loki:3100. In the Explore menu, you can start inspecting logs.
LogQL syntax is very powerful. To filter log lines containing “Timeout” in a system job, use:
```logql
{job="varlogs"} |= "Timeout"
```
Even more interesting, you can turn logs into charts instantly. For example, counting the number of 404 errors per minute from Nginx logs:
```logql
count_over_time({job="nginx"} |= "404" [1m])
```
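Grafana isn’t the only client: Loki exposes the same queries over HTTP at /loki/api/v1/query_range. A minimal sketch that only builds the request URL (it assumes Loki at localhost:3100 as in the compose file; sending the request is left to curl or your HTTP library of choice):

```python
from urllib.parse import urlencode

LOKI = "http://localhost:3100"  # assumption: the Loki container from docker-compose

def query_range_url(logql, start, end, step="60s"):
    """Build a URL for Loki's range-query endpoint."""
    params = urlencode({
        "query": logql,
        "start": start,  # Unix epoch timestamps
        "end": end,
        "step": step,
    })
    return f"{LOKI}/loki/api/v1/query_range?{params}"

url = query_range_url('count_over_time({job="nginx"} |= "404" [1m])',
                      start=1700000000, end=1700003600)
print(url)
```

Fetching that URL returns a JSON matrix of per-minute counts, so you can plot or alert on the same data outside Grafana too.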
By placing the error log chart next to the latency chart, I once discovered a buffer overflow bug in just 2 minutes, instead of spending an hour digging through text files.
Real-world Operational Experience
To keep Loki running smoothly in production, you should apply these three rules:
- Limit retention time: By default, Loki keeps logs forever. Configure retention_period: 15d (under limits_config, with retention enabled in the compactor) to avoid sudden disk exhaustion.
- Leverage Cloud Storage: Instead of storing logs on local disks (which are expensive and volatile), configure Loki to push chunks to S3 or Google Cloud Storage. The cost will be about 5-7 times cheaper.
- Pipeline Stages: Use regex stages in Promtail to parse raw log lines, so you can filter by status code or response time extremely fast directly within the Grafana UI.
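As a sketch of that last point, a Promtail scrape job that extracts the HTTP status code from Nginx access logs might look like this (the path and regex are illustrative; adjust them to your log format):

```yaml
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      # Pull the 3-digit status code out of each raw log line.
      - regex:
          expression: '" (?P<status>\d{3}) '
      # Promote it to a label (safe: only a few dozen possible values).
      - labels:
          status:
```

With this in place, {job="nginx", status="404"} selects the error lines directly, no full-text scan required.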
Setting up Loki might feel a bit unfamiliar to those used to ELK. However, once you master LogQL, your Observability will skyrocket. The system becomes more transparent, bugs are found faster, and most importantly, you’ll sleep better at night.

