I ran Prometheus Agent, Promtail, and the OpenTelemetry Collector side by side in production for nearly a year. Three binaries, three config files, three systemd services: every update was a headache. When I switched to Grafana Alloy, the entire telemetry collection pipeline fit into a single binary. Here’s what I learned after 6 months of running it in production.
Install and Run Alloy in 5 Minutes
Install on Ubuntu/Debian
# Add the Grafana APT repository
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# Install Alloy
sudo apt-get update
sudo apt-get install -y alloy
Minimal Config — Collect Metrics and Logs Immediately
Create the file /etc/alloy/config.alloy with the following content:
// Collect metrics from node_exporter (replaces Prometheus scrape)
prometheus.scrape "node_exporter" {
  targets = [{"__address__" = "localhost:9100"}]
  forward_to = [prometheus.remote_write.main.receiver]
}
// Send metrics to Mimir or Grafana Cloud
// (Grafana Cloud uses /api/prom/push; self-hosted Mimir defaults to /api/v1/push)
prometheus.remote_write "main" {
  endpoint {
    url = "https://your-mimir-endpoint/api/prom/push"
    basic_auth {
      username = "your-username"
      password = "your-api-key"
    }
  }
}
// Collect logs from systemd journal (replaces Promtail)
loki.source.journal "systemd" {
  forward_to = [loki.write.main.receiver]
}
// Send logs to Loki
loki.write "main" {
  endpoint {
    url = "https://your-loki-endpoint/loki/api/v1/push"
    basic_auth {
      username = "your-username"
      password = "your-api-key"
    }
  }
}
Save the file, then enable and start the service:
# Enable and start Alloy
sudo systemctl enable alloy
sudo systemctl start alloy
# Check status
sudo systemctl status alloy
# Follow logs
sudo journalctl -u alloy -f
That’s it. Alloy is now scraping node_exporter metrics and forwarding logs from systemd — two tasks that previously required two separate agents, now running in a single process.
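One thing worth tightening before this goes anywhere near production is the inline API key. A minimal sketch of the same remote write block reading the key from a file instead, using the `password_file` argument of `basic_auth`; the path `/etc/alloy/secrets/mimir-api-key` is a hypothetical location, and the file has to be readable by the user Alloy runs as:

```alloy
// Variant of the remote_write block above that keeps the API key out of the config file.
// The secrets path is an assumption made for illustration; use whatever location you manage.
prometheus.remote_write "main" {
  endpoint {
    url = "https://your-mimir-endpoint/api/prom/push"
    basic_auth {
      username = "your-username"
      password_file = "/etc/alloy/secrets/mimir-api-key"
    }
  }
}
```

The same argument is available in the `basic_auth` block of `loki.write`, so both endpoints can share one key file.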
What Alloy Is and Why Its Architecture Is Different
Grafana Alloy launched in 2024 as the successor to Grafana Agent (Flow mode). Unlike Prometheus Agent or Promtail, Alloy is a distribution of the OpenTelemetry Collector extended with native components for the Prometheus and Loki ecosystems, so a single binary handles all three telemetry types: metrics, logs, and traces.
Before Alloy, a typical monitoring stack on each server required:
- Prometheus Agent — scrape metrics, remote write
- Promtail — collect logs, ship to Loki
- OpenTelemetry Collector — collect traces from applications
Three separate processes, three different config syntaxes. When one of them crashed at 3 AM, you had to debug each one individually. Alloy unifies all of them.
Everything Is a Component
Alloy’s configuration uses an HCL-like syntax in which every element is a component with inputs and outputs. You wire them together into a pipeline:
// Component A produces data → forwards to Component B
component_type_a "my_label" {
  forward_to = [component_type_b.my_label.receiver]
}
component_type_b "my_label" {
  // automatically receives data from component_a
}
Think of it like Lego: snap blocks together in whatever order the pipeline needs. I find this far easier to debug than Promtail’s flat YAML config. One glance at the component graph in the built-in UI shows which component the data is stuck at, with no need to sift through logs line by line.
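To make the wiring concrete, here is a minimal sketch of a three-stage log pipeline built from real components. The `demo` labels, the `/var/log/syslog` path, and the `env` value are assumptions chosen for illustration:

```alloy
// Stage 1: tail a file and hand each line to the processing stage.
loki.source.file "demo" {
  targets = [{"__path__" = "/var/log/syslog", "job" = "syslog"}]
  forward_to = [loki.process.demo.receiver]
}

// Stage 2: attach a static label, then pass the entries on.
loki.process "demo" {
  stage.static_labels {
    values = { env = "production" }
  }
  forward_to = [loki.write.demo.receiver]
}

// Stage 3: ship the entries to Loki (placeholder URL).
loki.write "demo" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```

Each `forward_to` references the `receiver` export of the next component, which is what turns three independent blocks into one pipeline. Swapping the middle stage for something else only means changing two references.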
Advanced Configuration
Collecting Logs from Multiple Sources Simultaneously
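// Expand glob patterns into concrete file targets
local.file_match "app_logs" {
  path_targets = [
    {__path__ = "/var/log/nginx/access.log", job = "nginx", env = "production"},
    {__path__ = "/var/log/myapp/*.log", job = "myapp", env = "production"},
  ]
}
// Read application log files (loki.source.file does not expand globs on its own)
loki.source.file "app_logs" {
  targets = local.file_match.app_logs.targets
  forward_to = [loki.write.local.receiver]
}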
// Read systemd journal and add labels from journal fields
loki.source.journal "system" {
  forward_to = [loki.write.local.receiver]
  relabel_rules = loki.relabel.journal_labels.rules
}
loki.relabel "journal_labels" {
  rule {
    source_labels = ["__journal__systemd_unit"]
    target_label = "unit"
  }
  forward_to = []
}
loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
Receiving OpenTelemetry Traces from Applications
// Receive traces via gRPC (4317) and HTTP (4318)
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
// Forward traces to Tempo
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}
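A common refinement is to put a batch processor between the receiver and the exporter so spans are sent in groups rather than one request at a time. A minimal sketch, assuming the default batch settings are acceptable; it replaces the receiver block above and leaves the `tempo` exporter unchanged:

```alloy
// Receiver now hands traces to the batch processor instead of the exporter.
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

// Group spans into batches (default size and timeout) before export.
otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
```

Because the only thing that changes is which `input` each `output` points at, the processor can be added or removed without touching the exporter.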
Service Discovery for Docker
// Automatically scrape all running containers
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}
prometheus.scrape "docker_targets" {
  targets = discovery.docker.containers.targets
  forward_to = [prometheus.remote_write.main.receiver]
  scrape_interval = "30s"
}
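Scraping every container tends to produce failed targets for anything that does not expose metrics. A minimal sketch of an opt-in variant that filters discovered targets with `discovery.relabel`; the Docker label `metrics_scrape=true` and the component label `opted_in` are hypothetical names picked for this example:

```alloy
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "opted_in" {
  targets = discovery.docker.containers.targets

  // Keep only containers started with the (hypothetical) label metrics_scrape=true.
  rule {
    source_labels = ["__meta_docker_container_label_metrics_scrape"]
    regex = "true"
    action = "keep"
  }

  // Docker reports names with a leading slash; copy the rest into a "container" label.
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex = "/(.*)"
    target_label = "container"
  }
}

prometheus.scrape "docker_targets" {
  targets = discovery.relabel.opted_in.output
  forward_to = [prometheus.remote_write.main.receiver]
  scrape_interval = "30s"
}
```

A container would opt in with something like `docker run --label metrics_scrape=true ...`; everything else is dropped before a scrape is attempted.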
Debugging with the Built-in UI
The feature I use most is one that often goes unnoticed: Alloy ships with a web UI on port 12345. Visit http://localhost:12345 to see a graph of the entire component pipeline, the health status of each component, and the data flowing through each stage. By default the UI listens only on localhost, so from another machine an SSH tunnel is the simplest safe way in.
# Access securely via SSH tunnel
ssh -L 12345:localhost:12345 user@your-server
# Then open http://localhost:12345 on your local machine
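Watching data move through a component in the UI relies on Alloy's live debugging feature, which is typically off by default. A minimal sketch of the top-level block that turns it on, assuming your Alloy release supports it; in some releases the feature is marked experimental and may also need a stability-level flag on the command line, so check the documentation for your version:

```alloy
// Top-level block: enable live debugging for components that support it in the UI.
// Availability and stability level depend on the Alloy release (assumption).
livedebugging {
  enabled = true
}
```

The pipeline graph and per-component health views work without this block; it only affects the live data view.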
Real-World Tips After 6 Months in Production
1. Add Context Labels to Fight Alert Fatigue
When I first set up monitoring, I was drowning in alerts because the rules ignored context: a 70% CPU spike on staging triggered the same alarm as one on production. The fix: inject context labels directly in Alloy before shipping to Mimir, then write alerting rules (or Alertmanager routes) that filter by environment:
// Add environment labels to all metrics before remote write
prometheus.relabel "add_context" {
  rule {
    target_label = "environment"
    replacement = "production"
  }
  rule {
    target_label = "region"
    replacement = "ap-northeast-1"
  }
  forward_to = [prometheus.remote_write.main.receiver]
}
Result: alerts only fire when CPU exceeds 80% and the environment is “production”. Staging and dev stay silent.
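The relabel component only sees metrics that are forwarded to it, so scrape components have to point at it rather than straight at remote write. A minimal sketch reusing the `node_exporter` scrape from the quick-start config:

```alloy
prometheus.scrape "node_exporter" {
  targets = [{"__address__" = "localhost:9100"}]
  // Route through the relabel component so the context labels get attached.
  forward_to = [prometheus.relabel.add_context.receiver]
}
```

If every series leaving a host should carry the same labels, the `external_labels` argument of `prometheus.remote_write` is a possible alternative that avoids the extra hop.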
2. Validate and Reload Config with Zero Downtime
# Check syntax: prints the formatted config to stdout and fails on syntax errors
# (add -w to rewrite the file in place)
alloy fmt /etc/alloy/config.alloy
# Reload config without restarting the service
sudo systemctl reload alloy
# Or via the HTTP API
curl -X POST http://localhost:12345/-/reload
3. Automatic Migration from Promtail
Grafana ships a built-in tool to convert Promtail configs to Alloy format:
alloy convert \
--source-format=promtail \
--output=/etc/alloy/config.alloy \
/etc/promtail/config.yml
This tool helped me migrate 12 servers in a single afternoon. The output isn’t 100% perfect — some label rules needed manual tweaking — but it saved roughly 70% of the time compared to writing everything from scratch.
4. Real-World Resource Usage
Measured on a 2 vCPU / 4GB RAM server running ~50 scrape targets plus log collection from three services:
- Alloy: ~80MB RAM, ~2% CPU average
- 3 separate agents previously: ~250MB RAM combined
A ~170MB difference. Sounds small, but on 1GB VPS instances or edge nodes like a Raspberry Pi, that’s a number you actually feel — a few of my smaller servers used to swap because the agents were eating RAM. Switching to Alloy solved that entirely.
5. Running with Docker
docker run -d \
  --name alloy \
  --network host \
  -v /etc/alloy:/etc/alloy \
  -v /var/log:/var/log:ro \
  -v /run/log/journal:/run/log/journal:ro \
  -v /etc/machine-id:/etc/machine-id:ro \
  grafana/alloy:latest \
  run /etc/alloy/config.alloy
Grafana Alloy isn’t a silver bullet. If your stack only needs to scrape metrics and never touches Loki or traces, plain Prometheus is still simpler — fewer moving parts, larger community. But when you need all three telemetry types, Alloy cuts a significant chunk of operational overhead without sacrificing flexibility.

