Log Management as Systems Scale: A Problem Nobody Wants but Everyone Faces
When I first set up 2–3 servers, I’d SSH directly into each machine and run tail -f /var/log/syslog to debug. That worked fine — until the server count grew to 8, then 15. When production breaks at 2 AM and you’re SSH-ing into each machine hunting for error messages, that’s when you truly need centralized logging.
Graylog is what I brought into production after I’d been burned enough times. After 6 months of real-world use, I’m writing down what’s actually useful — not a copy of the official documentation.
What Is Graylog and Why Not Use ELK Stack?
Straight to the point: Graylog receives logs from all your servers, indexes everything into OpenSearch, and lets you search and create alerts from a single UI. No more opening 10 terminal tabs to grep across individual machines.
The architecture consists of 3 main components:
- Graylog Server: processes, parses, and routes logs
- MongoDB: stores metadata — configuration, users, streams, alerts
- OpenSearch (or Elasticsearch): the full-text search engine — where logs are actually stored and queried
What about ELK Stack? I’ve run both. ELK offers more customization power, but takes around 2–3 days to set up properly. Graylog? Half a day and you’re done. For a small team without dedicated DevOps, that’s a significant difference — especially for alert and stream management, where Graylog clearly wins.
Installing Graylog with Docker Compose
Docker Compose is the fastest way to get Graylog running. I use this approach in both dev and small production environments — it’s stable for workloads under 50GB of logs per day.
Create the docker-compose.yml file:
```yaml
version: '3.8'

services:
  mongodb:
    image: mongo:6.0
    volumes:
      - mongodb_data:/data/db
    restart: unless-stopped

  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment:
      - discovery.type=single-node
      - DISABLE_SECURITY_PLUGIN=true
      - "OPENSEARCH_JAVA_OPTS=-Xms1g -Xmx1g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - opensearch_data:/usr/share/opensearch/data
    restart: unless-stopped

  graylog:
    image: graylog/graylog:5.2
    environment:
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper1234567890abc
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_HTTP_EXTERNAL_URI=http://YOUR_SERVER_IP:9000/
      - GRAYLOG_ELASTICSEARCH_HOSTS=http://opensearch:9200
      - GRAYLOG_MONGODB_URI=mongodb://mongodb:27017/graylog
    depends_on:
      - mongodb
      - opensearch
    ports:
      - "9000:9000"        # Web UI
      - "12201:12201/udp"  # GELF UDP input
      - "514:514/udp"      # Syslog UDP input
    volumes:
      - graylog_data:/usr/share/graylog/data
    restart: unless-stopped

volumes:
  mongodb_data:
  opensearch_data:
  graylog_data:
```
Important: The GRAYLOG_ROOT_PASSWORD_SHA2 above is the hash of the string “admin”. Change it before deploying to production:
```bash
echo -n "your_strong_password" | sha256sum | cut -d' ' -f1
```
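If you'd rather generate both secrets in one go, here's a small Python sketch that does the same thing (the 16-character minimum for GRAYLOG_PASSWORD_SECRET is a Graylog requirement; the password string is obviously a placeholder):

```python
import hashlib
import secrets

# Random pepper for GRAYLOG_PASSWORD_SECRET (Graylog requires >= 16 chars;
# token_hex(32) gives 64 hex characters).
password_secret = secrets.token_hex(32)

# SHA-256 hex digest of your admin password for GRAYLOG_ROOT_PASSWORD_SHA2.
root_password_sha2 = hashlib.sha256(b"your_strong_password").hexdigest()

print(f"GRAYLOG_PASSWORD_SECRET={password_secret}")
print(f"GRAYLOG_ROOT_PASSWORD_SHA2={root_password_sha2}")
```

Paste the two printed lines straight into the environment section of the graylog service.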
Start the stack:
```bash
docker compose up -d

# Wait approximately 60–90 seconds for Graylog to fully start
docker compose logs -f graylog
```
The Web UI runs at http://YOUR_SERVER_IP:9000. Log in with admin and the password you just created.
Creating Inputs to Receive Logs
Open System → Inputs in the Web UI. Create these 2 inputs first:
1. Syslog UDP (port 514)
Select Syslog UDP, set the bind address to 0.0.0.0, port 514. This input receives logs directly from rsyslog — fast, simple, no extra agent required.
2. GELF UDP (port 12201)
Select GELF UDP, port 12201. GELF (Graylog Extended Log Format) is a structured JSON format — it supports custom fields and is ideal for application logs from Docker containers or custom-built apps.
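To see what GELF actually looks like on the wire, here's a hand-rolled sketch that sends one uncompressed GELF message over UDP. The field values and the 127.0.0.1 address are placeholders; real applications should use a proper GELF library (e.g. graypy for Python) rather than raw sockets:

```python
import json
import socket

# A minimal GELF 1.1 payload. "version", "host", and "short_message" are
# required; custom fields must be prefixed with an underscore.
message = {
    "version": "1.1",
    "host": "webserver-01",               # placeholder hostname
    "short_message": "User login failed",
    "level": 4,                           # syslog severity: warning
    "_app": "auth-service",               # custom field, note the underscore
}

payload = json.dumps(message).encode("utf-8")

# Replace 127.0.0.1 with your Graylog server's IP.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload, ("127.0.0.1", 12201))
sock.close()
```

The custom `_app` field shows up in Graylog as a searchable field named `app`, which is exactly what makes GELF nicer than plain syslog for application logs.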
Configuring rsyslog on Client Servers
On each Linux server that needs to send logs to Graylog, configure rsyslog:
```bash
sudo nano /etc/rsyslog.d/90-graylog.conf
```
Add a single forwarding rule to the file (a single `@` means UDP; `@@` would mean TCP):

```
# Forward all logs via UDP to Graylog
*.* @GRAYLOG_SERVER_IP:514;RSYSLOG_SyslogProtocol23Format
```
Restart rsyslog and test immediately:
```bash
sudo systemctl restart rsyslog

# Send a test message
logger -t test "Hello from $(hostname)"
```
Open the Graylog Web UI → Search, look for source:your_hostname — if you see the “Hello from…” message, you’re done.
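Incidentally, if your application is written in Python, it can also log to the same Syslog UDP input directly, without going through rsyslog at all. A minimal sketch using the standard library (127.0.0.1 is a placeholder for your Graylog server's IP):

```python
import logging
import logging.handlers

# SysLogHandler defaults to UDP, which matches the Syslog UDP input.
handler = logging.handlers.SysLogHandler(address=("127.0.0.1", 514))
handler.setFormatter(logging.Formatter("myapp: %(message)s"))

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Hello from Python via syslog")
```

Since it's plain UDP, the call never blocks on Graylog being reachable; the flip side is that messages are silently dropped if the server is down.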
Sending Docker Container Logs via GELF
For Docker, add the log driver to the docker-compose.yml of the service you want to monitor:
```yaml
services:
  my_app:
    image: my_app:latest
    logging:
      driver: gelf
      options:
        gelf-address: "udp://GRAYLOG_SERVER_IP:12201"
        tag: "my_app"
```
Creating Streams and Alerts
Streams are my favorite Graylog feature. They route logs by rule into separate “channels” — a dedicated stream for Nginx, one for SSH auth, one for application errors. Incredibly useful when you need to debug a specific service without drowning in the overall log feed.
Create a stream at Streams → Create Stream. Some useful rules:
- Field `source` contains `webserver-01` — filter logs from a specific server
- Field `message` matches regex `ERROR|CRITICAL|FATAL` — catch all error-level events
- Field `facility` equals `auth` — only capture auth logs (SSH, sudo)
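Graylog evaluates stream rules with Java regex, but alternation works the same everywhere, so you can sanity-check a pattern locally before saving the rule. A quick Python check (sample log lines are made up):

```python
import re

# Same alternation as the stream rule: matches anywhere in the message field.
error_pattern = re.compile(r"ERROR|CRITICAL|FATAL")

assert error_pattern.search("2024-05-01 12:00:03 ERROR db timeout")
assert error_pattern.search("kernel: CRITICAL temperature threshold")
assert error_pattern.search("INFO request served in 12ms") is None
```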
Once you have a stream, add an Alert Condition. For example: “send a notification if there are more than 10 ERROR messages within 5 minutes”. Pair it with a Notification to deliver alerts to Slack, email, or a webhook — your choice.
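Graylog evaluates that condition server-side, but the logic is easy to picture: a sliding count of errors over the window. A hypothetical sketch, just to make the semantics concrete (names and structure are mine, not Graylog's):

```python
from collections import deque

# Alert condition: "more than 10 ERROR messages within 5 minutes".
WINDOW_SECONDS = 300
THRESHOLD = 10

_error_times: deque = deque()

def record_error(timestamp: float) -> bool:
    """Record one ERROR event; return True if the alert should fire."""
    _error_times.append(timestamp)
    # Drop events that have fallen out of the 5-minute window.
    while _error_times and timestamp - _error_times[0] > WINDOW_SECONDS:
        _error_times.popleft()
    return len(_error_times) > THRESHOLD
```

Note the strict "more than": the 10th error inside the window does not fire, the 11th does. Keeping that semantic in mind helps when you later tune thresholds.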
I’ll be honest: alert fatigue was a real problem I ran into immediately. Set the threshold too low — alerts fire constantly, and you start ignoring everything. I had to tune it multiple times: raise thresholds, add AND conditions, whitelist certain sources. The lesson learned: start with high thresholds, observe for 1–2 weeks to understand your system’s baseline, then gradually lower them. Your production environment has its own traffic patterns — don’t copy numbers from some blog post.
Takeaways After 6 Months in Production
Graylog genuinely solves the “where are the logs?” problem. Here’s what I noted after six months:
- Incident debugging time dropped from ~30 minutes to ~5 minutes because I can search across all servers in one UI
- Detected 2 SSH brute-force attempts thanks to alerts on failed auth logs
- Retention policy (auto-deleting logs older than 30 days) keeps disk usage stable — no more sudden disk-full surprises
One thing to know upfront: Graylog is fairly RAM-hungry. My single-node cluster needs at least 8GB to run smoothly — OpenSearch alone consumes 2–4GB. For smaller servers, consider Grafana Loki: much lighter, slightly weaker search capabilities, but sufficient for most use cases.
But if you have the budget and are managing 20+ servers — Graylog remains a solid choice. Set it up once, use it for the long haul. The next time something breaks at 2 AM, you’ll suffer a lot less.
