Building an Intelligent Alert System with LLM: Automatically Classifying and Summarizing Prometheus Alerts to Reduce Alert Fatigue – ITFROMZERO

Three in the morning, your phone buzzes. You open it to find 47 Telegram messages from Alertmanager. You skim through them: mostly InstanceDown from the same node doing a rolling restart. Not urgent, but your brain still has to process every one. The next morning, exhausted, you check back and find 2 genuinely critical alerts buried in all that noise.

This is alert fatigue. I lived with it for two years before deciding I needed to do something serious about it.

Comparing 3 Approaches to Smarter Alert Handling

Approach 1: Rule-based Filtering in Alertmanager

Almost everyone starts here — adding routes, silences, and inhibition rules in alertmanager.yml.

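# alertmanager.yml
route:
  receiver: 'slack-warnings'   # default receiver, required on the root route
  routes:
    # "matchers" needs Alertmanager 0.22+; older releases use "match"
    - matchers:
        - severity="warning"
      receiver: 'slack-warnings'
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-oncall'

inhibit_rules:
  # While NodeDown fires for an instance, mute other node alerts for that instance
  - source_matchers:
      - alertname="NodeDown"
    target_matchers:
      - job="node"
    equal: ['instance']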

Pros: Simple, no additional infrastructure needed, native Alertmanager support.

Cons: Rules must be written by hand and don’t scale as the system grows. Every new alert pattern means opening that YAML file again. More importantly — it filters but doesn’t explain. SysAdmins still have to read each alert to understand the context.
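
To make the inhibit rule in the config above concrete, here is a toy Python model of how one rule decides whether to mute an alert. It is a simplified sketch, not Alertmanager's actual implementation: the function name `is_inhibited`, the label values, and the handling of self-inhibition are assumptions for illustration.

```python
def is_inhibited(target: dict, firing: list[dict], source_match: dict,
                 target_match: dict, equal: list[str]) -> bool:
    """Toy model of a single inhibit rule; each alert is a plain dict of labels."""
    def matches(labels: dict, wanted: dict) -> bool:
        return all(labels.get(k) == v for k, v in wanted.items())

    # The rule only applies to alerts that match the target side
    if not matches(target, target_match):
        return False
    for source in firing:
        if source is target:
            continue  # simplification: an alert never inhibits itself
        same_labels = all(source.get(name) == target.get(name) for name in equal)
        if matches(source, source_match) and same_labels:
            return True
    return False


# Hypothetical alerts: NodeDown on db-1 should mute other node alerts on db-1 only
node_down = {"alertname": "NodeDown", "instance": "db-1:9100"}
high_load_db1 = {"alertname": "HighLoad", "job": "node", "instance": "db-1:9100"}
high_load_db2 = {"alertname": "HighLoad", "job": "node", "instance": "db-2:9100"}
firing = [node_down, high_load_db1, high_load_db2]
rule = dict(source_match={"alertname": "NodeDown"},
            target_match={"job": "node"}, equal=["instance"])

print(is_inhibited(high_load_db1, firing, **rule))  # True
print(is_inhibited(high_load_db2, firing, **rule))  # False
```

Every new suppression pattern needs another rule like this one, which is why the YAML keeps growing.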

Approach 2: ML-based Anomaly Detection

Tools like Grafana Machine Learning and Metis, or custom-built models using Prophet or LSTM, detect anomalies from learned patterns instead of relying on hard thresholds.

Pros: Smarter, learns patterns over time, reduces false positives.

Cons: Complex setup, requires training data, hard to debug when the model produces wrong results. High operational cost. For a team of 2-3 people, this is overkill.
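
For a sense of what "anomalies instead of hard thresholds" means, here is a deliberately tiny stand-in: a trailing-window z-score. It is far simpler than Prophet or an LSTM and is not what Grafana ML does internally. The function name, window size, and sample values are illustrative assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values: list[float], window: int = 12,
                     threshold: float = 3.0) -> list[int]:
    """Return indexes whose value is more than `threshold` sigmas away
    from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up CPU readings: a static "CPU > 90%" rule would miss the jump to 85
cpu = [40, 42, 41, 39, 43, 40, 41, 42, 40, 41, 39, 42, 41, 85, 40]
print(zscore_anomalies(cpu))  # [13]
```

Even this toy version shows the debugging problem: whether a point is flagged depends on the recent history, not on a number you can read off a config file.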

Approach 3: LLM-based Summarization (what I currently use)

Instead of filtering or learning patterns, use an LLM to read and understand alerts and answer the question that actually matters: “Does this require me to wake up right now? If so, why?”

Pros: No training data needed. Unlike rule-based approaches, an LLM can use context: it can recognize that DiskSpaceRunningLow on a database node is far more dangerous than on a log collector node. Prompts are easy to customize with each team’s domain knowledge.

Cons: Each API call adds latency (1-3 seconds), and the API costs money, though for a typical mid-sized system’s alert volume the total is just a few dollars per month.
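
Customizing the prompt with domain knowledge can be as simple as appending a few context sections to a base prompt. The sketch below is one way to do it. The helper `build_system_prompt`, the section titles, and the service names are hypothetical and should be replaced with your own.

```python
BASE_PROMPT = "You are a senior SRE analyzing Prometheus alerts for a production system."


def build_system_prompt(critical_services: list[str],
                        maintenance_windows: list[str],
                        known_noise: list[str]) -> str:
    """Append team-specific context sections to the base prompt, skipping empty ones."""
    sections = [BASE_PROMPT]
    context = [
        ("Critical services (failures here are likely CRITICAL)", critical_services),
        ("Regular maintenance windows (alerts inside them are likely expected)", maintenance_windows),
        ("Known noisy patterns (usually INFO)", known_noise),
    ]
    for title, items in context:
        if items:
            bullet_list = "\n".join(f"- {item}" for item in items)
            sections.append(f"{title}:\n{bullet_list}")
    return "\n\n".join(sections)


# Hypothetical team context
prompt = build_system_prompt(
    critical_services=["postgres-primary", "payment-api"],
    maintenance_windows=["Daily backup 02:00-03:00 UTC on db nodes"],
    known_noise=[],
)
print(prompt)
```

Keeping the context in plain lists means a teammate can update it without touching the prompt wording.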

Why I Chose LLM

I tried all three. The rule-based approach ran for 6 months until the alertmanager.yml was nearly 300 lines long and nobody dared touch it. For ML, I tried Grafana ML — results weren’t bad, but after one major deploy that changed alert patterns, the model took nearly 2 weeks to adapt.

The LLM wins at something the other two can’t do: it answers human questions, not machine questions. Instead of just telling you “CPU usage > 90%”, it can say: “Database server CPU has been at 95% for 20 minutes, coinciding with the daily backup job window; likely normal, but worth checking if the backup isn’t done in the next 2 hours.”

A rule-based setup can’t do this, not because data is missing but because rules don’t understand semantics.

System Architecture

Simple flow:

Prometheus → Alertmanager → Webhook Receiver (Python) → LLM API → Telegram/Slack

Alertmanager sends alerts to a small Python server via webhook. This server calls the LLM to classify and summarize them, then forwards the “translated” results to the team’s Telegram.
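
It helps to know what the Python server actually receives. The dict below follows the general shape of Alertmanager's webhook payload (a grouped batch with an `alerts` list), trimmed to the fields used in this article. All values are made up for illustration, and `describe_batch` is a hypothetical helper.

```python
# Trimmed example of an Alertmanager webhook payload; values are illustrative
sample_payload = {
    "version": "4",
    "status": "firing",
    "receiver": "llm-summarizer",
    "groupLabels": {"alertname": "InstanceDown"},
    "commonLabels": {"alertname": "InstanceDown", "severity": "warning"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "InstanceDown", "instance": "node-1:9100", "severity": "warning"},
            "annotations": {"summary": "node-1 is unreachable"},
            "startsAt": "2024-01-01T03:00:00Z",
        },
        {
            "status": "resolved",
            "labels": {"alertname": "InstanceDown", "instance": "node-2:9100", "severity": "warning"},
            "annotations": {"summary": "node-2 is unreachable"},
            "startsAt": "2024-01-01T02:55:00Z",
            "endsAt": "2024-01-01T03:01:00Z",
        },
    ],
}


def describe_batch(payload: dict) -> dict:
    """Count firing and resolved alerts in one webhook delivery."""
    alerts = payload.get("alerts", [])
    firing = sum(1 for a in alerts if a.get("status") == "firing")
    return {
        "group": payload.get("groupLabels", {}),
        "total": len(alerts),
        "firing": firing,
        "resolved": len(alerts) - firing,
    }


print(describe_batch(sample_payload))
# {'group': {'alertname': 'InstanceDown'}, 'total': 2, 'firing': 1, 'resolved': 1}
```

One delivery can mix firing and resolved alerts, which is why the summarizer looks at each alert's own status rather than only the top-level one.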

Practical Deployment Guide

Step 1: Configure the Alertmanager Webhook Receiver

Add a webhook receiver to alertmanager.yml:

receivers:
  - name: 'llm-summarizer'
    webhook_configs:
      - url: 'http://localhost:8080/alert'
        send_resolved: true
        http_config:
          # Alertmanager 0.22+; older releases use "bearer_token" instead
          authorization:
            type: Bearer
            credentials: 'your-secret-token'

route:
  receiver: 'llm-summarizer'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 2m
  repeat_interval: 1h

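Important tip: group_wait: 10s is how long Alertmanager collects related alerts before sending the first notification for a new group, and group_interval: 2m is the minimum gap before it sends an update when more alerts join that group. Together they make the LLM receive batches of related alerts instead of individual ones. If the first batch arrives too fragmented, raise group_wait.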

Step 2: Python Webhook Server with LLM

Install dependencies:

pip install fastapi uvicorn anthropic httpx python-dotenv

File alert_summarizer.py:

import hmac
import json
import os
from typing import Optional

import anthropic
import httpx
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
# Async client: the LLM call must not block FastAPI's event loop
client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

TELEGRAM_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
TELEGRAM_CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]
WEBHOOK_TOKEN = os.environ["WEBHOOK_SECRET_TOKEN"]

SYSTEM_PROMPT = """You are a senior SRE analyzing Prometheus alerts for a production system.
Classify severity: CRITICAL (requires immediate action), WARNING (needs monitoring), INFO (informational).
Provide a concise summary, explain the real-world impact, and suggest the first action to take.
Return only a JSON object: {"severity": "CRITICAL|WARNING|INFO", "summary": "...", "impact": "...", "action": "..."}"""

def format_alerts_for_llm(alerts: list) -> str:
    """Render each alert as one compact line for the prompt."""
    lines = []
    for a in alerts:
        labels = a.get("labels", {})
        annotations = a.get("annotations", {})
        status = a.get("status", "firing")
        lines.append(
            f"- [{status.upper()}] {labels.get('alertname', 'Unknown')}"
            f" | instance={labels.get('instance', 'N/A')}"
            f" | severity={labels.get('severity', 'N/A')}"
            f" | {annotations.get('description', annotations.get('summary', ''))}"
        )
    return "\n".join(lines)

async def send_telegram(text: str):
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    payload = {"chat_id": TELEGRAM_CHAT_ID, "text": text}
    async with httpx.AsyncClient(timeout=10) as client_http:
        resp = await client_http.post(url, json={**payload, "parse_mode": "Markdown"})
        if resp.status_code == 400:
            # Telegram rejects unbalanced Markdown in LLM output; retry as plain text
            resp = await client_http.post(url, json=payload)
        resp.raise_for_status()

@app.post("/alert")
async def receive_alert(
    request: Request,
    authorization: Optional[str] = Header(None)
):
    # Constant-time comparison of the bearer token sent by Alertmanager
    expected = f"Bearer {WEBHOOK_TOKEN}".encode()
    if not hmac.compare_digest((authorization or "").encode(), expected):
        raise HTTPException(status_code=401)

    payload = await request.json()
    alerts = payload.get("alerts", [])
    if not alerts:
        return {"status": "no alerts"}

    alert_text = format_alerts_for_llm(alerts)

    response = await client.messages.create(
        model="claude-haiku-4-5-20251001",  # Use Haiku to reduce costs
        max_tokens=512,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": f"Analyze the following alerts:\n{alert_text}"}]
    )

    raw = response.content[0].text
    try:
        # Keep only the {...} span in case the model wraps the JSON in a code fence
        result = json.loads(raw[raw.find("{"):raw.rfind("}") + 1])
    except json.JSONDecodeError:
        # Never drop alerts over unparseable output: forward them unsummarized
        result = {"severity": "WARNING", "summary": alert_text,
                  "impact": "LLM output was not valid JSON", "action": "Review the raw alerts"}
    severity = result.get("severity", "INFO")
    emoji = {"CRITICAL": "🔴", "WARNING": "🟡", "INFO": "🔵"}.get(severity, "⚪")

    message = (
        f"{emoji} *{severity}* ({len(alerts)} alert(s))\n\n"
        f"*Summary:* {result.get('summary')}\n"
        f"*Impact:* {result.get('impact')}\n"
        f"*Action:* {result.get('action')}"
    )

    await send_telegram(message)
    return {"status": "ok", "severity": severity}
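
The handler above assumes the LLM call succeeds. In an alerting path it is probably worth guarding against API errors and timeouts so that an outage at the LLM provider never swallows a real alert. The following is a minimal sketch under that assumption. `fallback_result` and `summarize_with_fallback` are hypothetical helpers, and falling back to the Prometheus `severity` label is one possible policy, not the only one.

```python
import asyncio


def fallback_result(alerts: list[dict], reason: str) -> dict:
    """Build a summary from the Prometheus labels alone when the LLM is unavailable."""
    severities = {a.get("labels", {}).get("severity", "") for a in alerts}
    names = sorted({a.get("labels", {}).get("alertname", "Unknown") for a in alerts})
    return {
        # Trust the rule author's label: any "critical" alert keeps the batch CRITICAL
        "severity": "CRITICAL" if "critical" in severities else "WARNING",
        "summary": f"{len(alerts)} unsummarized alert(s): {', '.join(names)}",
        "impact": f"LLM summary unavailable ({reason})",
        "action": "Check Alertmanager directly",
    }


async def summarize_with_fallback(summarize, alerts: list[dict],
                                  timeout_s: float = 10.0) -> dict:
    """Run `summarize` (any async callable returning the result dict) with a
    timeout; on any failure, return the label-based fallback instead."""
    try:
        return await asyncio.wait_for(summarize(alerts), timeout=timeout_s)
    except Exception as exc:  # timeouts, API errors, malformed responses
        return fallback_result(alerts, type(exc).__name__)


# Demo with a stand-in for a failing LLM call
async def broken_llm(alerts: list[dict]) -> dict:
    raise RuntimeError("API unavailable")

alerts = [{"labels": {"alertname": "InstanceDown", "severity": "critical"}}]
result = asyncio.run(summarize_with_fallback(broken_llm, alerts))
print(result["severity"], "|", result["summary"])
```

In the webhook handler, the LLM call and JSON parsing would be wrapped in one async function and passed in as `summarize`.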

Run the server:

export ANTHROPIC_API_KEY="sk-ant-..."
export TELEGRAM_BOT_TOKEN="..."
export TELEGRAM_CHAT_ID="-100..."
export WEBHOOK_SECRET_TOKEN="your-secret"

uvicorn alert_summarizer:app --host 0.0.0.0 --port 8080

Step 3: Run as a systemd Service

[Unit]
Description=LLM Alert Summarizer
After=network.target

[Service]
Type=simple
User=prometheus
EnvironmentFile=/etc/alert-summarizer/env
ExecStart=/usr/local/bin/uvicorn alert_summarizer:app --host 0.0.0.0 --port 8080
Restart=always
RestartSec=5
WorkingDirectory=/opt/alert-summarizer

[Install]
WantedBy=multi-user.target

Optimization Tips from Real-World Use

Use Haiku instead of Sonnet for alert summarization: in my setup it responds faster (~0.8s vs ~2s) and costs about 5x less. With 500 alert events/day, total cost is under $3/month.
Cache processed alerts: Use Redis or an in-memory dict to store the hash of each alert batch for 30 minutes, avoiding duplicate LLM calls when Alertmanager resends.
Inject system context into the system prompt: List your most critical services, regular maintenance windows, and normal alert patterns for your team. The LLM will classify much more accurately.
Set a separate CRITICAL threshold: Only page on-call when severity is CRITICAL. Bundle WARNINGs and INFOs into a morning digest — don’t wake someone up at 3 AM for something that can wait until 9.
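
The caching and CRITICAL-only paging tips above can be sketched together in a few lines. This is an in-memory version under simple assumptions: a single server process, a batch identified by its alerts' labels and status, and two hypothetical channel names ("page" and "digest"). With several workers you would likely move the cache to Redis.

```python
import hashlib
import json
import time


class BatchCache:
    """Remember alert batches for `ttl_seconds` so resends skip the LLM call."""

    def __init__(self, ttl_seconds: float = 1800, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen: dict[str, float] = {}

    @staticmethod
    def batch_key(alerts: list[dict]) -> str:
        # Labels plus status identify an alert state; sorting makes the key order-independent
        parts = sorted(
            json.dumps(a.get("labels", {}), sort_keys=True) + ":" + a.get("status", "")
            for a in alerts
        )
        return hashlib.sha256("|".join(parts).encode()).hexdigest()

    def seen_recently(self, alerts: list[dict]) -> bool:
        """Return True if this exact batch was already processed within the TTL."""
        now = self.clock()
        # Drop expired entries so the dict does not grow forever
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        key = self.batch_key(alerts)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


def delivery_channel(severity: str) -> str:
    """Page on-call only for CRITICAL; everything else waits for the morning digest."""
    return "page" if severity == "CRITICAL" else "digest"


cache = BatchCache(ttl_seconds=1800)
batch = [{"status": "firing", "labels": {"alertname": "InstanceDown", "instance": "node-1:9100"}}]
print(cache.seen_recently(batch))  # False: first time, call the LLM
print(cache.seen_recently(batch))  # True: resend within 30 minutes, skip
print(delivery_channel("WARNING"))  # digest
```

The injectable `clock` parameter is only there to make expiry testable without sleeping.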

Results After 3 Months of Real-World Use

From 40-60 Telegram messages every night down to 3-5 meaningful ones. Over 90% noise reduction. The team started reading alerts again — instead of reflexively muting the bot.

API cost with Claude Haiku: around $2-3/month for a mid-sized system (~500 alert events/day). Much cheaper than unnecessary sleepless nights.
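
To estimate the bill for your own volume, the arithmetic is short. The function below is a generic sketch. Every number in the example call is a placeholder assumption (not a quoted price or a measurement from my system), so plug in your provider's current per-token rates and your own average token counts. Note that grouping means the number of LLM calls per day is usually lower than the number of alert events.

```python
def monthly_cost_usd(calls_per_day: float, input_tokens: float, output_tokens: float,
                     usd_per_mtok_in: float, usd_per_mtok_out: float,
                     days: int = 30) -> float:
    """Estimate monthly API cost from average tokens per call and per-million-token prices."""
    per_call = (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000
    return calls_per_day * days * per_call


# Placeholder inputs for illustration only
estimate = monthly_cost_usd(
    calls_per_day=100,      # grouped batches sent to the LLM, not raw alert events
    input_tokens=400,       # system prompt plus formatted alerts
    output_tokens=150,      # the JSON summary
    usd_per_mtok_in=1.0,    # placeholder price per million input tokens
    usd_per_mtok_out=5.0,   # placeholder price per million output tokens
)
print(f"${estimate:.2f}/month")
```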

The biggest change wasn’t the numbers. When there’s a genuinely serious alert, it stands out clearly — no longer buried in noise. And I can sleep.