Guide to configuring SMS and Telegram alerts with Alertmanager

Monitoring tutorial - IT technology blog

Introduction to the Problem

Your IT systems might run smoothly every day, but what happens when an incident strikes? A server abruptly stops, a critical service fails, or system resources hit dangerous levels. Without a reliable alert system, you might only discover issues when users start complaining. By then, the consequences could already be very serious.

Monitoring is the first and essential step to maintaining a stable system. However, monitoring alone is not enough. You need an effective alert system that provides instant notifications, anytime, anywhere. This article guides you on how to configure Alertmanager – a core component of Prometheus – to send alerts via SMS and Telegram. These are two fast and reliable notification channels, highly suitable for administrators.

When I first started setting up monitoring, I also experienced ‘alert fatigue’ – being overwhelmed by countless unimportant alerts. This almost made me miss truly critical incidents. I spent a lot of time adjusting thresholds and configuring Alertmanager. The goal was to create a balanced alert system that only notifies when truly necessary. Therefore, in addition to the basic configuration guide, I will also share experiences to help you avoid ‘alert fatigue’ from the start.

Core Concepts

To set up effective alerts, it's essential to first understand the main components involved in this process.

Prometheus: The Heart of the Monitoring System

Although this article focuses on Alertmanager, we cannot overlook Prometheus. It is an open-source monitoring and alerting system, specializing in collecting metrics from configured targets. When a metric exceeds a threshold, Prometheus generates an alert. However, it does not send them directly but forwards them to Alertmanager.

Alertmanager: The Brain of Alert Processing

Alertmanager is an independent component. It receives alerts from Prometheus (or other sources), then processes them based on pre-defined rules. The main functions of Alertmanager include:

  • Grouping: Groups multiple similar alerts into a single notification. For example, if 10 servers simultaneously report disk space errors, Alertmanager will send one common notification instead of 10 individual messages.
  • Inhibition: Suppresses dependent alerts when a primary alert is triggered. For instance, if the main server reports a loss of connection, Alertmanager will automatically prevent alerts about services running on that server.
  • Silencing: Allows temporarily disabling alerts for a specific period (e.g., during system maintenance).
  • Routing: Sends alerts to different receivers, based on alert labels. Receivers can include email, Slack, PagerDuty, webhooks. This article will focus on Telegram and SMS.

Why SMS and Telegram?

In a modern system environment, receiving timely alerts is a crucial factor.

  • Telegram: As a popular messaging app, Telegram provides a flexible bot API, making integration for sending notifications easy. Key advantages: completely free, can send more information than email, and notifications are almost instant. This is a very convenient tool for technical teams.
  • SMS: Although seemingly ‘classic’, SMS remains an extremely reliable notification channel. Especially in emergency situations, when there is no internet connection or messaging applications encounter issues, SMS still works. It is available on every mobile phone and does not require a special application. SMS is a critical fallback channel for the most severe alerts.

Detailed Practice: Configuring Alerts with Alertmanager

To begin, I will assume you already have Prometheus and Alertmanager running. If not, you can refer to the Prometheus and Grafana installation guides on itfromzero.com, or use Docker for a quick setup.

1. Telegram Notification Configuration

Telegram is a popular alert channel due to its flexibility and being completely free.

Step 1: Create a Telegram Bot and get the Bot Token

  1. Open the Telegram app, search for @BotFather.
  2. Start a conversation with @BotFather and type /newbot.
  3. Follow the instructions to name your bot and choose a unique username, which must end in “bot” (e.g., itfromzero_alert_bot).
  4. Once completed, @BotFather will provide an HTTP API Token. Store this token carefully, for example: 123456789:ABCDEFGH-IJKLMN_OPQRSTUVXYZ.

Step 2: Get the Chat ID of a group or user

Alertmanager needs to know where to send messages. This can be an individual or a group.

  1. Search for your bot on Telegram and start a conversation with it, or add the bot to a group chat where you want to receive alerts.
  2. Send any message to the bot or in the group chat containing the bot.
  3. Open your browser and access the following URL (replace <BOT_TOKEN> with your token):
    https://api.telegram.org/bot<BOT_TOKEN>/getUpdates
  4. You will receive a JSON response. Look for the chat and id fields. This is the chat_id you need. If it’s a group, the ID will be a negative number (e.g., -123456789).
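
Digging through the raw getUpdates JSON by hand can be tedious. The sketch below, a helper I'm adding for illustration (the function name extract_chat_ids is my own), pulls every chat ID out of a getUpdates response; the optional live lookup assumes the requests library is installed and a BOT_TOKEN environment variable is set.

```python
import os

def extract_chat_ids(updates):
    """Collect unique (chat_id, name) pairs from a getUpdates JSON response."""
    seen = {}
    for upd in updates.get("result", []):
        msg = upd.get("message") or upd.get("channel_post") or {}
        chat = msg.get("chat")
        if chat:
            name = chat.get("title") or chat.get("username") or chat.get("first_name", "")
            seen[chat["id"]] = name
    return sorted(seen.items())

# Optional live lookup (requires `pip install requests`); set BOT_TOKEN first.
if os.environ.get("BOT_TOKEN"):
    import requests
    url = f"https://api.telegram.org/bot{os.environ['BOT_TOKEN']}/getUpdates"
    for chat_id, name in extract_chat_ids(requests.get(url, timeout=10).json()):
        print(f"{chat_id}\t{name}")  # group IDs are negative
```

Remember to send at least one message to the bot or group first; otherwise getUpdates returns an empty result list.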

Step 3: Edit Alertmanager Configuration (alertmanager.yaml)

Add or edit the receivers and routes sections in your alertmanager.yaml file.

# alertmanager.yaml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'default-receiver' # Default receiver configuration

  routes:
  - match:
      severity: 'critical' # Alerts with 'critical' severity
    receiver: 'telegram-critical'
  - match:
      severity: 'warning' # Alerts with 'warning' severity
    receiver: 'telegram-warning'
  # You can add other rules here

receivers:
  - name: 'default-receiver'
    # You can configure a default receiver or leave it empty if you don't want default alerts.
    # telegram_configs:
    #   - chat_id: <DEFAULT_CHAT_ID>
    #     parse_mode: 'HTML'

  - name: 'telegram-critical'
    telegram_configs:
      - bot_token: '<YOUR_BOT_TOKEN>'
        chat_id: <CRITICAL_GROUP_CHAT_ID> # Chat ID for the group/user receiving critical alerts
        parse_mode: 'HTML'
        send_resolved: true # Send a notification when the alert is resolved

  - name: 'telegram-warning'
    telegram_configs:
      - bot_token: '<YOUR_BOT_TOKEN>'
        chat_id: <WARNING_GROUP_CHAT_ID> # Chat ID for the group/user receiving warning alerts
        parse_mode: 'HTML'
        send_resolved: true

Note: I use two different chat_ids for critical and warning to illustrate routing capabilities. You can certainly use the same chat_id for all alerts if desired. Replace <YOUR_BOT_TOKEN>, <CRITICAL_GROUP_CHAT_ID>, and <WARNING_GROUP_CHAT_ID> with your actual values; note that chat_id must be a plain (unquoted) integer.

Step 4: Reload Alertmanager Configuration

After changing the alertmanager.yaml file, you need to reload the configuration for Alertmanager to apply the changes.

# If you are running Alertmanager as a systemd service
sudo systemctl reload alertmanager

# Or via the HTTP API (recommended)
curl -XPOST http://localhost:9093/-/reload

Step 5: Test Telegram Alerts

To test, you need to create a dummy alert in Prometheus or trigger a real alert situation.

Example Prometheus configuration to create test alerts:

Add to prometheus.yml (or your rule file):

# prometheus.yml
# ...
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093'] # Your Alertmanager address

rule_files:
  - "alert_rules.yml" # Ensure this file is read by Prometheus
# ...

alert_rules.yml file:

# alert_rules.yml
groups:
- name: general.rules
  rules:
  - alert: HighLoadTest
    expr: node_load1 > 0.01 # Change this threshold for easy triggering
    for: 1s
    labels:
      severity: 'critical' # To match the telegram-critical route
    annotations:
      summary: "Server {{ $labels.instance }} is experiencing high load (test)"
      description: "Average load over 1 minute is {{ $value }} on {{ $labels.instance }}."
  - alert: LowDiskSpaceTest
    expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 90 # Less than 90% free space (deliberately easy to trigger)
    for: 1s
    labels:
      severity: 'warning' # To match the telegram-warning route
    annotations:
      summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }} is almost full (test)"
      description: "Only {{ $value }}% free space remaining on {{ $labels.instance }}."

Reload Prometheus configuration (this requires Prometheus to be started with the --web.enable-lifecycle flag; otherwise, restart the service):

curl -XPOST http://localhost:9090/-/reload

If everything is configured correctly, you will receive Telegram messages as soon as these alert conditions are triggered.
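
If you don't want to wait for Prometheus to evaluate a rule, you can also fire a synthetic alert straight at Alertmanager's v2 API. The sketch below is illustrative: the label values are arbitrary, chosen only to match the telegram-critical route above, and the commented-out POST assumes the requests library and an Alertmanager on localhost:9093.

```python
import datetime
import json

def build_test_alert(alertname, severity, instance, summary):
    """Build one alert in the shape Alertmanager's /api/v2/alerts endpoint expects."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "labels": {"alertname": alertname, "severity": severity, "instance": instance},
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # An endsAt in the future keeps the alert "firing" for ~5 minutes
        "endsAt": (now + datetime.timedelta(minutes=5)).isoformat(),
    }

payload = [build_test_alert("ManualTest", "critical", "test-host:9100", "Manual test alert")]
print(json.dumps(payload, indent=2))

# To actually fire it (requires `pip install requests` and a running Alertmanager):
# import requests
# resp = requests.post("http://localhost:9093/api/v2/alerts", json=payload, timeout=5)
# print(resp.status_code)  # 200 means Alertmanager accepted the alert
```

Because the alert has severity: 'critical', it should follow the telegram-critical route configured earlier.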

2. SMS Notification Configuration (Via Webhook and custom script)

Alertmanager does not have direct SMS integration. Therefore, we will use a webhook combined with a custom script to call an SMS Gateway service. This method offers high flexibility and customization.

Step 1: Understand Webhook and SMS Gateway Mechanisms

  1. Webhook: Alertmanager will send an HTTP POST request to a URL you specify when an alert occurs. This request contains detailed alert information in JSON format.
  2. Custom Script/Service: You will run a small application (e.g., written in Python with Flask). This application will listen for POST requests from Alertmanager. Upon receiving a request, the script will parse the JSON, extract the necessary information, and call the API of an SMS provider (like Twilio, Nexmo, or a local SMS service).
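
Before writing the receiver, it helps to see what Alertmanager actually POSTs. The dictionary below is a trimmed-down example of the webhook payload (format version "4"); the values are made up for illustration, but the field names match the real format that the script in the next step will parse.

```python
# Trimmed example of Alertmanager's webhook payload (format version "4").
# Values are illustrative; field names follow the real format.
webhook_payload = {
    "version": "4",
    "groupKey": "{}:{alertname=\"HighLoadTest\"}",
    "status": "firing",  # "firing" or "resolved" for the whole group
    "receiver": "sms-critical",
    "groupLabels": {"alertname": "HighLoadTest"},
    "commonLabels": {"alertname": "HighLoadTest", "severity": "critical"},
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighLoadTest", "severity": "critical",
                       "instance": "web-01:9100"},
            "annotations": {"summary": "Server web-01:9100 is experiencing high load"},
            "startsAt": "2024-01-01T00:00:00Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }
    ],
}

# The fields a minimal SMS receiver cares about:
for alert in webhook_payload["alerts"]:
    line = f"{alert['labels']['severity'].upper()} {alert['labels']['alertname']}: {alert['annotations']['summary']}"
    print(line)
```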

Step 2: Prepare an SMS Gateway Service (Conceptual)

For simplicity, I will present a basic Python script. In reality, you will need to integrate it with an SMS provider’s API. This script only illustrates how to receive webhooks and process information.

Suppose you have an sms_gateway.py file like this:

# sms_gateway.py (This is an illustrative script; you need to develop it according to your SMS Provider)
from flask import Flask, request, jsonify
import os

app = Flask(__name__)

# Replace with your actual phone number for receiving SMS
TARGET_PHONE_NUMBER = os.environ.get("TARGET_PHONE_NUMBER", "+849xxxxxxxx") 

@app.route('/sms-alert', methods=['POST'])
def send_sms_alert():
    try:
        alert_data = request.get_json()
        
        # Process alert information from Alertmanager
        # You need to customize how you want the SMS content to be displayed
        for alert in alert_data.get('alerts', []):
            alertname = alert['labels'].get('alertname', 'Unknown Alert')
            severity = alert['labels'].get('severity', 'info')
            instance = alert['labels'].get('instance', 'Unknown Instance')
            summary = alert['annotations'].get('summary', 'No Summary')
            
            # Create SMS message content
            sms_message = f"[ITFZS] {severity.upper()} - {alertname} on {instance}: {summary}"
            
            print(f"Sending SMS to {TARGET_PHONE_NUMBER}: {sms_message}")
            
            # --- Here, you will call your SMS provider's API ---
            # Example with a hypothetical API:
            # import requests
            # sms_api_url = "https://api.sms_provider.com/send"
            # payload = {
            #     "to": TARGET_PHONE_NUMBER,
            #     "message": sms_message,
            #     "api_key": os.environ.get("SMS_API_KEY")
            # }
            # response = requests.post(sms_api_url, json=payload)
            # if response.status_code == 200:
            #     print("SMS sent successfully!")
            # else:
            #     print(f"Failed to send SMS: {response.status_code} - {response.text}")
            # ---
            
            # In this example, we just print to the console
            print("SMS alert processed (conceptual).")

        return jsonify({"status": "success", "message": "SMS alert processed conceptually."}), 200
    except Exception as e:
        print(f"Error processing SMS alert: {e}")
        return jsonify({"status": "error", "message": str(e)}), 500

if __name__ == '__main__':
    # Run this script on a port (e.g., 9099)
    # Ensure it is accessible from Alertmanager
    print("Starting SMS Gateway Mockup on port 9099...")
    app.run(host='0.0.0.0', port=9099)

To run this script, you need to install Flask (pip install Flask) and run it:


export TARGET_PHONE_NUMBER="+849xxxxxxxx" # Replace with your actual phone number
python sms_gateway.py

Ensure this script runs continuously and is accessible from Alertmanager (on the same server, or over the network if on a different server).
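
One practical detail the mock script ignores: a single SMS segment holds at most 160 GSM-7 characters (70 for Unicode), and longer messages are split into multiple billed segments. The helper below, a sketch of my own (the 160-character limit is the standard; the suffix is an arbitrary choice), trims alert text so each notification fits in one segment.

```python
def truncate_sms(message, limit=160, suffix="..."):
    """Trim a message so it fits in a single SMS segment."""
    if len(message) <= limit:
        return message
    return message[: limit - len(suffix)] + suffix

# Example: a verbose alert squeezed into one 160-character segment
long_msg = "[ITFZS] CRITICAL - HighLoad on web-01:9100: " + "x" * 200
print(len(truncate_sms(long_msg)))  # 160
```

You could call this on sms_message inside the webhook handler before handing the text to your SMS provider's API.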

Step 3: Edit Alertmanager Configuration (alertmanager.yaml) for SMS

Add a new receiver using webhook_configs to point to your custom SMS Gateway script.

# alertmanager.yaml (Add to the same Alertmanager configuration file above)
# ...
route:
  # ... (Existing route section)
  routes:
  - match:
      severity: 'critical'
    receiver: 'sms-critical' # Route critical alerts to SMS
  - match:
      severity: 'emergency' # Example: add an extremely urgent level
    receiver: 'sms-critical' # Emergency alerts also sent via SMS
  # ... (Other routes)

receivers:
  # ... (Existing Telegram receivers)
  - name: 'sms-critical'
    webhook_configs:
      - url: 'http://localhost:9099/sms-alert' # URL of the custom SMS Gateway script
        send_resolved: true
        # You can configure http_config if your script requires authentication
        # http_config:
        #   basic_auth:
        #     username: 'smsuser'
        #     password: 'smspassword'

Replace http://localhost:9099/sms-alert with the actual address where your sms_gateway.py script is running.

Step 4: Reload Alertmanager Configuration and Test

Reload the Alertmanager configuration as you did for Telegram.


curl -XPOST http://localhost:9093/-/reload

You can update alert_rules.yml in Prometheus. Create an alert with severity: 'critical' (or emergency) to test. Check if the sms_gateway.py script receives the webhook and prints the notification. If you see the line Sending SMS to ... in the sms_gateway.py console, it means Alertmanager successfully sent the webhook. The next step is to integrate this script with a real SMS provider’s API.

3. Optimizing Alerts and Avoiding “Alert Fatigue”

As shared, ‘alert fatigue’ is a big problem. Alertmanager provides many useful features to manage the alert flow.

  • Grouping:
    Use group_by, group_wait, group_interval, repeat_interval in the route. This helps you avoid being ‘spammed’ when many similar alerts appear simultaneously.

    route:
      group_by: ['alertname', 'instance', 'severity'] # Group by alert name, instance, and severity level
      group_wait: 30s # Wait 30 seconds to collect more alerts before sending
      group_interval: 5m # If new alerts appear in the group, wait another 5 minutes before re-sending
      repeat_interval: 4h # Repeat alerts every 4 hours if not yet resolved
      receiver: 'default-receiver'
    

    With this configuration, Alertmanager will group alerts related to the same issue on the same instance. As a result, the number of notifications you receive will be significantly reduced.

  • Inhibition:
    Suppress ‘secondary’ alerts when a ‘primary’ alert has triggered. For example, if an entire server has an issue, you only want to be notified about that server. You wouldn’t want to receive dozens of alerts about services running on it.

    # alertmanager.yaml
    inhibit_rules:
    - source_match:
        severity: 'critical' # Server down alert has 'critical' severity
      target_match:
        severity: 'warning' # Service alert has 'warning' severity
      equal: ['instance'] # Apply if they occur on the same instance
    # This rule states: if there is a 'critical' alert on a specific 'instance',
    # then suppress all other 'warning' alerts on that same 'instance'.
    
  • Silences:
    When you know about a maintenance event or a temporary issue in advance, create a silence via Alertmanager’s web interface (usually http://localhost:9093). This feature temporarily stops alerts for a specified period, which is very useful when you are actively troubleshooting or upgrading the system.

By intelligently utilizing these features, you can build an effective alert system. This system will focus only on core information, helping you avoid ‘alert fatigue’ and react faster to any incidents.

Conclusion

Setting up an effective monitoring and alerting system is vital for any stable IT system. In this article, you learned how to configure Alertmanager to send alerts to Telegram and SMS. These are two powerful and reliable notification channels.

Telegram integration helps your team receive fast, free, and comprehensive notifications. Meanwhile, SMS acts as the final alert layer. It ensures that even in the most extreme situations, you remain informed about incidents.

Start configuring today to improve your response capabilities and maintain the stability of your system. A properly configured alert system not only gives you peace of mind but also demonstrates a professional and reliable IT infrastructure.
