Checkmk 101: Unified IT Monitoring on a Single Dashboard

Monitoring tutorial - IT technology blog

The Pain of Operating Fragmented Infrastructure

Managing 30-40 Linux and Windows servers alongside a mix of Cisco and Mikrotik switches and firewalls is a nightmare for any SysAdmin. I’ve been there, with 15 browser tabs open at once: Zabbix for servers, PRTG for switch traffic, and several custom Python scripts for service checks. When an incident hits, finding the root cause is like looking for a needle in a haystack.

Worse yet is the lack of alert correlation. When a core switch fails, the system bombards you with hundreds of “Server Down” emails instead of pinpointing the network device as the culprit. That was when I realized the value of a Unified Monitoring solution like Checkmk.

Checkmk – Why Choose It Over Zabbix or Prometheus?

Checkmk doesn’t try to be overly complex. While Prometheus forces you to wrestle with PromQL and Zabbix complicates things with massive template sets, Checkmk takes a more pragmatic approach: Rule-based configuration.

The Auto-discovery feature is what I love most. You just install a tiny agent (under 100KB), and Checkmk automatically scans the host and suggests: “Hey, I see 5 disks, 2 network cards, and an Apache service running. Want to monitor them?” Everything is set up in just a few clicks.

Installing Checkmk on Ubuntu 22.04/24.04

We will use the Checkmk Raw Edition (CRE). It’s completely free, open-source, and more than capable of handling SMB infrastructure.

Step 1: Preparing the Server Environment

You’ll need a VPS or physical server. To ensure smooth performance, allocate at least 2 vCPUs and 4GB of RAM. Don’t skimp on RAM: the monitoring core holds the current state of every host and service in memory, and each site keeps its temporary files on a RAM-backed tmpfs for faster access.

sudo apt update && sudo apt upgrade -y
sudo apt install wget apt-transport-https gnupg -y

Step 2: Installing the Package

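Go to the Checkmk homepage to get the latest download link. Here, I’m using version 2.3.0p1 for Ubuntu 22.04 (Jammy). On Ubuntu 24.04, pick the package built for that release instead; the codename in the filename has to match your system.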

wget https://download.checkmk.com/checkmk/2.3.0p1/check-mk-raw-2.3.0p1_0.jammy_amd64.deb
sudo apt install ./check-mk-raw-2.3.0p1_0.jammy_amd64.deb
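If you install on more than one Ubuntu release, a small helper can assemble the download link for you. This is only a sketch: the function name checkmk_raw_url is made up for this post, and the URL layout is inferred from the single link shown above, so confirm on the download page that a package actually exists for your version and codename.

```shell
# Build the download URL for a Checkmk Raw package.
# ASSUMPTION: the URL layout is inferred from the one link used in this post.
# $1 = Checkmk version (e.g. 2.3.0p1)
# $2 = Ubuntu codename (e.g. jammy for 22.04, noble for 24.04)
checkmk_raw_url() {
    printf 'https://download.checkmk.com/checkmk/%s/check-mk-raw-%s_0.%s_amd64.deb\n' \
        "$1" "$1" "$2"
}

# Possible usage (reads the codename of the running system):
#   . /etc/os-release
#   wget "$(checkmk_raw_url 2.3.0p1 "$VERSION_CODENAME")"
```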

Once installed, the omd (Open Monitoring Distribution) command will be available. This is the primary tool for managing your Checkmk instances.

Step 3: Creating a Site (Instance)

Checkmk allows you to run multiple independent sites on the same physical server. This is extremely useful for separating Production and Testing environments.

sudo omd create monitoring
sudo omd start monitoring

The system will generate an admin user (cmkadmin) with a random password, which is shown in the output of omd create. Save this information and log in at: http://<SERVER_IP>/monitoring.
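To separate Production and Testing as described above, you would repeat the same two omd commands once per site. The sketch below only prints the commands (a dry run) so you can review them first. The helper names are hypothetical, and the naming rule it enforces (starts with a letter; only letters, digits and underscores; at most 16 characters) is my understanding of omd's restrictions, so treat it as an assumption.

```shell
# Returns success if $1 looks like an acceptable site name.
# ASSUMPTION: starts with a letter, only letters/digits/underscores, max 16 chars.
valid_site_name() {
    case $1 in
        ''|[!A-Za-z]*|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    [ "${#1}" -le 16 ]
}

# Print (not run) the omd commands for every site name given as an argument.
print_site_commands() {
    for site in "$@"; do
        if valid_site_name "$site"; then
            printf 'sudo omd create %s\nsudo omd start %s\n' "$site" "$site"
        else
            printf 'skipping invalid site name: %s\n' "$site" >&2
        fi
    done
}

# Possible usage:
#   print_site_commands prod test
```

Each site then gets its own URL path, for example http://<SERVER_IP>/prod and http://<SERVER_IP>/test.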

Monitoring Your First Linux Server

Forget about complex SNMP setup. On Linux, Checkmk uses a minimalist agent that transmits data via TCP port 6556.

Installing the Agent on the Host

Access the Dashboard, navigate to Setup > Agents > Linux, and download the .deb file to your target server.

# Commands to run on the server being monitored
# (the filename follows your Checkmk version; adjust it to match your download)
sudo apt install ./check-mk-agent_2.3.0p1-1_all.deb
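Before adding the host in the Dashboard, it can help to see what the agent will report. The agent prints plain text in which each block of data starts with a header such as <<<df>>>. The sketch below lists those section names; list_agent_sections is a name I made up, and it assumes the agent is available as check_mk_agent on the monitored host.

```shell
# Read agent output on stdin and print the unique section names.
# Headers look like <<<df>>> or <<<lnx_if:sep(58)>>>; options after ':' are dropped.
list_agent_sections() {
    sed -n 's/^<<<\([^<>:]\{1,\}\).*>>>$/\1/p' | sort -u
}

# Possible usage on the monitored server:
#   sudo check_mk_agent | list_agent_sections
```

If the list is empty, the agent did not produce any output and service discovery will have nothing to suggest.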

Adding the Host to the Dashboard

The steps are very intuitive:

  1. Go to Setup > Hosts > Add host.
  2. Enter the Hostname and IP.
  3. Click Save & run service discovery.

At this point, Checkmk will list everything from CPU and RAM to running services. Simply click Accept all. Don’t forget to click the yellow Changes button in the top corner to activate the new configuration; until you do, the new host is not actually being monitored.

Monitoring Network Devices: When SNMP Speaks

For switches or routers, where you usually can’t install an agent, SNMP is the way to go. I usually use SNMP v2c for speed or v3 if high security is required (v2c sends the Community String unencrypted). After entering the IP and Community String, Checkmk will automatically generate traffic graphs for every port (from Port 1 to 24/48).

It’s even smart enough to report device temperatures or fan speeds if the hardware supports the corresponding OIDs.
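If discovery finds nothing on a switch, a quick manual SNMP query from the Checkmk server shows whether the device answers at all. The sketch below assumes the snmpwalk tool is installed (on Ubuntu it comes with the snmp package) and uses the standard ifDescr OID. The helper name list_if_names is hypothetical, and <COMMUNITY> and <DEVICE_IP> are placeholders.

```shell
# Read snmpwalk output on stdin and print the value of every STRING line.
# Handles both quoted and unquoted values, e.g.
#   IF-MIB::ifDescr.1 = STRING: GigabitEthernet0/1
#   iso.3.6.1.2.1.2.2.1.2.2 = STRING: "Vlan1"
list_if_names() {
    sed -n 's/^.* = STRING: "\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
}

# Possible usage (1.3.6.1.2.1.2.2.1.2 is ifDescr, the interface name column):
#   snmpwalk -v2c -c <COMMUNITY> <DEVICE_IP> 1.3.6.1.2.1.2.2.1.2 | list_if_names
```

A timeout here usually points at a wrong Community String or a firewall rule, not at Checkmk.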

Hard-Won Experience: Don’t Let Alert Fatigue Burn You Out

My biggest mistake when I first started with Checkmk was enabling Telegram notifications for everything. The result? My phone vibrating incessantly on a Sunday night just because a backup server hit 100% CPU for exactly 5 minutes.

To survive, remember these 3 rules:

  • Critical Alerts Only: Push phone notifications only for truly severe issues. Keep Warning levels quiet on the Dashboard.
  • Configure Delays: Don’t alert immediately upon detecting an error. Set Checkmk to wait for 3-5 checks (each 1 minute apart); if the error persists, then send a message.
  • Set Realistic Thresholds: By default, 80% disk usage is a Warning. But for a 10TB drive, the remaining 20% is still about 2TB of free space. Create custom rules for high-capacity drives.
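The logic behind these rules is simple enough to sketch in a few lines of shell. This is an illustration only, not how Checkmk implements it; in Checkmk you set these things through rules in Setup. The function names are hypothetical, and the state codes follow the usual convention (0 = OK, 1 = WARN, 2 = CRIT).

```shell
# Rules 1 and 2: notify only if the last N check results are all CRIT (2).
# Usage: should_notify N state1 state2 ...   (oldest state first)
should_notify() {
    required=$1
    shift
    # Not enough history yet: stay quiet.
    [ "$#" -ge "$required" ] || return 1
    # Drop older results so only the last N remain.
    while [ "$#" -gt "$required" ]; do shift; done
    for state in "$@"; do
        [ "$state" -eq 2 ] || return 1
    done
    return 0
}

# Rule 3: how much space is still free when usage hits a percentage?
# $1 = total size in GB, $2 = usage in percent; prints free GB.
free_gb_at_usage() {
    echo $(( $1 * (100 - $2) / 100 ))
}

# Possible usage:
#   should_notify 3 0 2 2 2 && echo "send Telegram message"
#   free_gb_at_usage 10000 80    # 10TB drive at the 80% Warning level
```

With decimal units, a 10TB (10000GB) drive at 80% still has 2000GB free, while a 500GB drive at the same level has only 100GB left, which is why one percentage rarely fits every disk.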

Final Thoughts

Checkmk is the perfect intersection of power and simplicity. While the interface may feel a bit “classic” with its many sub-menus, once you understand the logic, system administration becomes much easier. If you want to stop manually checking logs every morning, give Checkmk a try today.
