Getting Started with Telegraf in 5 Minutes (Quick Start)
If you need a lightweight tool that works out of the box to collect CPU, RAM, and disk metrics from your server, Telegraf is a top choice. Instead of wrestling with homegrown curl or Python scripts, a few tweaks to a single .conf file are all you need.
Let’s jump into a quick installation on Ubuntu/Debian to push data to InfluxDB. I’ll assume you already have InfluxDB set up or are using the Cloud version:
# Add official InfluxData repository
wget -q https://repos.influxdata.com/influxdata-archive_key.gpg
echo "393e8779c8945d3195561a4411ac3c21c177026f151cb854766e13b29527e5e0 influxdata-archive_key.gpg" | sha256sum -c && cat influxdata-archive_key.gpg | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive.gpg > /dev/null
echo "deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive.gpg] https://repos.influxdata.com/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/influxdata.list
# Install Telegraf
sudo apt-get update && sudo apt-get install telegraf
# Enable the service
sudo systemctl enable --now telegraf
Once installed, the configuration file lives at /etc/telegraf/telegraf.conf. Open it, find the [[outputs.influxdb_v2]] section, and fill in your URL, token, and bucket, then restart the service with sudo systemctl restart telegraf. Within about 30 seconds, data starts flowing into your InfluxDB.
What is Telegraf and Why is it Widely Used?
Simply put, Telegraf is an open-source agent written in Go, part of the famous TICK stack ecosystem. Its biggest advantage is its extreme resource efficiency (using only about 10-50MB of RAM). With over 300 built-in plugins, it can read everything from MySQL and Redis metrics to MQTT messages from IoT devices.
Telegraf’s processing flow operates through four stages:
- Inputs: Collect metrics (CPU, Docker, Nginx…).
- Processors: Modify, tag, or filter data.
- Aggregators: Group data, such as calculating a one-minute average.
- Outputs: Data destinations (InfluxDB, Prometheus, Kafka…).
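To make the four stages concrete, here is a minimal sketch of a config that uses all of them. The plugin names (`cpu`, `rename`, `basicstats`, `file`) are real Telegraf plugins, but this particular combination is only for illustration, not a recommended production setup:

```toml
# Inputs: collect CPU metrics each interval
[[inputs.cpu]]
  totalcpu = true

# Processors: rename a field for readability
[[processors.rename]]
  [[processors.rename.replace]]
    field = "usage_idle"
    dest = "idle_percent"

# Aggregators: emit min/max/mean summaries once per minute
[[aggregators.basicstats]]
  period = "1m"
  stats = ["min", "max", "mean"]

# Outputs: print to stdout so you can watch the pipeline work
[[outputs.file]]
  files = ["stdout"]
```

Every metric flows through the stages in that order: inputs produce it, processors transform it, aggregators summarize it, and outputs ship it.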
I use Telegraf + InfluxDB + Grafana to monitor 15 Linux servers. Thanks to this stack, I once detected a worker node running out of RAM before users even noticed the website was slowing down.
Configuring Telegraf for Real-World Data Collection
Don’t let the thousands of commented lines in the default configuration file discourage you. My advice is to clear it out or comment everything, keeping only what you truly need. Here is a sample configuration for monitoring system resources and Docker.
1. Monitoring System Resources
Insert the following code into the /etc/telegraf/telegraf.conf file:
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  report_active = false

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "squashfs"]

[[inputs.net]]
  interfaces = ["eth0", "enp*"]
2. Monitoring Docker Containers
Want to know which container is hogging the most resources? Use the docker plugin. Important note: the telegraf user must have access to the Docker socket.
# Grant the telegraf user access to the Docker socket
sudo usermod -aG docker telegraf
# Restart so the new group membership takes effect
sudo systemctl restart telegraf
# Configuration in telegraf.conf
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  container_names = []
  timeout = "5s"
  perdevice = false
  total = true
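On a busy host you may not want metrics from every container. The docker input supports name and label filters; the glob patterns and label name below are placeholders, so adjust them to your own containers:

```toml
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  # Only collect from containers whose names match these globs
  # (example patterns -- replace with your own container names)
  container_name_include = ["nginx*", "app-*"]
  # Keep only the labels you actually query on, to reduce tag cardinality
  docker_label_include = ["com.docker.compose.service"]
```

Filtering at the input stage is cheaper than discarding metrics later, and it keeps your InfluxDB tag cardinality under control.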
Pushing Data to InfluxDB
This is the final step for data storage. For InfluxDB 2.x, the configuration looks like this:
[[outputs.influxdb_v2]]
  urls = ["http://192.168.1.100:8086"]
  token = "YOUR_SECURE_TOKEN_HERE"
  organization = "my-org"
  bucket = "server-metrics"
A quick tip: Use environment variables to store your Token instead of hardcoding it. This helps secure your information if you accidentally push code to GitHub.
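Telegraf expands `${VAR}` references in its config at startup, and the Debian/Ubuntu package reads environment variables from /etc/default/telegraf. A sketch of the pattern (the token value is a placeholder):

```toml
# In /etc/default/telegraf (not committed to your repo):
#   INFLUX_TOKEN=YOUR_SECURE_TOKEN_HERE

# In /etc/telegraf/telegraf.conf:
[[outputs.influxdb_v2]]
  urls = ["http://192.168.1.100:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "my-org"
  bucket = "server-metrics"
```

Now the config file itself contains no secrets, so it is safe to version-control.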
“Hard-earned” Operational Lessons
After years of working with monitoring systems, I’ve gathered three important tips to avoid data loss:
Always Test Before Applying
Whenever you edit the conf file, don’t rush to restart the service. Run a test command instead:
telegraf --config /etc/telegraf/telegraf.conf --test
The --test flag runs each input once, prints the collected metrics in InfluxDB line protocol, and exits. If you see your metrics and no errors, congratulations: your configuration is correct.
Adjusting the Collection Interval
By default, Telegraf collects data every 10 seconds. For less critical servers, I usually increase this to 30s or 60s. This reduces CPU load and significantly saves storage space in InfluxDB.
Handling Network Outages
If InfluxDB goes down or the network is unstable, Telegraf uses a buffer to hold data. You should increase metric_buffer_limit in the [agent] section to about 10,000.
I once lost two hours of data because the default buffer was too low during a fiber cut. Don’t let yourself end up in that same situation.
In short, Telegraf is an extremely flexible and powerful tool. Combined with Grafana for visualization, you’ll have full control over your infrastructure’s health. Good luck with your setup!

