Hunting Down CPU-Hogging “Culprits” on Linux: Mastering perf and Flame Graphs

Linux tutorial - IT technology blog

System admins and developers are no strangers to the sight of a server suddenly spiking to 100% CPU. You open top or htop and see which process is consuming the resources. But the core question remains: which function in the code is actually burning the CPU? At this point, top cannot help, because it only reports usage per process or thread.

This is where perf shines. Built on the kernel's perf_events subsystem and maintained in the kernel source tree, perf lets you peer into the heart of a running application with very little disruption to the system. In this article, I will guide you through “diagnosing” performance issues, from terminal commands to intuitive Flame Graphs.

Installing perf in Seconds

To get started, you need the linux-tools package corresponding to your running kernel version. Installation is quite simple on popular distributions.

# On Ubuntu/Debian
sudo apt update
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# On RHEL/CentOS/AlmaLinux
sudo yum install perf
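
Before recording anything, a quick preflight check can save some confusion. The sketch below is only an illustration (the function name perf_preflight is my own): it checks that a working perf binary is on the PATH and, if the file is readable, prints the kernel's perf_event_paranoid setting, which controls what unprivileged users are allowed to profile.

```shell
# Preflight check before profiling (hypothetical helper, not part of perf).
# Returns 0 if a working perf binary is available, 1 otherwise.
perf_preflight() {
  # "perf --version" fails if perf is missing, or if only the Ubuntu wrapper
  # script is installed without tools matching the running kernel.
  if perf --version >/dev/null 2>&1; then
    echo "perf found: $(command -v perf)"
  else
    echo "perf not found: install linux-tools (Debian/Ubuntu) or perf (RHEL family)"
    return 1
  fi

  # Lower values allow more profiling without root.
  paranoid_file=/proc/sys/kernel/perf_event_paranoid
  if [ -r "$paranoid_file" ]; then
    echo "perf_event_paranoid=$(cat "$paranoid_file")"
  fi
}

# Usage:
#   perf_preflight
```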

After installation, try running a system recording command for 5 seconds to verify:

sudo perf record -a -g -- sleep 5

In this command, -a samples all CPUs and -g records the call-graph. Results are saved in the perf.data file. To view the text results immediately, simply type:

sudo perf report

An interactive text interface will appear, listing functions sorted by their share of the CPU samples. Use the arrow keys to move around and press Enter on a line to expand it and see the details.
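
If you would rather paste the result into a ticket or pipe it through grep, perf report can also print plain text. Here is a minimal sketch, assuming the recording is in the current directory; the helper name hot_symbols and its defaults are my own, while --stdio, --no-children, --sort and -i are standard perf report options.

```shell
# Print the hottest symbols from a recording as plain text (no interactive UI).
# $1: path to the recording (default: perf.data)
# $2: number of output lines to keep (default: 40)
hot_symbols() {
  data=${1:-perf.data}
  lines=${2:-40}
  if [ ! -f "$data" ]; then
    echo "no recording at $data; run perf record first" >&2
    return 1
  fi
  # --stdio        plain-text output instead of the interactive interface
  # --no-children  rank functions by their own samples, not their callees'
  sudo perf report -i "$data" --stdio --no-children --sort comm,dso,symbol \
    | head -n "$lines"
}

# Usage:
#   hot_symbols perf.data 60
```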

Sampling Mechanism: The Secret Behind perf’s Speed

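Unlike strace, which intercepts every system call and can slow a syscall-heavy application severely, perf relies on sampling. At a fixed rate (for example, 99 times per second), it interrupts each CPU and takes a “snapshot” of what is running: the current instruction and, with -g, the call stack. Because work is done only at each sample rather than on every event, overhead stays low, typically on the order of 1% at modest sampling rates.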

I once handled a tricky case on a Java server. The app was bottlenecked, but top only gave generic reports. Using perf, I discovered an old JSON library performing too many redundant operations. After upgrading the library and re-optimizing, the CPU load dropped from 85% to 20%, saving the company significant infrastructure costs.

Visualizing Data with Flame Graphs

Reading text on the terminal can be exhausting if the application has thousands of functions calling one another. Flame Graphs turn those dry numbers into an intuitive picture, so you can immediately see where the “fire” is.

Step 1: Collect Data for a Specific Process

Suppose you want to inspect a process with PID 1234 for 30 seconds:

sudo perf record -F 99 -p 1234 -g -- sleep 30

Note: -F 99 means sampling 99 times per second. An odd rate such as 99 avoids sampling in lockstep with periodic activity, such as system timers that typically fire at round frequencies like 100 Hz, which would otherwise bias the profile.
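
To get a feel for how much data a recording will produce, you can estimate the sample count up front. This is only back-of-the-envelope arithmetic under the assumption that the target keeps its CPUs fully busy; a mostly idle process yields far fewer samples. The helper name expected_samples is my own.

```shell
# Rough upper bound on the number of samples in a recording:
#   frequency (Hz) x duration (s) x number of busy CPUs (default 1)
expected_samples() {
  echo $(( $1 * $2 * ${3:-1} ))
}

# One fully busy thread sampled at 99 Hz for 30 seconds
expected_samples 99 30 1
```

If the real recording contains far fewer samples than this estimate, the process was probably sleeping or waiting on I/O most of the time rather than burning CPU.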

Step 2: Prepare the Charting Tools

We use the FlameGraph script suite by Brendan Gregg, a leading performance expert formerly of Netflix:

git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph

Step 3: Export the SVG File

Use the following sequence of commands to convert raw data into an interactive chart:

# Parse data to text format (perf.data was recorded in the parent directory)
sudo perf script -i ../perf.data > out.perf

# Fold stack traces
./stackcollapse-perf.pl out.perf > out.folded

# Generate Flame Graph
./flamegraph.pl out.folded > performance_map.svg

Now, open the performance_map.svg file with Chrome or Firefox to view the results.
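
Once you have done this a few times, the steps can be chained into a single pipeline with no intermediate files, since both Perl scripts read from standard input when no file is given. The wrapper below is a sketch under my own assumptions: the function name make_flamegraph is hypothetical, and it assumes the FlameGraph checkout sits in ./FlameGraph unless you pass another directory.

```shell
# Turn a perf recording into an SVG flame graph in one pipeline.
# $1: recording to read (default: perf.data)
# $2: SVG file to write (default: performance_map.svg)
# $3: directory of the FlameGraph checkout (assumed default: ./FlameGraph)
make_flamegraph() {
  data=${1:-perf.data}
  svg=${2:-performance_map.svg}
  fg_dir=${3:-./FlameGraph}

  if [ ! -f "$data" ]; then
    echo "missing recording: $data" >&2
    return 1
  fi
  if [ ! -x "$fg_dir/flamegraph.pl" ]; then
    echo "FlameGraph scripts not found in $fg_dir" >&2
    return 1
  fi

  # perf script -> folded stacks -> SVG
  sudo perf script -i "$data" \
    | "$fg_dir/stackcollapse-perf.pl" \
    | "$fg_dir/flamegraph.pl" > "$svg"
}

# Usage:
#   make_flamegraph perf.data performance_map.svg ./FlameGraph
```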

Reading a Flame Graph Like a Pro

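Don’t let the colors fool you: by default they are picked at random from a warm palette and carry no meaning. The rules for reading the chart are very consistent:

  • X-Axis (Horizontal): The width of a bar is proportional to the number of samples in which that function was on the stack, so the wider the bar, the more CPU time it and the functions it calls consumed. The axis does not show the passage of time; sibling bars are sorted alphabetically.
  • Y-Axis (Vertical): Shows the Call Stack (parent function calling child function). The higher you go, the deeper the stack.
  • Plateaus: Look for wide, flat bars at the very top. These are the functions that were themselves running on the CPU in the most samples.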
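
The “plateau” rule can also be checked in numbers. The folded file produced by stackcollapse-perf.pl holds one line per unique stack, in the form frame1;frame2;...;frameN followed by a sample count, so summing the counts by last frame gives each function's own CPU share. Here is a minimal sketch; the helper name leaf_samples and the sample stacks are made up for illustration.

```shell
# Sum sample counts per leaf (top-of-stack) frame in folded-stack input.
# Reads the files given as arguments, or standard input if there are none.
leaf_samples() {
  awk '
    NF == 0 { next }                  # skip empty lines
    {
      count = $NF                     # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)      # drop the trailing count
      n = split(stack, frames, ";")   # frames[n] is the function on CPU
      self[frames[n]] += count
    }
    END { for (f in self) printf "%d %s\n", self[f], f }
  ' "$@" | sort -k1,1nr -k2
}

# Hypothetical folded stacks, just to show the shape of the data
leaf_samples <<'EOF'
app;main;handle_request;parse_json 70
app;main;handle_request;render 20
app;main;gc 10
EOF
```

On real data, run leaf_samples out.folded | head and compare the top entries with the widest plateaus in the SVG; they should tell the same story.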

Important Notes to Avoid Common Pitfalls

Using perf is highly effective, but keep these 3 points in mind to ensure data accuracy:

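  1. Debug Symbols: perf needs symbol information to translate addresses into function names. For C/C++, compile with the -g flag and do not strip the binary; Go binaries carry symbols by default unless they are built with -ldflags "-s -w". Without symbols, perf will only display meaningless hex addresses instead of function names.
  2. Sampling Frequency: Don’t be too ambitious and set -F too high on an overloaded server, since every sample adds overhead. 99 Hz is a safe default, and 997 Hz gives more detail for short captures.
  3. Frame Pointer: Many compilers omit the frame pointer when optimizing, which breaks the default stack walking used by -g. If the Flame Graph looks fragmented and illogical, try recompiling with the -fno-omit-frame-pointer flag.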
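
To make the first and third points concrete, here is a sketch of a build line plus a small recording wrapper. The file name app.c and the function name record_stacks are hypothetical. The --call-graph option is a real perf record option: fp walks frame pointers, while dwarf unwinds using debug information, which helps when you cannot rebuild with frame pointers but produces a much larger perf.data.

```shell
# Build with debug info and frame pointers (app.c is a hypothetical source file):
#   gcc -O2 -g -fno-omit-frame-pointer -o app app.c

# Record call stacks with an explicit unwinding method.
# $1: fp (frame pointers) or dwarf (unwind from debug info)
# $2: target PID
# $3: duration in seconds (default: 30)
record_stacks() {
  mode=$1
  pid=$2
  secs=${3:-30}
  case "$mode" in
    fp|dwarf) ;;
    *) echo "usage: record_stacks fp|dwarf PID [SECONDS]" >&2; return 2 ;;
  esac
  if [ -z "$pid" ]; then
    echo "usage: record_stacks fp|dwarf PID [SECONDS]" >&2
    return 2
  fi
  sudo perf record -F 99 -p "$pid" --call-graph "$mode" -- sleep "$secs"
}

# Usage:
#   record_stacks fp 1234 30
#   record_stacks dwarf 1234 10
```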

In summary, perf and Flame Graphs are a powerful “weapon” duo that every Linux engineer should have in their toolkit. Instead of guessing, use data to optimize your code scientifically. Good luck finding those lines of code silently eating up your resources!
