System admins and developers are no strangers to the sight of a server suddenly spiking to 100% CPU. You open top or htop and see which process is consuming the resources. But the core question remains: which function in the code is burning those cycles? At this point, top cannot help.
This is where perf shines. Maintained in the Linux kernel source tree and built on the kernel's perf_events subsystem, perf lets you peer into the heart of an application with very little disruption to the system. In this article, I will guide you through “diagnosing” performance issues, from terminal commands to intuitive Flame Graphs.
Installing perf in Seconds
To get started, you need the linux-tools package corresponding to your running kernel version. Installation is quite simple on popular distributions.
# On Ubuntu/Debian
sudo apt update
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)
# On RHEL/CentOS/AlmaLinux
sudo yum install perf
After installation, try running a system recording command for 5 seconds to verify:
sudo perf record -a -g sleep 5
In this command, -a samples all CPUs and -g records the call-graph. Results are saved in the perf.data file. To view the text results immediately, simply type:
sudo perf report
An interactive text interface will appear. Press Enter on a line to expand it and see which functions account for what percentage of the CPU samples.
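For scripts, or for pasting results into a ticket, a non-interactive report can be handier than the interactive view. The following is a minimal sketch, assuming the recording sits in perf.data in the current directory. The helper name top_symbols and its defaults are made up for illustration; --stdio, --no-children, --sort and -g none are standard perf report options.

```shell
# top_symbols [FILE] [N]: print the N hottest symbols from a perf recording
# as plain text. The helper name and its defaults are illustrative only.
top_symbols() {
  data=${1:-perf.data}
  limit=${2:-20}
  # -e rather than -r: a file recorded under sudo is usually readable only by root
  [ -e "$data" ] || { echo "top_symbols: $data not found" >&2; return 1; }
  # --stdio        plain-text output instead of the interactive UI
  # --no-children  rank by self time (samples where the function itself was running)
  # -g none        hide call chains so each symbol stays on a single line
  sudo perf report -i "$data" --stdio --no-children --sort symbol -g none 2>/dev/null |
    grep -v -e '^#' -e '^$' |
    head -n "$limit"
}
```

Running top_symbols perf.data 10 after the recording above should list roughly the ten most expensive symbols, one per line.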
Sampling Mechanism: The Secret Behind perf’s Speed
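Unlike strace, which intercepts every system call and can slow a syscall-heavy application noticeably, perf works based on a sampling mechanism. At a fixed, very short interval it takes a “snapshot” of each CPU, recording which function is executing and, with -g, the call stack that led there. Because the cost depends on the sampling rate rather than on how much work the application does, overhead usually stays very low at modest frequencies.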
I once handled a tricky case on a Java server. The app was bottlenecked, but top only gave generic reports. Using perf, I discovered an old JSON library performing too many redundant operations. After upgrading the library and re-optimizing, the CPU load dropped from 85% to 20%, saving the company significant infrastructure costs.
Visualizing Data with Flame Graphs
Reading text on the terminal can be exhausting if the application has thousands of calling functions. Flame Graphs transform dry numbers into intuitive visuals. Looking at the chart, you can immediately see where the “fire” is.
Step 1: Collect Data for a Specific Process
Suppose you want to inspect a process with PID 1234 for 30 seconds:
sudo perf record -F 99 -p 1234 -g -- sleep 30
Note: -F 99 means sampling 99 times per second. An odd value such as 99 avoids sampling in lockstep with system timers and other periodic activity that typically run at round frequencies like 100Hz, which would otherwise bias the profile.
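To get a feel for how much data a given frequency produces, the arithmetic can be sketched in a few lines of shell. The function names below are hypothetical helpers, not part of perf. The sample count is an upper bound: perf only records a sample while the target is actually running on a CPU, so an idle process yields far fewer.

```shell
# Upper bound on samples: frequency (Hz) x duration (s) x busy CPUs/threads.
# Helper names are illustrative; they are not perf commands.
expected_samples() {
  echo $(( $1 * $2 * ${3:-1} ))
}

# Convert a sample count back into approximate on-CPU milliseconds.
samples_to_ms() {
  echo $(( $1 * 1000 / $2 ))
}

expected_samples 99 30    # one fully busy thread for 30 s -> 2970
samples_to_ms 297 99      # 297 samples at 99 Hz -> 3000 (ms)
```

A function that shows up in 297 of those samples was therefore on the CPU for roughly 3 seconds of the 30-second window.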
Step 2: Prepare the Charting Tools
We use the script suite by Brendan Gregg, the performance engineer who created flame graphs (formerly at Netflix):
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
Step 3: Export the SVG File
Use the following sequence of commands to convert raw data into an interactive chart:
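# Parse data to text format. perf.data was written to the directory where
# perf record ran; if that is the parent of FlameGraph, point perf script at it
sudo perf script -i ../perf.data > out.perf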
# Fold stack traces
./stackcollapse-perf.pl out.perf > out.folded
# Generate Flame Graph
./flamegraph.pl out.folded > performance_map.svg
Now, open the performance_map.svg file with Chrome or Firefox to view the results.
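The record, fold, and render steps can also be chained into one helper so that no intermediate files are left behind. This is a sketch under stated assumptions: make_flamegraph and the FLAMEGRAPH_DIR variable are hypothetical names, and the FlameGraph repository is assumed to be cloned at ./FlameGraph unless that variable says otherwise.

```shell
# make_flamegraph PID SECONDS [OUT.svg]
# Hypothetical wrapper around the three steps; FLAMEGRAPH_DIR is an assumed
# variable pointing at the cloned FlameGraph repository.
make_flamegraph() {
  pid=${1:-}
  secs=${2:-}
  out=${3:-flame.svg}
  fg=${FLAMEGRAPH_DIR:-./FlameGraph}
  data="perf-$pid.data"

  if [ -z "$pid" ] || [ -z "$secs" ]; then
    echo "usage: make_flamegraph PID SECONDS [OUT.svg]" >&2
    return 2
  fi

  # Sample the process at 99 Hz with call graphs, writing to a per-PID file
  sudo perf record -F 99 -p "$pid" -g -o "$data" -- sleep "$secs" || return 1

  # Stream perf script -> fold stacks -> render SVG
  sudo perf script -i "$data" |
    "$fg/stackcollapse-perf.pl" |
    "$fg/flamegraph.pl" > "$out"
}
```

For example, make_flamegraph 1234 30 app.svg would profile PID 1234 for 30 seconds and write app.svg into the current directory.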
Reading a Flame Graph Like a Pro
Don’t let the colors fool you; the rules for reading this chart are very consistent:
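- X-Axis (Horizontal): The width of a bar is proportional to the number of samples in which that function appeared on the stack, so it reflects the function's share of CPU time including everything it calls. The left-to-right order is alphabetical, not chronological.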
- Y-Axis (Vertical): Shows the Call Stack (parent function calling child function). The higher you go, the deeper the stack.
- Plateaus: Look for wide bars with nothing stacked on top of them. These are the functions that were actually executing on the CPU when the samples were taken.
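The same plateaus can be found numerically from the folded file, where each line is a semicolon-separated stack followed by its sample count. The sketch below credits every count to the last frame of its stack and prints each function's share. top_leaves is a hypothetical helper, and the stacks fed to it here are made-up data for illustration.

```shell
# top_leaves: read folded stacks on stdin, print each leaf frame's share of samples.
top_leaves() {
  awk '
    {
      count = $NF                      # last field is the sample count
      stack = $0
      sub(/ [0-9]+$/, "", stack)       # strip the count, keep the stack
      n = split(stack, frames, ";")
      self[frames[n]] += count         # credit the leaf (on-CPU) frame
      total += count
    }
    END {
      for (f in self) printf "%.1f%% %s\n", 100 * self[f] / total, f
    }
  ' | sort -rn
}

# Made-up folded stacks for illustration (function names are hypothetical)
printf '%s\n' \
  'main;parse_request;json_decode 60' \
  'main;parse_request 10' \
  'main;send_response;write 30' |
  top_leaves
```

With this sample input the output is 60.0% json_decode, 30.0% write, 10.0% parse_request, one per line. On real data, top_leaves < out.folded gives the same ranking that the widest top-edge bars show visually.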
Important Notes to Avoid Common Pitfalls
Using perf is highly effective, but keep these 3 points in mind to ensure data accuracy:
- Debug Symbols: perf needs symbol information to translate addresses into function names. For C or C++, compile with the -g flag and do not strip the binary. Go binaries include symbols by default unless they are built with -ldflags="-s -w". Without symbols, perf will only display meaningless hex addresses instead of function names.
- Sampling Frequency: Don’t be too ambitious and set -F too high on an overloaded server. 99Hz or 997Hz is usually a good balance between detail and safety.
- Frame Pointer: Many modern compilers omit the frame pointer when optimizing. If the Flame Graph looks fragmented and illogical, try recompiling with the -fno-omit-frame-pointer flag.
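For a C program, the symbol and frame-pointer advice can be folded into a single build command. This is a sketch assuming a gcc- or clang-compatible compiler is available as cc (or named in the CC variable); build_for_profiling is a hypothetical helper name.

```shell
# build_for_profiling SOURCE.c [OUTPUT]
# Illustrative helper: compile with optimizations while keeping what perf needs.
#   -g                       emit debug information
#   -fno-omit-frame-pointer  keep the frame pointer so perf record -g can walk stacks
build_for_profiling() {
  src=${1:-}
  out=${2:-app}
  if [ ! -f "$src" ]; then
    echo "usage: build_for_profiling SOURCE.c [OUTPUT]" >&2
    return 2
  fi
  "${CC:-cc}" -O2 -g -fno-omit-frame-pointer -o "$out" "$src"
}
```

If recompiling is not an option, perf record --call-graph dwarf unwinds stacks from debug information instead of frame pointers, at the cost of a noticeably larger perf.data file.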
In summary, perf and Flame Graphs are a powerful “weapon” duo that every Linux engineer should have in their toolkit. Instead of guessing, use data to optimize your code scientifically. Good luck finding those lines of code silently eating up your resources!
