Mastering Performance Co-Pilot (PCP): The ‘Black Box’ for Monitoring CentOS Stream 9 – ITFROMZERO

Why do you need PCP instead of traditional tools?

Back when my company’s traffic exploded from 10,000 to 50,000 CCU, classic tools like top or htop started showing their limits. They only show you the ‘state’ of the system at that exact moment. Once, a database server kept freezing at exactly 2:00 AM, but by the time I woke up at 8:00 AM to check, everything was smooth again. At that point, without historical data to “dissect,” DevOps engineers are stuck forever fighting fires instead of preventing them.

Performance Co-Pilot (PCP) is the solution. On CentOS Stream 9, it acts like a flight data recorder (black box). PCP collects everything from CPU and RAM usage to deeper kernel metrics such as context switches and interrupt counts. It saves it all so you can “travel back in time” to investigate errors. Here is how I implemented it from scratch.

Quick start: Enable monitoring in 5 minutes

The great thing about PCP is that it doesn’t require complex configuration. Three commands are enough to get your system logging 24/7.

# Install PCP and system analysis tools
sudo dnf install pcp pcp-system-tools -y

# Enable the collector service (pmcd) and the logger (pmlogger)
sudo systemctl enable --now pmcd pmlogger

# Verify operational status (pcp with no arguments prints an installation summary)
pcp

When the output begins with “Performance Co-Pilot configuration on” followed by your hostname, and lists pmcd and pmlogger, congratulations. The system is silently logging to /var/log/pcp/pmlogger/. By default, PCP automatically compresses and rotates logs daily to save disk space.
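The check above can be complemented with a small script. This is only a sketch: the function names pcp_archive_dir and check_pcp_setup are hypothetical, and it assumes the default archive location mentioned above plus a systemd host.

```shell
#!/bin/sh
# Hypothetical helper: build the archive directory path for a host,
# assuming the default location /var/log/pcp/pmlogger/<hostname>.
pcp_archive_dir() {
    # $1 = hostname (defaults to this machine's hostname)
    printf '/var/log/pcp/pmlogger/%s\n' "${1:-$(hostname)}"
}

# Hypothetical helper: report whether both services are active and
# whether any archive files have been written yet.
check_pcp_setup() {
    for svc in pmcd pmlogger; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
        fi
    done
    dir=$(pcp_archive_dir)
    if [ -d "$dir" ]; then
        echo "archives: $(ls "$dir" | wc -l) files in $dir"
    else
        echo "archives: $dir does not exist yet"
    fi
}
```

Calling check_pcp_setup a minute or two after enabling the services should show both as active and a non-zero file count.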

Breaking down the PCP ecosystem

Don’t think of PCP as a single piece of software. It is a powerful framework consisting of 3 main components:

PMDA (Agents): “Undercover agents” located in various corners like Nginx, MySQL, or the Kernel to collect metrics.
PMCD (Collector): The coordinating brain that gathers data from PMDAs and responds to user queries.
Client Tools: Commands like pmstat and pminfo that help you read and understand the data.

Exploring the Metric Treasury

PCP provides over 5,000 different metrics. To view metrics related to the hard drive, use pminfo:

pminfo disk.dev.read

If you want to track real-time fluctuations with higher detail than vmstat, try pmstat with a 2-second interval (set with -t):

pmstat -t 2
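With thousands of metrics available, it helps to see how they are distributed before drilling down. The sketch below groups metric names by their top-level namespace; summarize_namespaces is a hypothetical helper, and it assumes one metric name per line on stdin, which is what pminfo prints when run without arguments.

```shell
#!/bin/sh
# Hypothetical helper: count metric names per top-level namespace
# (kernel, disk, mem, ...). Reads one dotted metric name per line.
summarize_namespaces() {
    awk -F. '
        NF > 0 { count[$1]++ }                       # first dotted component
        END    { for (ns in count) print count[ns], ns }
    ' | sort -k1,1nr -k2,2                           # largest namespaces first
}
```

A typical invocation would be `pminfo | summarize_namespaces | head`. From there, `pminfo -dt disk.dev.read` shows the descriptor and one-line help for a single metric, and adding `-f` fetches its current values.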

Tracing past incidents: The core value

Suppose the server crashed at 2:00 PM yesterday. Instead of guessing, I use pmrep to extract data from that exact fateful moment.

1. Find the corresponding log file

Logs are stored at /var/log/pcp/pmlogger/HOSTNAME/ in the format YYYYMMDD.HH.MM. Choose the file covering the time period you suspect.
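Picking the right archive by eye gets tedious once a directory holds weeks of files. The following sketch lists the archive base names for a single day; archives_for_day is a hypothetical helper, and it assumes the YYYYMMDD.HH.MM naming described above, with suffixes such as .0, .index, or .meta on the individual files.

```shell
#!/bin/sh
# Hypothetical helper: list archive base names (YYYYMMDD.HH.MM) for one day.
# $1 = archive directory, $2 = date as YYYYMMDD
archives_for_day() {
    for f in "$1/$2".[0-9][0-9].[0-9][0-9].*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=${f##*/}                      # strip the directory part
        # keep date.HH.MM, drop the per-file suffix (.0, .index, .meta, ...)
        echo "$base" | cut -d. -f1-3
    done | sort -u
}
```

For example, `archives_for_day /var/log/pcp/pmlogger/node-01 20260510` prints one base name per archive, which can be passed to `pmrep -a`. Running `pmdumplog -L` on a candidate shows its label, including the time span it covers.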

2. Reconstruct the scene

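# Inspect CPU idle and I/O wait from 14:00 to 14:15 on May 10, 2026
# -S/-T set the start and end of the time window; "@" marks an absolute time
pmrep -a /var/log/pcp/pmlogger/node-01/20260510.13.50 \
      -S "@14:00" -T "@14:15" kernel.all.cpu.idle kernel.all.cpu.wait.total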

If kernel.all.cpu.idle drops toward zero while kernel.all.cpu.wait.total spikes, the CPUs are mostly stalled waiting on I/O. That points strongly to a disk I/O bottleneck rather than slow code.

3. Exporting a report for the boss

Need to pull data into Excel for charting? Just add the parameter -o csv:

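# pmrep needs at least one metric; the time window keeps the file small
pmrep -a /var/log/pcp/pmlogger/node-01/20260510.13.50 \
      -S "@14:00" -T "@14:15" -o csv \
      kernel.all.cpu.idle kernel.all.cpu.wait.total > incident_report.csv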
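Once the data is in CSV form, the "idle low, wait high" rule from the previous step can be applied automatically. This is a rough sketch: flag_io_wait is a hypothetical helper, the thresholds are placeholders you must choose to match the units pmrep reported, and it assumes rows of the form time,idle,wait (for instance an export of kernel.all.cpu.idle and kernel.all.cpu.wait.total, in that order).

```shell
#!/bin/sh
# Hypothetical helper: print timestamps where idle <= $1 and wait >= $2.
# Reads CSV on stdin, assumed layout: time,idle,wait with one header line.
flag_io_wait() {
    awk -F, -v idle_max="$1" -v wait_min="$2" '
        NR == 1 { next }                                   # skip the header
        $2 !~ /^[0-9.]+$/ || $3 !~ /^[0-9.]+$/ { next }    # skip missing samples
        ($2 + 0) <= idle_max && ($3 + 0) >= wait_min { print $1 }
    '
}
```

Usage would look like `flag_io_wait 0.10 0.50 < incident_report.csv`, which lists only the sample times worth a closer look.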

Real-world experience on CentOS Stream 9

After years of operation, I have 3 important tips for optimizing PCP:

Controlling log size

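PCP logs in great detail. On a server exposing thousands of metrics, the archive directory can grow by several GB per week. The pmlogger invocation for the local host is defined in /etc/pcp/pmlogger/control.d/local, which is where options such as the sampling interval (pmlogger -t) can be adjusted. Retention is handled by the daily pmlogger_daily job, whose -k option sets how many days of archives to keep. For critical servers, I usually sample every 10 seconds but only retain logs for the last 7 days.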
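Before changing retention, it is worth checking what is actually on disk. The sketch below only lists files and deletes nothing; old_archives is a hypothetical helper, and the directory layout is the default one assumed throughout this article.

```shell
#!/bin/sh
# Hypothetical helper: list files last modified more than N days ago.
# $1 = archive directory, $2 = age in days. Nothing is deleted.
old_archives() {
    find "$1" -type f -mtime +"$2" | sort
}
```

For example, `du -sh /var/log/pcp/pmlogger/node-01` shows the total size, and `old_archives /var/log/pcp/pmlogger/node-01 7` shows which files fall outside a 7-day window. If that list is long, the retention setting is probably not what you expect.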

Visualizing data on a Grafana dashboard

Typing commands all day can be tiring. pmproxy exposes PCP metrics over a REST API, which Grafana can query through the Performance Co-Pilot plugin (grafana-pcp). This gives you highly intuitive real-time charts.

sudo systemctl enable --now pmproxy
sudo firewall-cmd --add-port=44322/tcp --permanent
sudo firewall-cmd --reload
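Before pointing Grafana at the server, a quick check that something is really listening on the port saves some head-scratching. This is a minimal sketch: has_listener is a hypothetical helper that parses the output of `ss -ltn`, assuming the local address and port appear in the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin (output of `ss -ltn`) shows a
# listening socket on port $1. Assumes "Local Address:Port" is column 4.
has_listener() {
    awk -v port="$1" '
        NR > 1 && $4 ~ (":" port "$") { found = 1 }   # skip header, match exact port
        END { exit !found }
    '
}
```

A typical use is `ss -ltn | has_listener 44322 && echo "pmproxy is listening"`. If the check fails, look at `systemctl status pmproxy` before touching the firewall rules.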

Beware of SELinux

CentOS Stream 9 ships with SELinux in enforcing mode. If pcp reports an error or a PMDA refuses to start, there’s a good chance that SELinux is blocking your custom PMDAs. Check for recent denials with this command:

sudo ausearch -m avc -ts recent
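On a busy host the audit log contains denials from many services, so narrowing the output to PCP helps. The filter below is a sketch under one assumption: that PCP daemons run in SELinux domains named pcp_*_t (for example pcp_pmcd_t), as in the stock policy. The name pcp_denials is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only AVC records whose source context is a
# PCP domain. Assumes domains follow the pcp_<name>_t naming pattern.
pcp_denials() {
    grep -E 'scontext=[^ ]*:pcp_[a-z_]+_t:'
}
```

Usage: `sudo ausearch -m avc -ts recent | pcp_denials`. An empty result suggests the problem lies elsewhere, such as file permissions or a PMDA that was never registered.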

Summary

Mastering PCP helps you escape the “guessing game” whenever the system slows down. Instead of asking “Who did what to make the server lag?”, you just pull up the logs and let the numbers speak for themselves. This tool might seem dry at first, but once you’re comfortable with the CLI, you’ll find it far more powerful than any web dashboard.