Python Profiling: Pro Tips for Troubleshooting Slow Code with cProfile and Py-Spy

Python tutorial - IT technology blog

Why is your code slow? Stop guessing, start measuring

This scenario is certainly familiar: You write a data processing script, test it with 10 records, and it runs smoothly. But when you throw it into production with 1 million records, the system suddenly freezes. Your natural reflex is to jump in and fix whatever code you “feel” is slow. I was the same when I first started. The result was usually fixing one thing and breaking another, while the speed remained stagnant.

Donald Knuth once said: “Premature optimization is the root of all evil.” Instead of guessing, we need profiling: an analysis technique that pinpoints exactly which function is taking the most time or which line of code is hogging the CPU.

Here are my two “go-to” tools: cProfile (built into Python) and Py-Spy (extremely powerful for running applications).

Setting Up Your Toolkit

With cProfile, you don’t need to install anything because it’s a standard library. However, to view results visually with charts, you should install snakeviz.

If you need to inspect running applications without interruption, Py-Spy is the top choice. It’s written in Rust, extremely lightweight, and causes almost zero overhead to the system.

# Install Py-Spy for process monitoring
pip install py-spy

# Install snakeviz for visual charts
pip install snakeviz

Note: On Linux or macOS, py-spy usually requires sudo privileges to inspect running processes.

Using cProfile to Inspect Every Function

cProfile is a deterministic profiler. It records every function call event with extreme detail. The only downside is that it can slow down your code by about 2-5 times because it has to log everything constantly.

Method 1: Run directly from the Command Line

Suppose you have a file named heavy_script.py. Instead of running it normally, use this command:

python -m cProfile -s cumulative heavy_script.py

The -s cumulative flag helps sort the results by the total execution time of the function and its sub-functions. Pay attention to these metrics:

  • ncalls: Number of times the function was called.
  • tottime: Total time spent in the given function (excluding time in sub-functions).
  • cumtime: Cumulative time spent in this and all sub-functions.
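If you prefer snakeviz's interactive chart over the terminal report, the CLI can also write the raw stats to a file with the -o flag. A quick sketch (the tiny generated script stands in for your real heavy_script.py):

```shell
# Create a small stand-in script to profile
printf 'print(sum(i*i for i in range(1000)))\n' > heavy_script.py

# Write raw profiling stats to a file instead of printing a report
python -m cProfile -o heavy_script.prof heavy_script.py

# Open an interactive chart in the browser (requires: pip install snakeviz)
# snakeviz heavy_script.prof
```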

Method 2: Embed into code to test specific logic blocks

If you only want to test a complex logic function, use the Profile object directly in your code:

import cProfile
import pstats

def process_data():
    # Suppose this is heavy processing with 10 million elements
    return sum(i**2 for i in range(10_000_000))

# Using Profile as a context manager requires Python 3.8+
with cProfile.Profile() as pr:
    process_data()

stats = pstats.Stats(pr)
stats.sort_stats(pstats.SortKey.TIME).print_stats(10)

Once, while debugging a text processing script, I was convinced a regex error was causing an infinite loop. When I inspected it with cProfile, I was shocked to find the issue was opening/closing files too many times inside a loop. By the way, if you need to quickly test regex, I often use the regex tester at toolcraft.app to save time.

Py-Spy: The Solution for Running Applications (Production)

The weakness of cProfile is that you must launch the program under it; you can’t attach it to an already-running process. But if your FastAPI API is running and CPU spikes to 100%, you can’t just stop it to debug. This is where Py-Spy shines.

Live monitoring like the ‘top’ command

You can directly inspect which function is hogging the CPU using the PID (Process ID):

# Find the PID of the python process
ps aux | grep python

# View live monitoring
py-spy top --pid 1234

The screen will display a list of functions consuming resources in real-time, similar to the operating system’s top command.

Generating a Flame Graph – Spot bottlenecks at a glance

A Flame Graph is the fastest way to find bottlenecks. The wider the block, the more execution time that function consumes.

py-spy record -o profile.svg --pid 1234

After running for about 30 seconds, press Ctrl+C. Open the profile.svg file in a browser, and you’ll see the full performance picture. I once used this graph to prove to my boss that the slowness was due to database queries missing indexes, not Python logic.

Post-Profiling: Best Practices for Optimization

Once you have the data, the next step is optimization. Here are 3 common mistakes I frequently encounter:

1. Using the wrong data structures

If you see a high ncalls and tottime in search functions, re-check your data structures. Switching from searching in a list (O(n)) to a set or dict (O(1) on average) can make your code hundreds of times faster.
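The gap is easy to see with timeit. A quick sketch (the sizes and repeat count are arbitrary; absolute numbers will vary by machine):

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case: the last element forces a full list scan

# repeat the membership test 200 times for each structure
t_list = timeit.timeit(lambda: needle in haystack_list, number=200)
t_set = timeit.timeit(lambda: needle in haystack_set, number=200)

print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")  # the set is far faster
```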

2. Using the wrong libraries

Processing large arrays with pure Python lists is very slow. Consider using NumPy or Pandas. These libraries run on a C/C++ backend, offering performance that far exceeds standard Python loops.
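A minimal comparison, assuming NumPy is installed (the sum-of-squares workload is just an illustration):

```python
import numpy as np

data = list(range(1_000_000))
arr = np.arange(1_000_000, dtype=np.int64)

# vectorized: the multiply and the sum both run in NumPy's C backend
total_numpy = int((arr * arr).sum())

# pure Python: one interpreted loop iteration per element
total_python = sum(i * i for i in data)

# identical results; the vectorized version is typically many times faster
```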

3. Confusing I/O Bound and CPU Bound

If cumtime is high but tottime is extremely low, your code is likely waiting on I/O (API calls, DB reads). In this case, optimizing CPU algorithms is useless. You should switch to asyncio or increase the number of workers to utilize the waiting time.
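As a toy sketch of the idea, here asyncio.sleep stands in for the I/O wait (an HTTP call or DB read); ten concurrent "requests" finish in roughly the time of one:

```python
import asyncio
import time

async def fetch(i):
    # stand-in for an I/O-bound call (HTTP request, DB read, ...)
    await asyncio.sleep(0.2)
    return i

async def main():
    # launch all ten "requests" concurrently instead of sequentially
    return await asyncio.gather(*(fetch(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(f"{len(results)} results in {elapsed:.2f}s")  # ~0.2s, not ~2s
```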

Profiling isn’t a one-time task. Make it a habit whenever you add a major feature. Spending just 15 minutes measuring can save you hours of useless coding.
