When Do You Actually Need strace?
Some Friday afternoons, a service just dies without logging anything. Restart it and it comes back up, then dies again five minutes later. dmesg shows nothing, journalctl is dead silent — and if you haven’t already mastered effective Linux debugging with journalctl and dmesg, those tools alone won’t get you far. That’s when I reach for strace.
On an old CentOS 7 server at work, I found myself using strace quite a bit to track down issues that normal log levels couldn’t capture — from file permissions going wrong after a deploy, to socket creation failing against the ulimit file-descriptor cap, to a binary stubbornly reading a config file from an old path even after symlinking it to the new location.
strace is a tool that lets you “eavesdrop” on every system call a process makes — meaning every interaction between the application and the kernel: opening files, reading/writing, network connections, spawning child processes, and so on. No source code needed, no recompilation — just attach to a running PID and you’re good.
Unlike reading logs (which only shows what the developer chose to print), strace shows you everything the process is actually doing at the kernel level.
Installing strace
Most distros ship strace in their official repositories:
# Ubuntu / Debian
sudo apt install strace
# CentOS / RHEL / AlmaLinux
sudo yum install strace
# or dnf for newer versions
sudo dnf install strace
# Arch Linux
sudo pacman -S strace
# Check version
strace --version
No additional configuration needed. strace works via the kernel’s ptrace() syscall — as long as you have sufficient permissions to attach to the process (typically requires root or the same user).
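One caveat: on distros that enable the Yama security module (Ubuntu does by default), attaching to a process you own can still fail with EPERM because ptrace is restricted to child processes. You can check and, while debugging, relax that restriction:
# 0 = unrestricted, 1 = only descendants of the tracer (a common default)
cat /proc/sys/kernel/yama/ptrace_scope
# Temporarily allow attaching to any process you own (reverts on reboot)
sudo sysctl kernel.yama.ptrace_scope=0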
Practical Ways to Use strace
Run a Command Directly Through strace
The simplest approach — strace runs the command and prints every syscall:
strace ls /tmp
The output will be overwhelming. In practice, you’ll want to filter or write it to a file:
# Write to file for later analysis
strace -o /tmp/strace_ls.log ls /tmp
# Include timestamps (very useful)
strace -t -o /tmp/strace_ls.log ls /tmp
# Timestamps accurate to the microsecond
strace -tt -o /tmp/strace_ls.log ls /tmp
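One companion flag worth adding when writing traces to a file: strace truncates string arguments (paths, buffers) to 32 characters by default, which often cuts off exactly the path you care about. -s raises that limit; the 256 here is an arbitrary choice:
# Show up to 256 characters of each string argument instead of the default 32
strace -tt -s 256 -o /tmp/strace_ls.log ls /tmp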
Attach to a Running Process
This is the most common real-world case — a service is running but misbehaving, and you can’t restart it. If the misbehaving process has gone fully silent, it’s also worth checking whether it has turned into a zombie process before attaching strace:
# Find the PID first
pgrep -a nginx
# or
ps aux | grep myapp
# Attach to the process
sudo strace -p 12345
# Attach to all threads (important for multi-threaded apps)
sudo strace -p 12345 -f
The -f flag (follow forks) is critical for multi-threaded apps or anything that spawns child processes — without -f you’ll miss all the syscalls from child threads.
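When a process forks a lot, the interleaved output from -f gets hard to follow. Pairing it with -ff and -o splits the trace into one file per process, which is much easier to read:
# Writes /tmp/trace.<pid>, one output file per traced process
sudo strace -f -ff -o /tmp/trace -p 12345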
Filter for Only the Syscalls You Care About
Use -e trace= to cut down on noise — this is what I use most often:
# Only show file-related syscalls
strace -e trace=file ls /tmp
# Only show network calls
strace -p 12345 -e trace=network
# Only show open/openat, read, write
strace -e trace=open,openat,read,write -p 12345
# Only show errors (syscalls returning -1)
strace -e trace=all -e status=failed -p 12345
The -e status=failed option is incredibly powerful — it only shows syscalls that failed, ignoring everything that succeeded. Use this to hunt down permission errors or missing files in seconds.
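One caveat: -e status= was only added in strace 5.2, so the 4.x builds on older distros (like that CentOS 7 box) don’t have it. Newer versions also accept -Z as shorthand; on older ones, grepping for failed returns gets you most of the way there:
# Shorthand for -e status=failed on strace >= 5.2
sudo strace -Z -p 12345
# Rough equivalent on older strace builds
sudo strace -f -p 12345 2>&1 | grep '= -1'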
Measure Time per Syscall
# -T: show how long each syscall took
strace -T -p 12345 -e trace=file
# -c: aggregate statistics (very useful for bottleneck analysis)
strace -c ls /tmp
The output from -c looks like this:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.13    0.000423          42        10           mmap
 23.41    0.000190          19        10           read
 12.50    0.000102          12         8         1 openat
  5.10    0.000041           5         8           fstat
...
The errors column shows which syscalls are failing. The % time column shows which syscalls the process spends most of its time in; note that by default this counts system time inside each call, not wall-clock time.
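Because of that, a call that mostly sleeps (epoll_wait, select, a blocking read on a socket) can look deceptively cheap. Adding -w makes the summary count wall-clock latency instead, which is usually what you want when chasing slowness:
# Summarise wall-clock time spent in each syscall instead of system time
strace -c -w ls /tmp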
Debugging Real-World Problems
Case 1: Application Can’t Find Its Config File
A classic scenario: the app errors out but the log just vaguely says “config not found”.
# Filter file-related syscalls, only capture errors
strace -e trace=openat,open -e status=failed ./myapp 2>&1 | grep ENOENT
The output will point directly to the files the app is trying to read but can’t find:
openat(AT_FDCWD, "/etc/myapp/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/etc/myapp.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
Now you know exactly where the app is looking for its config, even if the code doesn’t document it clearly.
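If the openat filter comes up empty, it’s often because the app probes candidate paths with stat or access instead of opening them. Widening the filter to the whole file class catches those probes too:
# trace=file covers stat, access, and friends as well as open/openat
strace -e trace=file -e status=failed ./myapp 2>&1 | grep ENOENT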
Case 2: Mysterious Permission Denied
# Run strace as root (sudo can't gain privileges under an unprivileged tracer); -f follows the child it spawns
sudo strace -f -e trace=file -e status=failed sudo -u appuser ./myapp 2>&1 | grep EACCES
On that CentOS 7 server at work, I once ran into a service failing with “Permission denied” but ls -la showed the file permissions were clearly fine. It was only after running strace that I discovered SELinux was blocking it — the syscall returned EACCES but the error message didn’t mention SELinux at all. These subtle permission boundaries are also why it’s worth understanding Linux capabilities and fine-grained permissions — sometimes the issue isn’t file ownership at all, but a missing capability bit.
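When strace shows an EACCES that file permissions can’t explain, it only takes a minute to confirm or rule out SELinux; these commands assume a stock CentOS/RHEL setup with auditd running:
# Is SELinux enforcing?
getenforce
# Recent AVC denials logged by the audit subsystem
sudo ausearch -m avc -ts recent
# Or grep the raw audit log directly
sudo grep denied /var/log/audit/audit.log | tail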
Case 3: Service Is Hanging — No Idea What It’s Waiting For
# epoll_wait and futex aren't in the network/ipc classes, so attach without a filter
sudo strace -p $(pgrep myservice)
If you see the process stuck at:
epoll_wait(5, [], 1, 30000) = 0 # waiting on network, 30s timeout
# or
futex(0x7f..., FUTEX_WAIT, ...) # waiting on a lock
You know right away: the first one means it’s waiting on a network connection (possibly a backend timeout), the second means it’s waiting on a mutex lock (possibly a deadlock).
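strace tells you the process is parked in epoll_wait, but not which connection it’s waiting on. Cross-referencing the process’s open sockets usually answers that (12345 stands in for the real PID):
# List the process's TCP connections with peer addresses
sudo ss -tnp | grep 'pid=12345'
# or the same via lsof (-a ANDs the -p and -i selections together)
sudo lsof -a -nP -p 12345 -i TCP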
Case 4: Finding I/O Bottlenecks
# Collect aggregate stats over 30 seconds
sudo timeout 30 strace -c -f -p $(pgrep myapp) 2>&1
If you see read or write consuming more than 50% of time with a high usecs/call value — that’s a sign of slow I/O, and you should check your disk or NFS mount. At that point, pairing strace findings with a real-time view from iotop and htop helps confirm whether the bottleneck is process-level or system-wide.
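Once -c points at read or write, the next question is which file or mount is actually slow. A quick way to narrow it down, assuming the PID is 12345 and the slow call turns out to be on fd 9:
# -T appends each call's duration as <seconds>; -y resolves fd numbers to paths inline
sudo strace -f -T -y -e trace=read,write -p 12345 2>&1 | head -50
# Alternatively, map a file descriptor to its path by hand
sudo ls -l /proc/12345/fd/9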
Inspecting and Monitoring with strace
My Actual Debug Workflow
1. Start with -c for a high-level overview: which syscalls are called most, which ones are failing.
2. Filter to specific syscalls based on what step 1 revealed; don’t look at raw output, it’s too noisy.
3. Use -tt -T when you need precise timing, and look for unusually long gaps.
4. Grep by error code: ENOENT (file not found), EACCES (permission denied), ECONNREFUSED (network), ETIMEDOUT (timeout).
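Put together, steps 2 through 4 boil down to one attach command along these lines; the PID and the output path are placeholders:
# Failed file/network syscalls only, with timestamps and per-call durations,
# following children, written to a log for later grepping
sudo strace -f -tt -T -e trace=file,network -e status=failed -o /tmp/debug.log -p 12345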
Useful grep Commands When Analyzing strace Logs
# See all files that were successfully opened
grep 'openat.*O_RDONLY' strace.log | grep -v '= -1'
# Find network connections
grep 'connect(' strace.log
# Find writes to stderr (fd=2)
grep 'write(2,' strace.log
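And if the log was captured with -T, this rough one-liner pulls out the slowest calls; it leans on the fact that -T appends the duration as a <seconds> suffix at the end of each line:
# Sort by the <duration> suffix that -T appends, slowest first
grep '<' strace.log | awk -F'<' '{print $NF, $0}' | sort -rn | head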
A Note on Overhead
strace uses ptrace(): every syscall stops the traced process twice, once on entry and once on exit, so the tracer can inspect it. Overhead scales with how many syscalls the workload makes, commonly anywhere from 2x to 10x, and syscall-heavy workloads can fare even worse. Don’t use it on production under heavy load; run it only while actively debugging, or use -c for aggregate stats instead of watching real-time output.
If you need production tracing with lower overhead, perf trace or bpftrace are better choices — but strace is still the first tool I reach for because it requires zero setup and just works.
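For reference, attaching perf is a one-liner too, assuming a perf build matching your kernel is installed:
# Roughly what strace -p does, but via perf's tracing infrastructure
sudo perf trace -p 12345
# Aggregate per-syscall summary, similar to strace -c
sudo perf trace -s -p 12345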
Wrapping Up
strace isn’t a tool you use every day, but when you need to debug problems that logs can’t explain — especially permission errors, missing files, network timeouts, or deadlocks — it saves a tremendous amount of time compared to reading source code or sprinkling in print statements and redeploying.
My recommended order when debugging with strace: start with -e status=failed to surface errors, then -c to spot bottlenecks, then drill into the details only if needed. Don’t start with raw output — you’ll get overwhelmed immediately.

