How to Configure kdump on Linux: Collecting Kernel Crash Dumps and Analyzing Critical System Failures

Linux tutorial - IT technology blog

Server crashes with no explanation — a sysadmin’s nightmare

I still remember the first time I dealt with a kernel panic in production. It was a CentOS 7 server running the team’s primary database, one I’d spent an entire month tuning to reach sub-10ms latency. It rebooted at 3 AM with no prior warning. Checking /var/log/messages showed the log cut off mid-stream, without a single line recording the cause. It took nearly a full week to reproduce the bug and track down the culprit: a kernel module from backup software, the last thing I would have ever suspected.

If I’d known about kdump back then, I would have saved an enormous amount of time. kdump captures the entire state of RAM at the moment the kernel crashes — before the system has a chance to reboot. That dump file can then be analyzed with the crash tool to pinpoint the exact call stack, the offending line of code, and the root cause of the incident.

How does kdump work?

Rather than relying on the crashing kernel (which is no longer trustworthy), kdump uses a small capture kernel. A region of RAM is reserved for it at boot time, and the kdump service loads it there with kexec once the system is up. When the main kernel panics, the system switches to running this capture kernel, which writes the contents of RAM to a vmcore file and only then reboots normally.

  • vmcore: The dump file that gets written out, typically located at /var/crash/
  • makedumpfile: Compresses and filters the vmcore, discarding unnecessary RAM pages to save disk space
  • crash: The tool for analyzing vmcore files, with an interface similar to GDB
  • kernel-debuginfo: A package containing debug symbols — required for crash to resolve symbol names

Memory reservation

The capture kernel needs its own RAM region that the main kernel never touches. This region is declared via the crashkernel= boot parameter. Typically 128–256MB is sufficient for most systems.
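
To see what a running system was actually booted with, a small helper can pull the crashkernel= value out of a kernel command line. This is a minimal sketch; the function name crashkernel_param is my own and not part of any kdump tooling.

```shell
# Print the value of the crashkernel= parameter found in a kernel
# command line string. Prints nothing if the parameter is absent.
# (crashkernel_param is a hypothetical helper name.)
crashkernel_param() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1
}
```

Typical use is crashkernel_param "$(cat /proc/cmdline)". The number of bytes actually reserved can be read from /sys/kernel/kexec_crash_size.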

Installing and configuring kdump step by step

Step 1: Install the required packages

On RHEL/CentOS/AlmaLinux:

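# kernel-debuginfo lives in the debuginfo repositories, which are usually disabled by default
sudo dnf install kexec-tools crash
sudo dnf debuginfo-install kernel
# CentOS 7 uses yum (debuginfo-install comes from yum-utils)
sudo yum install kexec-tools crash yum-utils
sudo debuginfo-install kernel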

On Ubuntu/Debian:

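# On Ubuntu the -dbgsym package comes from the ddebs.ubuntu.com repository, which is not enabled by default
sudo apt-get install kdump-tools crash linux-image-$(uname -r)-dbgsym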

Step 2: Add crashkernel to GRUB

This step determines whether kdump will work at all — the capture kernel needs its own RAM region, and crashkernel= is how you declare it to the bootloader. Open /etc/default/grub:

sudo vi /etc/default/grub

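# Find the GRUB_CMDLINE_LINUX line and append crashkernel=auto at the end
# (RHEL 9 and its clones no longer accept "auto"; use an explicit size such as crashkernel=256M there)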
GRUB_CMDLINE_LINUX="... crashkernel=auto"

# Update the grub config (BIOS)
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# If using UEFI (replace "centos" with your distribution's EFI directory, e.g. redhat or almalinux)
sudo grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
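
If you would rather script the edit than open vi on every server, something like the following could work. It is a sketch under assumptions: set_crashkernel is a name I made up, it relies on GNU sed, and it expects GRUB_CMDLINE_LINUX to be a single double-quoted line. Try it on a copy of the file first.

```shell
# Set crashkernel=<value> on the GRUB_CMDLINE_LINUX line of a grub
# defaults file, replacing an existing value or appending a new one.
# (set_crashkernel is a hypothetical helper; requires GNU sed.)
set_crashkernel() {
    # $1: path to the grub defaults file, $2: value such as 256M
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$1"; then
        # Parameter already present: swap in the new value
        sed -i -E "/^GRUB_CMDLINE_LINUX=/ s/crashkernel=[^ \"]*/crashkernel=$2/" "$1"
    else
        # Parameter missing: append it just before the closing quote
        sed -i -E "/^GRUB_CMDLINE_LINUX=/ s/\"\$/ crashkernel=$2\"/" "$1"
    fi
}
```

For example, run set_crashkernel on a copy of /etc/default/grub, inspect the result, then install the copy and regenerate grub.cfg as shown above.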

Reboot, then verify the crashkernel region has been reserved:

# Run as root: unprivileged users see the addresses as zeros
sudo grep -i crash /proc/iomem
# Expected output:
# 2d000000-4cffffff : Crash kernel
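
The hex range is not very readable, so it can help to convert it into a size. The helper below is only a sketch (crash_region_mib is a name I chose); it reads /proc/iomem-style lines on standard input and prints the size of each "Crash kernel" range in MiB.

```shell
# Read /proc/iomem-style lines on stdin and print the size in MiB of
# every "Crash kernel" range. (crash_region_mib is a hypothetical name.)
crash_region_mib() {
    sed -n 's/^ *\([0-9a-fA-F]*\)-\([0-9a-fA-F]*\) : Crash kernel.*/\1 \2/p' |
    while read -r start end; do
        # Both ends of the range are inclusive, hence the +1
        echo $(( (0x$end - 0x$start + 1) / 1048576 ))
    done
}
```

With the sample line above, sudo cat /proc/iomem | crash_region_mib would print 512, because 0x4cffffff - 0x2d000000 + 1 bytes is exactly 512 MiB.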

Step 3: Configure /etc/kdump.conf

sudo vi /etc/kdump.conf

Basic configuration to save the dump to local disk:

# Path where vmcore will be saved
path /var/crash

# Action to take if saving the dump fails (newer kexec-tools releases call this option failure_action)
default reboot

# Use makedumpfile to compress and filter out unnecessary pages
core_collector makedumpfile -l --message-level 1 -d 31

The -d 31 flag sets makedumpfile’s dump level to 31, which excludes zero pages, page cache pages, user-space process pages, and free pages, none of which usually matter for a kernel crash. The resulting dump file is often only 10–20% of actual RAM size, so a 16GB server coming down to around 2–3GB is perfectly normal.
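
The dump level is a bitmask, so 31 is simply 1 + 2 + 4 + 8 + 16. The sketch below decodes a level into the page types it excludes, following the table in the makedumpfile man page; the function name describe_dump_level is my own.

```shell
# Print the page types that a makedumpfile dump level (-d N) excludes.
# Bit meanings follow the makedumpfile(8) dump_level table.
# (describe_dump_level is a hypothetical helper name.)
describe_dump_level() {
    # $1: dump level, 0 to 31
    [ $(( $1 & 1 ))  -ne 0 ] && echo "zero pages"
    [ $(( $1 & 2 ))  -ne 0 ] && echo "non-private cache pages"
    [ $(( $1 & 4 ))  -ne 0 ] && echo "private cache pages"
    [ $(( $1 & 8 ))  -ne 0 ] && echo "user process data pages"
    [ $(( $1 & 16 )) -ne 0 ] && echo "free pages"
    return 0
}
```

A lower level such as -d 17 keeps cache and user pages in the dump, which is occasionally useful when the crash involves the page cache, at the cost of a much larger file.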

To write the dump to an NFS share instead and avoid consuming local disk:

nfs 192.168.1.100:/exports/crash_dumps
path /

Step 4: Enable and start the kdump service

sudo systemctl enable kdump
sudo systemctl start kdump
sudo systemctl status kdump

When it starts successfully, the output looks like this:

● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled)
   Active: active (exited) since Thu 2026-06-25 10:30:00 JST
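
The service status tells you systemd is happy, but the kernel also exposes whether a capture kernel is really loaded, through /sys/kernel/kexec_crash_loaded (1 means loaded). A small check along these lines can go into monitoring; kdump_armed is a name I picked for illustration.

```shell
# Succeed (exit 0) if a capture kernel is loaded, fail otherwise.
# The optional argument overrides the sysfs path, which is handy for testing.
# (kdump_armed is a hypothetical helper name.)
kdump_armed() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

For example: kdump_armed || echo "kdump is NOT armed on $(hostname)".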

Step 5: Test kdump (on a test environment only!)

To confirm kdump is working correctly, you need to trigger a simulated kernel panic. The command below will crash the server immediately — only run this on a VM or test server:

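# WARNING: The server will crash and reboot immediately
# The writes need root, and "sudo echo ... > file" would not work, so pipe through sudo tee
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger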

After the server comes back up, check the dump directory:

ls -lh /var/crash/
# drwxr-xr-x 2 root root 4096 Jun 25 10:35 2026-06-25-10:35

ls -lh /var/crash/2026-06-25-10:35/
# vmcore              -- the main dump file
# vmcore-dmesg.txt    -- kernel messages captured before the crash
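
Once a server has crashed a few times, finding the newest dump by eye gets tedious. A sketch of a helper that prints the most recently written vmcore follows. It assumes GNU find and that the dump file is named exactly vmcore; latest_vmcore is a made-up name.

```shell
# Print the path of the most recently modified file named "vmcore"
# under a crash directory (default /var/crash). Prints nothing if none exist.
# (latest_vmcore is a hypothetical helper; relies on GNU find's -printf.)
latest_vmcore() {
    find "${1:-/var/crash}" -type f -name 'vmcore' -printf '%T@ %p\n' 2>/dev/null |
        sort -rn | head -n 1 | cut -d' ' -f2-
}
```

The directory naming differs between distributions, which is why this searches by file name rather than parsing the timestamped directory.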

Analyzing a kernel crash dump with the crash tool

Opening vmcore with crash

# Syntax: crash <vmlinux> <vmcore>
sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
           /var/crash/2026-06-25-10:35/vmcore

Once inside the crash shell, these commands cover most investigations:

# Call stack at the time of the crash
crash> bt

# Kernel log leading up to the crash
crash> log

# List of running processes
crash> ps

# General crash overview
crash> sys

# Memory status
crash> kmem -i

# Exit
crash> quit
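
The same commands can be run without an interactive session, which is convenient for producing a first report to attach to a ticket. This is a sketch that assumes crash is installed; it uses crash's -s (skip the banner) and -i (read commands from a file) options, and the function names are mine.

```shell
# Write the usual first-look commands into a crash command file.
# (write_crash_commands and triage_vmcore are hypothetical helper names.)
write_crash_commands() {
    # $1: path of the command file to create
    cat > "$1" <<'EOF'
sys
bt
log
ps
kmem -i
quit
EOF
}

# Run those commands against a dump and save everything crash prints.
triage_vmcore() {
    # $1: vmlinux with debug symbols, $2: vmcore, $3: report file to write
    cmd_file=$(mktemp) || return 1
    write_crash_commands "$cmd_file"
    crash -s -i "$cmd_file" "$1" "$2" > "$3"
    rc=$?
    rm -f "$cmd_file"
    return "$rc"
}
```

The trailing quit matters: without it, crash drops to its interactive prompt after the file has been processed.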

Reading the backtrace output

The output of bt looks like this:

crash> bt
PID: 0      TASK: ffffffff81c14480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffffffff81c03d28] machine_kexec at ffffffff8105b6ab
 #1 [ffffffff81c03d78] __crash_kexec at ffffffff810f2d5a
 #2 [ffffffff81c03e48] crash_kexec at ffffffff810f2e50
 #3 [ffffffff81c03e60] oops_end at ffffffff8162efb8
 #4 [ffffffff81c03e88] die at ffffffff8101e97b
 #5 [ffffffff81c03eb8] do_general_protection at ffffffff8162e834

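In crash’s output, frame #0 is the most recent call, so read from the bottom up to follow events in order. The top frames here (machine_kexec through die) are the kernel’s own panic handling. The code that actually faulted appears in the frames below the exception handler (do_general_protection in this sample, which is cut off at that point). In my case, those frames pointed directly at a module from the backup software, something I hadn’t suspected for a moment before that.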
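
When a backtrace is long, it can help to hide the frames that belong to the panic path itself. The filter below is a rough heuristic, not anything crash provides: the list of function names to skip is my own assumption and will need adjusting for other kernel versions.

```shell
# Read "bt" output on stdin and print the function name of each frame,
# skipping frames that are part of the kernel's panic/kexec path.
# (skip_panic_frames and its skip list are illustrative assumptions.)
skip_panic_frames() {
    awk '
        /^ *#[0-9]+ \[/ {
            name = $3   # fields: "#N", "[stack address]", function name
            if (name ~ /^(machine_kexec|__crash_kexec|crash_kexec|oops_end|die|panic)$/)
                next
            print name
        }
    '
}
```

Fed the sample above, this leaves only do_general_protection. On a full backtrace the remaining names are the exception handler and the code that triggered it.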

Quickly extracting dmesg from vmcore

Don’t want to do a deep analysis? Grab the kernel messages from just before the crash quickly:

makedumpfile --dump-dmesg /var/crash/2026-06-25-10:35/vmcore dmesg.txt
tail -n 50 dmesg.txt

Usually the error line right before Kernel panic is enough to start your investigation — no need to dig deep into the crash shell.
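
Instead of eyeballing the last 50 lines, you can ask grep for the first "Kernel panic" line plus some context before it. A minimal sketch, where panic_context is a name I made up and the default of 20 lines is arbitrary:

```shell
# Print the first "Kernel panic" line of a dmesg file together with the
# N lines that precede it (default 20).
# (panic_context is a hypothetical helper name.)
panic_context() {
    # $1: dmesg text file, $2: number of lines of leading context
    grep -m 1 -B "${2:-20}" 'Kernel panic' "$1"
}
```

This works on the vmcore-dmesg.txt that kdump saves next to the vmcore as well as on a file produced by makedumpfile --dump-dmesg.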

Practical notes for deployment

  • Disk space: Leave enough room for at least 1–2 dump files. A 16GB RAM server, even after compression, still needs around 3–5GB free at /var/crash
  • kernel-debuginfo doesn’t need to be installed on production: This package is several GB in size. Just install it on your analysis machine and copy the vmcore there
  • Kernel versions must match exactly: The vmlinux (from debuginfo) must be the exact same version as the kernel that generated the vmcore. A version mismatch causes the crash tool to error out immediately
  • crashkernel=auto may not be enough on high-memory servers: Servers with 64GB+ RAM sometimes need a manual override like crashkernel=512M if the kdump service fails to start
  • Regular cleanup: Add a cron job to delete old dumps so you don’t fill up the disk if the server crashes repeatedly
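
For the cleanup job, a small function like the one below is usually enough. Treat it as a sketch: prune_old_dumps is a name I chose, the 30-day default is arbitrary, and it assumes each dump lives in its own subdirectory directly under the crash directory.

```shell
# Delete dump directories directly under the crash directory whose
# modification time is more than N days old.
# (prune_old_dumps is a hypothetical helper; defaults are assumptions.)
prune_old_dumps() {
    # $1: crash directory (default /var/crash), $2: max age in days (default 30)
    find "${1:-/var/crash}" -mindepth 1 -maxdepth 1 -type d \
        -mtime +"${2:-30}" -exec rm -rf -- {} +
}
```

Saved as a script in /etc/cron.daily/ that ends with a call to prune_old_dumps, it runs once a day as root. Copy any dump you still need to the analysis machine before the retention window expires.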

Conclusion

kdump isn’t something you use every day. But when a server suffers a kernel panic and leaves no trace behind, it’s the only thing that can give you a real answer. Setup takes about 30 minutes — in exchange for not having to spend days guessing in the dark when an incident occurs.

Servers running custom kernel modules, obscure hardware drivers, or heavy database workloads especially need kdump. Enable it from the start, before an incident happens — because by the time you need it, it’s too late to turn it on.
