Configuring Seccomp Profiles for Linux and Docker Containers: Reducing Attack Surface by Filtering System Calls

Security tutorial - IT technology blog

My server was hit by an SSH brute-force attack once, and I had to deal with it in the middle of the night — after that, I started auditing my entire security stack, not just the firewall or fail2ban, but deeper layers at the kernel level. One of the things I found and immediately put into practice was Seccomp (Secure Computing Mode).

A lot of people run Docker containers with the default configuration without realizing those containers can still make hundreds of system calls to the host kernel. If a dangerous syscall like ptrace or mount gets exploited, an attacker can escalate privileges from inside the container out to the host. Seccomp profiles shut those syscalls down before they can be abused.

What Is Seccomp and Why Should You Care?

Simply put: Seccomp is a kernel-level filter that decides which syscalls a process is allowed to make — and blocks everything else. Linux 5.x has over 335 syscalls, but a typical web app only really needs around 50–70 of them. The rest — kexec_load, create_module, mount — are things no application needs, but attackers would love to be able to call.

Docker already ships with a default Seccomp profile that blocks around 44 of the most dangerous syscalls. That sounds like a lot, but 44 out of 335 still leaves over 290 open. For services handling sensitive data or directly exposed to the internet, you should write your own custom profile rather than relying on the default.

Two Seccomp Modes

  • SECCOMP_MODE_STRICT: Only allows read, write, exit, sigreturn — extremely restrictive, almost never used in practice
  • SECCOMP_MODE_FILTER: Uses BPF (Berkeley Packet Filter) to define flexible rules — this is what Docker and systemd use
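To check which of these modes a process is actually running under, you can read the Seccomp field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). Here is a minimal sketch, assuming a Linux host; the helper names parse_seccomp_mode, current_mode, and MODE_NAMES are made up for illustration and not part of any library:

```python
from pathlib import Path
from typing import Optional

# Meaning of the "Seccomp:" field in /proc/<pid>/status
MODE_NAMES = {
    0: "disabled",
    1: "strict (SECCOMP_MODE_STRICT)",
    2: "filter (SECCOMP_MODE_FILTER)",
}


def parse_seccomp_mode(status_text: str) -> Optional[int]:
    """Return the numeric seccomp mode found in the text of a /proc status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" (newer kernels) does not match because of the colon
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None  # field not present


def current_mode(pid: str = "self") -> Optional[int]:
    """Read the seccomp mode of a process; None if the file cannot be read."""
    try:
        return parse_seccomp_mode(Path(f"/proc/{pid}/status").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = current_mode()
    print(MODE_NAMES.get(mode, f"unknown ({mode})"))
```

Run inside a Docker container with the default profile, this should report filter mode; on a plain host shell it will usually report disabled.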

Configuring a Seccomp Profile for Docker Containers

Creating a Custom Seccomp Profile

Profiles are written in JSON. The defaultAction field determines what happens to syscalls not on the list: typically SCMP_ACT_ERRNO (the call fails with an error, EPERM by default) or SCMP_ACT_KILL (the kernel terminates the offending thread immediately). Create the file custom-seccomp.json:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "accept", "accept4", "access", "arch_prctl",
        "bind", "brk", "capget", "capset",
        "chdir", "clock_gettime", "clone", "close",
        "connect", "dup", "dup2", "dup3",
        "epoll_create1", "epoll_ctl", "epoll_pwait", "epoll_wait",
        "execve", "exit", "exit_group",
        "fchmod", "fchown", "fcntl", "fstat", "fsync",
        "futex", "getcwd", "getdents64", "getegid",
        "geteuid", "getgid", "getpeername", "getpid",
        "getppid", "getrandom", "getrlimit", "getsockname",
        "getsockopt", "gettid", "gettimeofday", "getuid",
        "ioctl", "kill", "listen", "lseek", "lstat",
        "madvise", "mmap", "mprotect", "munmap",
        "nanosleep", "open", "openat", "pipe", "pipe2",
        "poll", "ppoll", "prctl", "pread64", "prlimit64",
        "pwrite64", "read", "readlink", "readv",
        "recvfrom", "recvmsg", "rename", "renameat2", "rmdir",
        "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
        "sched_yield", "select", "sendfile", "sendmsg", "sendto",
        "set_robust_list", "set_tid_address",
        "setgid", "setuid", "setsockopt",
        "sigaltstack", "socket", "stat", "statfs",
        "symlink", "tgkill", "unlink", "unlinkat",
        "uname", "wait4", "write", "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
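Before handing a profile to Docker, it is worth a quick sanity check: does the JSON parse, what is the default action, and how many syscalls end up allowed? The sketch below assumes the profile layout shown above (defaultAction plus a syscalls list of names/action rules); summarize_profile is a hypothetical helper, not a Docker API:

```python
import json
from collections import Counter


def summarize_profile(profile: dict) -> dict:
    """Summarize a seccomp profile: default action, allowed names, duplicate names."""
    counts = Counter()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            counts.update(rule.get("names", []))
    return {
        "default_action": profile.get("defaultAction"),
        "allowed": sorted(counts),
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
    }


# Tiny inline profile for illustration; json.loads raises ValueError on syntax
# errors such as a trailing comma.
EXAMPLE = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "read"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""

summary = summarize_profile(json.loads(EXAMPLE))
print(summary["default_action"], len(summary["allowed"]), summary["duplicates"])
# prints: SCMP_ACT_ERRNO 2 ['read']
```

For the real file, swap the inline string for the contents of custom-seccomp.json.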

Running a Container with a Custom Profile

# Run with custom Seccomp profile
docker run --security-opt seccomp=./custom-seccomp.json \
  -p 3000:3000 \
  my-nodejs-app

# Run unconfined — ONLY for debugging, never in production
docker run --security-opt seccomp=unconfined my-nodejs-app

Configuring in docker-compose.yml

version: '3.8'
services:
  webapp:
    image: my-nodejs-app
    security_opt:
      - seccomp:./custom-seccomp.json
    ports:
      - "3000:3000"

Debugging: Finding Out Which Syscalls Your App Actually Needs

The real question is: how do you know which syscalls your app needs so you don’t whitelist too few? Use strace to trace the app while it runs normally:

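# Trace the app from startup; with -f and -o, each log line starts with a PID
strace -f -o /tmp/syscalls.log node server.js
# Extract the unique syscall names (skips the PID prefix and "resumed" lines)
grep -oP '^(\d+\s+)?\K[a-z_][a-z0-9_]*(?=\()' /tmp/syscalls.log | sort -u

# Trace a running process (by specific PID); stop with Ctrl-C, then extract the names
strace -f -p 1234 -o /tmp/attached.log
grep -oP '^(\d+\s+)?\K[a-z_][a-z0-9_]*(?=\()' /tmp/attached.log | sort -u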
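If you would rather go straight from a trace to a profile, the same extraction is easy to do in Python. This is a rough sketch, assuming the usual strace line shapes (an optional "1234 " or "[pid 1234] " prefix, then name and opening parenthesis); syscalls_from_strace and build_profile are names I made up, and the generated profile only covers the code paths your trace actually exercised:

```python
import re

# Optional PID prefix, then the syscall name followed by "(".
# Lines such as "<... read resumed>" or "+++ exited with 0 +++" do not match.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines) -> list:
    """Return the sorted, de-duplicated syscall names seen in strace output lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def build_profile(names) -> dict:
    """Wrap the names in the same allowlist layout as custom-seccomp.json."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
        "syscalls": [{"names": list(names), "action": "SCMP_ACT_ALLOW"}],
    }


# Illustrative lines, not taken from a real trace
SAMPLE = [
    '1234  openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
    "1234  read(3,  <unfinished ...>",
    '1234  <... read resumed>"127.0.0.1", 4096) = 9',
    "[pid 1235] accept4(6, NULL, NULL, SOCK_CLOEXEC) = 7",
    "1234  +++ exited with 0 +++",
]

print(syscalls_from_strace(SAMPLE))
# prints: ['accept4', 'openat', 'read']
```

Treat the result as a starting point: the container runtime itself needs a few syscalls during startup that a trace of the bare app on the host will not show.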

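Applied a profile but the app throws errors with no clear cause? Switch defaultAction to SCMP_ACT_LOG for testing (this needs kernel 4.14 or newer). In this mode the kernel lets unlisted syscalls through but records each one, so you can see what the profile would have blocked without breaking the app. The records are audit events of type 1326 (SECCOMP); they land in the kernel log, or in the audit log when auditd is running:

dmesg | grep -iE 'type=1326|seccomp'
# or
journalctl -k | grep -iE 'type=1326|seccomp'
# or, when auditd is running
ausearch -m SECCOMP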
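The kernel records identify the syscall by number, not by name, which makes raw log output tedious to read. Below is a small sketch that counts events per program and syscall. It assumes the audit record shape I have seen on x86_64 (type=1326 or type=SECCOMP, with comm="..." and syscall=N fields); the sample line is illustrative, and X86_64_NAMES is a deliberately tiny, hypothetical lookup table that you would extend or replace with a tool such as ausyscall:

```python
import re
from collections import Counter

# Partial x86_64 number-to-name table, for illustration only
X86_64_NAMES = {0: "read", 1: "write", 59: "execve", 101: "ptrace", 165: "mount"}

SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)")
COMM_FIELD = re.compile(r'\bcomm="([^"]*)"')


def count_seccomp_events(lines) -> Counter:
    """Count seccomp audit records, keyed by (program name, syscall name)."""
    counts = Counter()
    for line in lines:
        if "type=1326" not in line and "type=SECCOMP" not in line:
            continue  # not a seccomp audit record
        syscall = SYSCALL_FIELD.search(line)
        if not syscall:
            continue
        number = int(syscall.group(1))
        name = X86_64_NAMES.get(number, f"syscall_{number}")
        comm = COMM_FIELD.search(line)
        counts[(comm.group(1) if comm else "?", name)] += 1
    return counts


# Illustrative record, not copied from a real system
SAMPLE_LOG = [
    '[ 1234.5] audit: type=1326 audit(1700000000.123:42): auid=1000 uid=0 gid=0 '
    'ses=1 pid=4321 comm="node" exe="/usr/local/bin/node" sig=0 arch=c000003e '
    "syscall=165 compat=0 ip=0x7f0000000000 code=0x7ffc0000",
    "[ 1235.0] eth0: link up",
]

for (program, syscall_name), n in count_seccomp_events(SAMPLE_LOG).items():
    print(f"{program}: {syscall_name} x{n}")
# prints: node: mount x1
```

Feed it the output of dmesg or journalctl -k and you get a short list of names to consider adding to the allowlist.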

Seccomp for Linux Services via systemd

Running your app directly on a Linux host without Docker? systemd has an even more convenient way to apply Seccomp — with built-in syscall groups organized by function category, so you don’t need to list every syscall individually:

# /etc/systemd/system/myapp.service
[Unit]
Description=My Web Application
After=network.target

[Service]
User=appuser
ExecStart=/usr/local/bin/myapp
Restart=on-failure

# Seccomp: only allow syscalls appropriate for a web service
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM

# Combine with additional hardening options
NoNewPrivileges=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectSystem=strict
ProtectHome=true

[Install]
WantedBy=multi-user.target
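If you manage more than a handful of services, a quick script can flag unit files that are missing these directives. This is only a rough sketch: it parses the unit with Python's configparser, which handles simple files like the one above but not every systemd syntax feature, and missing_hardening and HARDENING_KEYS are names I made up for the example:

```python
import configparser

# The directives used in the unit file above
HARDENING_KEYS = [
    "SystemCallFilter",
    "NoNewPrivileges",
    "ProtectKernelTunables",
    "ProtectKernelModules",
    "ProtectSystem",
    "ProtectHome",
]


def missing_hardening(unit_text: str) -> list:
    """Return the hardening directives absent from the [Service] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    service = parser["Service"] if parser.has_section("Service") else {}
    return [key for key in HARDENING_KEYS if key not in service]


# Illustrative unit with only part of the hardening applied
UNIT = """
[Unit]
Description=My Web Application

[Service]
ExecStart=/usr/local/bin/myapp
SystemCallFilter=@system-service
NoNewPrivileges=true
"""

print(missing_hardening(UNIT))
# prints: ['ProtectKernelTunables', 'ProtectKernelModules', 'ProtectSystem', 'ProtectHome']
```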

View the syscalls in each group and reload the service:

# View which syscalls belong to the @system-service group
systemd-analyze syscall-filter @system-service

# Useful groups include: @network-io, @file-system, @process, @io-event
systemd-analyze syscall-filter @network-io

# Reload and check for violations
systemctl daemon-reload
systemctl restart myapp
journalctl -u myapp | grep -i seccomp

Applying Seccomp Directly in Python Code

Want to embed Seccomp directly into your Python code, independent of Docker or systemd? The python-libseccomp library handles this — installation is straightforward:

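pip install seccomp
# If that does not build on your system, the libseccomp bindings are also
# packaged by distributions, e.g. python3-seccomp on Debian/Ubuntu

import seccomp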

# Default action: KILL if an unallowed syscall is called
filt = seccomp.SyscallFilter(defaction=seccomp.KILL)

allowed = [
    "read", "write", "open", "openat", "close",
    "stat", "fstat", "mmap", "mprotect", "munmap",
    "brk", "rt_sigaction", "rt_sigreturn",
    "ioctl", "pread64", "pwrite64",
    "socket", "connect", "accept", "bind", "listen",
    # send/recv are libc wrappers, not x86_64 syscalls; sendto/recvfrom cover them
    "sendto", "recvfrom",
    "exit", "exit_group", "futex", "gettimeofday",
]

for syscall in allowed:
    try:
        filt.add_rule(seccomp.ALLOW, syscall)
    except Exception as exc:
        # Don't swallow failures: with a KILL default, a missing rule is fatal at runtime
        print(f"could not allow {syscall}: {exc}")

# Apply immediately — code below this point will have syscalls restricted
filt.load()
print("Seccomp filter applied")

Conclusion

Seccomp doesn’t replace a firewall or AppArmor; each security layer has its own role. Its strength lies in blocking at the lowest possible level: at syscall entry, before the kernel executes the call. Stack it with --cap-drop ALL, --read-only, and --security-opt no-new-privileges and you’ve got a container locked down by three more independent layers on top of the syscall filter.
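To keep those flags from drifting between deployments, it can help to build the command in one place. This is a minimal sketch, assuming the image name and profile path used earlier in this post; hardened_run_args is a hypothetical helper that only assembles the argument list and does not start anything:

```python
import shlex


def hardened_run_args(image: str, profile: str = "./custom-seccomp.json") -> list:
    """Build a docker run argument list with the hardening flags stacked together."""
    return [
        "docker", "run",
        "--security-opt", f"seccomp={profile}",   # syscall filter
        "--security-opt", "no-new-privileges",    # block setuid-style escalation
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--read-only",                            # read-only root filesystem
        image,
    ]


print(shlex.join(hardened_run_args("my-nodejs-app")))
# prints: docker run --security-opt seccomp=./custom-seccomp.json --security-opt no-new-privileges --cap-drop ALL --read-only my-nodejs-app
```

Pass the list to subprocess.run when you want to launch the container. An app that writes to its own filesystem will need a writable mount alongside --read-only.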

Quick deployment checklist:

  • seccomp=unconfined on production is an absolute no-go — even for a temporary debugging session
  • Use strace or SCMP_ACT_LOG to profile actual syscalls before writing your whitelist — don’t guess
  • @system-service in systemd is a solid starting point — customize further based on each service’s needs
  • Test thoroughly after applying a profile, especially edge cases — some apps make unexpected syscalls when handling large files or dropped network connections

Ever since that night dealing with the brute-force incident, Seccomp profiles have been one of the first things I set up whenever I deploy a new service to production — especially for any container with direct internet exposure.
