How to Install and Use gVisor to Secure Containers: Running Docker with an Isolated Kernel

Virtualization tutorial - IT technology blog

Containers run directly on the host kernel — and that’s a problem

If you’re using Docker and have never considered that a container could “escape” and attack the host, this article is for you.

By nature, Linux containers share the kernel with the host. Unlike VMs, where a hypervisor gives each guest its own kernel, containers rely on namespaces and cgroups (plus optional hardening such as seccomp and dropped capabilities) for isolation. That means a kernel vulnerability reachable through a syscall from inside a container can be used to escalate privileges out to the host.

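This type of attack is called a container escape. Two well-known examples: CVE-2019-5736 let a malicious container overwrite the host’s runc binary, and Dirty Pipe (CVE-2022-0847) was a kernel bug that allowed writing to read-only files, which made it usable from inside a container. Only Dirty Pipe is a kernel exploit in the strict sense, but both start with code running inside a container and end with control over the host.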

I run a homelab with Proxmox VE managing 12 VMs and containers — it’s my playground for testing everything before pushing to production. After reading a post-mortem about a container escape on another company’s production Kubernetes cluster, I started taking hardening more seriously instead of just using --read-only or dropping capabilities.

The solution I found: gVisor.

What is gVisor and how it differs from conventional isolation

gVisor is a sandbox runtime for containers, developed and open-sourced by Google. Instead of letting containers make syscalls directly to the host kernel, gVisor places an intermediary layer called Sentry in between.

Sentry is a kernel written in Go that runs in user space. When an app inside a container calls open(), read(), or execve(), Sentry intercepts those syscalls and handles them within the sandbox. Only what is truly necessary gets passed down to the host kernel — through a very restricted set of syscalls.

Think of it like this:

  • Regular Docker: App → syscall → host Linux kernel (direct)
  • gVisor: App → syscall → Sentry (virtual kernel in user space) → a few safe syscalls → host Linux kernel

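The result: the host kernel’s attack surface is dramatically reduced. Even if an attacker exploits a vulnerability in a container app and then a bug in syscall handling, they land in Sentry, a user-space process that is itself confined to a small set of host syscalls, rather than in the real host kernel.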

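gVisor supports several platforms, which determine how Sentry intercepts syscalls:

  • systrap: The default in current releases; works anywhere and needs no hardware virtualization support
  • ptrace: The older default; works anywhere, but slower
  • KVM: Requires CPU virtualization support (and nested virtualization if the Docker host is itself a VM); significantly faster than ptrace on bare metal, and this is what I use in my homelab. If you haven’t set up KVM on Ubuntu yet, it’s worth doing before enabling this platform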
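
To make the platform choice concrete, here is a minimal sketch that suggests a platform depending on whether the KVM device node is usable. The function names pick_platform and platform_flag are made up for this example; --platform is runsc's own flag, and it would normally go into a "runtimeArgs" list next to "path" in daemon.json. The fallback assumes systrap, the default platform in current gVisor releases. Docker starts runsc as root, so the permission check for the current user may be stricter than necessary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a gVisor platform for this host.
# The device path is a parameter so the logic can be tried without KVM.
pick_platform() {
  local kvm_dev="${1:-/dev/kvm}"
  # KVM is only usable if the device node exists and can be opened read/write.
  if [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
    echo "kvm"
  else
    echo "systrap"
  fi
}

# Hypothetical helper: format the choice as a runsc command-line flag.
platform_flag() {
  echo "--platform=$(pick_platform "${1:-}")"
}

# Prints the suggested flag for the current host.
platform_flag
```

The printed flag is only a suggestion; KVM inside a VM without nested virtualization will pass this check on some hosts and still perform poorly, so it is worth benchmarking both options.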

Installing gVisor on Ubuntu/Debian

Step 1: Add the repository and install gVisor

# Add gVisor's official GPG key and repository
curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" \
  | sudo tee /etc/apt/sources.list.d/gvisor.list

sudo apt-get update && sudo apt-get install -y runsc

After installation, verify the version:

runsc --version
# runsc version release-20240401.0

Step 2: Configure Docker to use the gVisor runtime

Open or create the file /etc/docker/daemon.json:

sudo nano /etc/docker/daemon.json

Add the following content (if the file already has content, merge it in — don’t overwrite):

{
  "runtimes": {
    "runsc": {
      "path": "/usr/bin/runsc"
    }
  }
}
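
If daemon.json already holds other settings, merging by hand is error-prone. Here is one way to do it, sketched on the assumption that jq is installed (nothing else in this article requires it) and that the file contains valid JSON, at minimum {}. The name add_runsc_runtime is invented for this example. gVisor also ships a runsc install subcommand that edits daemon.json for you, which may be the simpler route.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a daemon.json document on stdin and print it with
# the runsc runtime added. Existing keys are left untouched.
add_runsc_runtime() {
  local runsc_path="${1:-/usr/bin/runsc}"
  jq --arg path "$runsc_path" '.runtimes.runsc.path = $path'
}

# Usage sketch (commented out because it modifies system files):
#   sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
#   add_runsc_runtime < /etc/docker/daemon.json.bak | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```

Reading from the backup copy rather than the live file matters: piping a file into tee on the same path can truncate it before jq has read it.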

Restart Docker to apply the changes:

sudo systemctl restart docker

Confirm the runtime has been recognized:

docker info | grep -i runtime
# Runtimes: io.containerd.runc.v2 runsc runc

Step 3: Run a container with gVisor

Simply add the --runtime=runsc flag to your normal Docker command:

# Regular container (uses runc, host kernel)
docker run --rm ubuntu uname -r

# Container with gVisor (uses runsc, Sentry kernel)
docker run --rm --runtime=runsc ubuntu uname -r

The interesting part: uname -r inside a gVisor container returns a completely different kernel version than the host — that’s Sentry’s kernel, not your machine’s actual kernel.

# Example output
# Host kernel:    6.8.0-87-generic
# gVisor kernel:  4.4.0
# (Sentry emulates an older kernel version for compatibility)

Step 4: Use gVisor with Docker Compose

In your docker-compose.yml, add runtime to the services you want to protect:

# No top-level "version" key: Compose V2 ignores it and warns that it is obsolete
services:
  webapp:
    image: nginx:alpine
    runtime: runsc
    ports:
      - "8080:80"

  database:
    image: postgres:15
    runtime: runsc
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Run as usual:

docker compose up -d
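
It is easy to forget the runtime line on one service, so I like to confirm which runtime each container actually got. A minimal sketch: not_sandboxed is a name invented for this example, while .Name and .HostConfig.Runtime are fields of docker inspect output. It assumes one "name runtime" pair per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name runtime" pairs on stdin and print the names
# of containers that are NOT running under runsc.
not_sandboxed() {
  awk '$2 != "runsc" { print $1 }'
}

# Usage sketch (needs Docker and a running Compose project):
#   docker inspect --format '{{.Name}} {{.HostConfig.Runtime}}' $(docker compose ps -q) \
#     | not_sandboxed
```

An empty result means every container in the project is running under gVisor.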

Step 5: Set gVisor as the default runtime (optional)

Want every container to go through gVisor unless specified otherwise? Update daemon.json:

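{
  "default-runtime": "runsc",
  "runtimes": {
    "runsc": {
      "path": "/usr/bin/runsc"
    }
  }
}

Restart Docker again after this change. runc is built into Docker, so it needs no entry under runtimes (Docker reserves the name and refuses to start if you define it there). When you need the original runtime, just specify --runtime=runc.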

Verifying gVisor actually provides isolation

This is the first thing I test after setup — both to confirm the sandbox is working and to see the difference firsthand:

# Check /proc/self/status inside a gVisor container
docker run --rm --runtime=runsc ubuntu cat /proc/self/status
# This file is generated by Sentry; compare it with the same command under runc

# Read kernel information
docker run --rm --runtime=runsc ubuntu cat /proc/version
# Linux version 4.4.0 #1 SMP ... (this is Sentry, not the real kernel)

# Check the sandbox hostname
docker run --rm --runtime=runsc ubuntu hostname
# Prints the container ID, as it would under runc; on its own this does not prove sandboxing

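One more test, this time for syscall handling. ptrace is commonly abused in exploit techniques, and strace is built on top of it. The stock ubuntu image does not ship strace, so install it inside the container first:

docker run --rm --runtime=runsc ubuntu bash -c \
  'apt-get update -qq && apt-get install -y -qq strace >/dev/null 2>&1 && strace ls 2>&1 | head -5'
# Any trace output comes from Sentry's own ptrace implementation rather than the host kernel's; features Sentry lacks show up as errors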
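
A more direct check is the kernel log: inside a gVisor sandbox, dmesg prints a boot banner that begins with "Starting gVisor...". The sketch below wraps that check in a small function. looks_like_gvisor is a name invented for this example, and it only inspects the text it is given, so treat it as a sanity check rather than proof of isolation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel log text on stdin contains
# gVisor's boot banner.
looks_like_gvisor() {
  grep -q 'Starting gVisor'
}

# Usage sketch (needs Docker with the runsc runtime configured):
#   docker run --rm --runtime=runsc ubuntu dmesg | looks_like_gvisor \
#     && echo "sandboxed by gVisor" || echo "not running under gVisor"
```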

Performance considerations and limitations

gVisor is not a silver bullet. Before deploying to production, there are a few trade-offs to understand:

  • Syscall overhead: Every syscall must pass through Sentry, so latency is higher than plain runc. The cost depends heavily on the workload and the platform. As a rough guide, I/O-heavy workloads (databases with continuous writes, file processing) can be 20–40% slower, while CPU-bound workloads (compression, encryption) typically add only 2–5%. Measure your own workload before relying on these numbers.
  • Compatibility: Sentry does not implement 100% of Linux syscalls. Apps using less common syscalls or newer kernel features may not run — test thoroughly before pushing to production.
  • Volume mounts: I/O on host-backed mounts (bind mounts and named volumes alike) is slower than in regular containers, because gVisor mediates every file access. Where possible, use tmpfs for directories that need high throughput and no persistence.
  • Not a VM: gVisor is still much lighter than a VM, with millisecond start times. But if you need full hardware-level isolation, a VM with KVM is the right tool.

In my homelab, I use gVisor for containers running untrusted code (CI runners receiving external code) and public-facing services. Internal databases still run on plain runc because of the high write frequency — the 20–40% overhead there isn’t worth the trade-off.
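
Because these percentages vary so much by workload, it helps to measure your own. A minimal sketch, assuming GNU date (for nanosecond timestamps) and a workload you can run as a single command; time_ms and overhead_pct are names invented for this example. Timing docker run includes container start-up, so pick a workload that runs long enough to dominate it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wall-clock milliseconds taken by a command.
# Relies on GNU date's %N (nanoseconds), which Ubuntu and Debian provide.
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Hypothetical helper: how much slower the sandboxed run is, in percent
# (integer arithmetic, rounded toward zero).
overhead_pct() {
  local base_ms="$1" sandbox_ms="$2"
  echo $(( (sandbox_ms - base_ms) * 100 / base_ms ))
}

# Usage sketch (needs Docker with the runsc runtime configured):
#   write_test='dd if=/dev/zero of=/tmp/test bs=1M count=512 conv=fsync'
#   base=$(time_ms docker run --rm ubuntu sh -c "$write_test")
#   sandboxed=$(time_ms docker run --rm --runtime=runsc ubuntu sh -c "$write_test")
#   echo "gVisor overhead: $(overhead_pct "$base" "$sandboxed")%"
```

A single run is noisy; repeating each measurement a few times and comparing medians gives a more trustworthy number.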

Conclusion

After a few months of running gVisor in my homelab, I think it’s a worthwhile security layer to add to your stack if you’re concerned about container escape. No need to touch your Dockerfile or build pipeline — just --runtime=runsc and your container is running in a sandbox with its own kernel.

Quick recap:

  • gVisor intercepts syscalls via Sentry, so applications never call the host kernel directly
  • Installation takes just 5 minutes with no image or Dockerfile changes required
  • Best suited for untrusted workloads, public-facing services, and CI/CD environments
  • There are performance trade-offs — I/O-heavy apps should be benchmarked first, not deployed blindly

If you’re using Kubernetes, gVisor also supports it via RuntimeClass — apply it per-pod without affecting the entire cluster. That’s the next step if your infrastructure has reached the orchestration layer.
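
For reference, a RuntimeClass for gVisor is only a few lines. The sketch below prints the manifest from a shell function so it can be piped to kubectl. The function name and the class name gvisor are arbitrary choices for this example, and it assumes the container runtime on each node (for example containerd) already has a handler named runsc configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a RuntimeClass manifest that maps the class
# name "gvisor" to a node-level runtime handler called "runsc".
runtimeclass_manifest() {
  cat <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
}

# Usage sketch (needs kubectl and cluster access):
#   runtimeclass_manifest | kubectl apply -f -
# A pod then opts in by setting "runtimeClassName: gvisor" in its spec.
```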
