Traditional VPNs start breaking down as infrastructure grows
Manage a few dozen servers spread across multiple cloud providers (AWS in Virginia, Hetzner in Nuremberg, a handful of VPS instances in Singapore) and you’ll quickly realize that OpenVPN or WireGuard in a hub-and-spoke model doesn’t cut it anymore. All traffic has to pass through a central server, so latency climbs. If that server goes down, the entire network goes dark.
Slack faced exactly this problem at the scale of thousands of nodes. They built and open-sourced Nebula, a mesh VPN overlay in which nodes find each other through a lighthouse (a discovery node, not a traffic relay), authenticate each other with CA-signed certificates, and then connect directly peer-to-peer. With no central server in the data path, there is no traffic bottleneck and no relay whose failure takes the whole network down.
I’ve been using Nebula to connect 3 VPS instances across 3 different datacenters and a dev laptop. Everything was up and running within 30 minutes of setup. Latency between two nodes in the same region is about 15 ms lower than with WireGuard through a relay.
How Nebula Works
Each node gets a virtual IP within a range you define (this article uses 192.168.100.0/24). Traffic between nodes is encrypted using the Noise Protocol Framework, the same foundation WireGuard builds on, but the architecture is completely different. Four things set it apart:
- Lighthouse: A node with a public IP that helps other nodes find each other (works like a STUN server). It does not relay traffic — only discovery.
- True peer-to-peer: After discovery, two nodes connect directly via UDP hole punching — even when both are behind NAT.
- Certificate-based auth: Each node has a certificate signed by your CA. No valid cert means no access to the network. Simple as that.
- Firewall embedded in config: Traffic control at the overlay layer — no separate iptables rules needed.
Installing Nebula
Requirements
- At least 1 server with a public IP (to act as lighthouse)
- Linux/macOS/Windows are all supported
- UDP port 4242 open on the lighthouse’s firewall
Download the binary
On each node (including the lighthouse), download the binary from GitHub releases:
# On Linux x86_64
wget https://github.com/slackhq/nebula/releases/latest/download/nebula-linux-amd64.tar.gz
tar -xzf nebula-linux-amd64.tar.gz
sudo mv nebula nebula-cert /usr/local/bin/
Create the CA and certificates for each node
This step runs once on the admin machine, then you distribute the certs to each node via scp. Important: ca.key never leaves this machine.
# Create the CA
nebula-cert ca -name "MyInfra CA"
# Outputs: ca.crt and ca.key — keep ca.key EXTREMELY secret
# Create cert for the lighthouse (virtual IP: 192.168.100.1)
nebula-cert sign -name "lighthouse" \
  -ip "192.168.100.1/24" \
  -ca-crt ca.crt -ca-key ca.key
# Outputs: lighthouse.crt, lighthouse.key
# Create cert for app server (virtual IP: 192.168.100.10)
nebula-cert sign -name "app-server" \
  -ip "192.168.100.10/24" \
  -groups "servers" \
  -ca-crt ca.crt -ca-key ca.key
# Create cert for dev laptop (virtual IP: 192.168.100.50)
nebula-cert sign -name "dev-laptop" \
  -ip "192.168.100.50/24" \
  -groups "developers" \
  -ca-crt ca.crt -ca-key ca.key
When I need to work out subnets for the overlay IP range, I often use toolcraft.app/en/tools/developer/ip-subnet-calculator — just enter a CIDR and it instantly shows the network range, broadcast address, and maximum host count. Much easier than calculating by hand.
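If you would rather check the range locally, the same numbers can be worked out with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `describe_overlay` is made up for illustration, and the addresses are the ones used for the certificates above.

```python
import ipaddress


def describe_overlay(cidr: str) -> dict:
    """Summarize an IPv4 overlay range given in CIDR notation."""
    net = ipaddress.ip_network(cidr, strict=True)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        # Network and broadcast addresses are not assignable to hosts
        # (this simple count is meant for prefixes of /30 or shorter).
        "usable_hosts": max(net.num_addresses - 2, 0),
    }


info = describe_overlay("192.168.100.0/24")
print(info)

# Confirm that the virtual IPs chosen for the certs fall inside the range.
overlay = ipaddress.ip_network("192.168.100.0/24")
node_ips = ["192.168.100.1", "192.168.100.10", "192.168.100.50"]
outside = [ip for ip in node_ips if ipaddress.ip_address(ip) not in overlay]
print("addresses outside the overlay:", outside)
```

Running a check like this before signing is cheap insurance, because the virtual IP is baked into each certificate.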
Detailed Configuration
Lighthouse Config
Create /etc/nebula/config.yaml on the lighthouse server:
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/lighthouse.crt
  key: /etc/nebula/lighthouse.key
lighthouse:
  am_lighthouse: true
listen:
  host: 0.0.0.0
  port: 4242
punchy:
  punch: true
logging:
  level: info
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
Config for regular nodes (app-server, dev-laptop)
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/app-server.crt
  key: /etc/nebula/app-server.key
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1" # Virtual IP of the lighthouse
static_host_map:
  "192.168.100.1": ["203.0.113.10:4242"] # Real public IP of the lighthouse
listen:
  host: 0.0.0.0
  port: 4242
punchy:
  punch: true
  respond: true
logging:
  level: info
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    # Allow SSH from the developers group
    - port: 22
      proto: tcp
      groups:
        - developers
    # Allow app traffic between servers
    - port: 8080
      proto: tcp
      groups:
        - servers
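To get a feel for what the inbound rules above permit, here is a small Python model of the matching logic. It is a simplified sketch and not Nebula's actual implementation. The names `Rule`, `allows` and `INBOUND` are invented for illustration. It assumes that anything not matched by a rule is dropped and that a `groups` list only matches a peer whose certificate carries every listed group.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Union


@dataclass(frozen=True)
class Rule:
    port: Union[int, str]                 # a port number or "any"
    proto: str                            # "tcp", "udp", "icmp" or "any"
    host: Optional[str] = None            # only the value "any" is modelled
    groups: FrozenSet[str] = frozenset()  # groups the peer cert must carry


def allows(rules: Iterable[Rule], peer_groups: Iterable[str],
           port: int, proto: str) -> bool:
    """Return True if any rule admits the packet; otherwise deny."""
    peer = set(peer_groups)
    for rule in rules:
        if rule.proto not in ("any", proto):
            continue
        # Ports have no meaning for ICMP, so only compare them for tcp/udp.
        if proto != "icmp" and rule.port not in ("any", port):
            continue
        if rule.host == "any":
            return True
        # Assumption: every group in the rule must be present on the peer.
        if rule.groups and rule.groups <= peer:
            return True
    return False


# The inbound section of the regular-node config, transcribed by hand.
INBOUND = [
    Rule(port="any", proto="icmp", host="any"),
    Rule(port=22, proto="tcp", groups=frozenset({"developers"})),
    Rule(port=8080, proto="tcp", groups=frozenset({"servers"})),
]

print(allows(INBOUND, {"developers"}, 22, "tcp"))   # True: SSH from dev-laptop
print(allows(INBOUND, {"servers"}, 22, "tcp"))      # False: servers cannot SSH
print(allows(INBOUND, {"servers"}, 8080, "tcp"))    # True: app traffic
```

Because the group names live in the signed certificate, a node cannot grant itself access by editing its own config.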
Deploy certs and do a test run
Copy the certs to each node and test before handing off to systemd:
# Copy to lighthouse
scp ca.crt lighthouse.crt lighthouse.key root@<lighthouse-ip>:/etc/nebula/
# Copy to app-server
scp ca.crt app-server.crt app-server.key root@<app-server-ip>:/etc/nebula/
# Run in the foreground to watch logs directly
sudo nebula -config /etc/nebula/config.yaml
When the log shows "Handshake message sent" and "Handshake message received", the two nodes have successfully established a connection. Only then should you switch to systemd:
cat > /etc/systemd/system/nebula.service << 'EOF'
[Unit]
Description=Nebula VPN
After=network.target
[Service]
ExecStart=/usr/local/bin/nebula -config /etc/nebula/config.yaml
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now nebula
Testing Connectivity and Monitoring
Ping over the overlay network
Once Nebula is running on all nodes, verify using the virtual IPs:
# From app-server, ping dev-laptop over the Nebula overlay
ping 192.168.100.50
# Print the certificate info for the current node
nebula-cert print -path /etc/nebula/app-server.crt
Check peer status via Prometheus metrics
Nebula supports exposing metrics in Prometheus format. Add this to your config:
stats:
  type: prometheus
  listen: 127.0.0.1:8080
  path: /metrics
  namespace: nebula
  subsystem: stats
  interval: 10s
# Check active tunnel count and handshake success/failure stats
curl -s http://127.0.0.1:8080/metrics | grep -E "(tunnel|handshake)"
Debug firewall rules
# Enable debug log level to see whether packets are allowed or denied
# Edit config: logging.level: debug
# Restart and monitor
journalctl -u nebula -f | grep -E "(ALLOW|DENY|firewall)"
Test real-world throughput
Measure bandwidth between two nodes with iperf3 once the network has stabilized:
# On the receiver node (192.168.100.10)
iperf3 -s -B 192.168.100.10
# On the sender node — 4 parallel streams, 30-second test
iperf3 -c 192.168.100.10 -t 30 -P 4
Things to Keep in Mind for Real-World Operations
During the first few weeks in production, I ran into all sorts of issues. Documenting them here so you don’t have to waste time debugging the same things:
- Lighthouse redundancy: Use at least 2 lighthouses in 2 different datacenters. Add the second lighthouse’s overlay IP to lighthouse.hosts and its public address to static_host_map in each node’s config and you’re done.
- Certificate expiry: A cert signed without -duration stays valid until just before its CA expires, and a CA created without -duration lasts 8760h (1 year). A long-lived cert is dangerous if it is ever compromised, so set -duration explicitly when signing (for example -duration 8760h) and schedule regular rotation. A host cert cannot outlive its CA, so give the CA a longer -duration than the certs it signs.
- MTU overhead: Nebula adds about 60 bytes of header. App timing out or seeing unexplained packet loss? Try setting tun.mtu to 1300 (the value in Nebula’s example config) or lower.
- Revoking a node: Need to immediately kick a node off the network? Add the fingerprint of its cert (shown by nebula-cert print) to pki.blocklist in every node’s config and reload. Nebula will then refuse connections from that cert.
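Two of the numbers in this list are simple arithmetic, sketched below in Python. The function names and the 30-day rotation lead time are assumptions made for illustration, as is the 1500-byte underlay MTU. The 8760h duration and the roughly 60 bytes of overhead come from the notes above.

```python
from datetime import datetime, timedelta, timezone


def parse_hours(duration: str) -> timedelta:
    """Parse an hours-only duration string such as '8760h'."""
    if not duration.endswith("h") or not duration[:-1].isdigit():
        raise ValueError(f"expected a value like '8760h', got {duration!r}")
    return timedelta(hours=int(duration[:-1]))


def rotation_plan(issued: datetime, duration: str = "8760h",
                  lead_days: int = 30):
    """Return (expiry, rotate-by date) for a cert issued at `issued`."""
    expires = issued + parse_hours(duration)
    return expires, expires - timedelta(days=lead_days)


def max_overlay_mtu(underlay_mtu: int = 1500, overhead: int = 60) -> int:
    """Largest overlay packet that fits in one underlay packet."""
    return underlay_mtu - overhead


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires, rotate_by = rotation_plan(issued)
print("expires:", expires.date(), "rotate by:", rotate_by.date())

# With a 1500-byte underlay, 1300 leaves headroom below the theoretical maximum.
print("max overlay MTU:", max_overlay_mtu(), "headroom at 1300:", max_overlay_mtu() - 1300)
```

Note that 8760h is exactly 365 days, so a cert issued on 1 January of a leap year expires on 31 December of the same year.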
Compared to WireGuard, Nebula has a steeper initial setup. But having firewall rules embedded directly in each node’s config saves an enormous amount of access control overhead once you go past 10 nodes. No separate iptables, no manually syncing rules across servers. For growing infrastructure, this is a compelling replacement for the classic OpenVPN hub-and-spoke model.

