From a Server Migration Story to Kubernetes Lessons
When CentOS 8 reached EOL in 2021, I had to urgently migrate 5 servers to Rocky Linux within a week. That experience didn’t just teach me about migration — it forced me to rethink my entire approach to infrastructure management. Running containers directly on individual servers, each with slightly different configurations, deploying through a pile of manual steps… That’s when I got serious about Kubernetes.
CentOS Stream 9 is the platform I chose for production: more stable than Fedora, and directly upstream of RHEL, so it previews what the next RHEL minor release will ship. Note that it is a community distribution sponsored by Red Hat, not a product covered by commercial Red Hat support. EKS or GKE are more convenient, but they cost an extra ~$70-150/month just for the cluster fee. With kubeadm + Containerd + Calico, a self-managed cluster on a VPS is significantly cheaper while still being production-ready.
Containerd and Calico: Why This Combination
Starting from Kubernetes 1.24, the dockershim component was removed, so Docker Engine is no longer supported directly as a container runtime. Many people still get confused about this: images built with Docker keep working, only the runtime underneath changes. Containerd is a runtime extracted from Docker. It is lighter, doesn’t drag in the Docker daemon, and is the default runtime for EKS, GKE, and AKS.
I chose Calico for its network policy support. Flannel is simpler to set up, but it does not enforce NetworkPolicy on its own, so every pod can communicate with every other pod. Calico lets you write rules like “service A can only call service B, direct database access is blocked”, which is essential when multiple teams are deploying on the same cluster. When Calico routes pod traffic over BGP without encapsulation, it also avoids the overhead of Flannel’s VXLAN overlay on bare metal.
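To make that kind of rule concrete, here is a minimal sketch that writes a standard `networking.k8s.io/v1` NetworkPolicy allowing only service B to reach the database pods. The function name, the `prod` namespace, the `app=...` labels, and port 5432 are all hypothetical placeholders, not values from my cluster.

```shell
# write_db_policy FILE: write a NetworkPolicy manifest to FILE.
# Assumed for illustration: namespace "prod", labels app=database and
# app=service-b, and a database listening on TCP 5432.
write_db_policy() {
    cat > "$1" <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-allow-service-b
  namespace: prod
spec:
  # The policy applies to the database pods.
  podSelector:
    matchLabels:
      app: database
  policyTypes:
    - Ingress
  ingress:
    # Only service-b may connect; ingress from any other pod is dropped.
    - from:
        - podSelector:
            matchLabels:
              app: service-b
      ports:
        - protocol: TCP
          port: 5432
EOF
}
```

Once the cluster and Calico are running, `write_db_policy db-policy.yaml` followed by `kubectl apply -f db-policy.yaml` would put it into effect.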
Environment Preparation
Minimum Requirements
- Control plane: 2 CPU, 2GB RAM (4GB recommended)
- Worker node: 1 CPU, 1GB RAM (2GB recommended)
- CentOS Stream 9, each node with a static IP
- Internet connection to pull images
This guide uses 3 nodes: 1 control plane + 2 workers. The steps below run on all nodes unless otherwise noted.
Disabling Swap and Configuring the Kernel
By default, kubelet refuses to run while swap is enabled, so swap has to be turned off completely. This step is the most commonly forgotten, and skipping it causes kubelet to restart repeatedly with vague errors that are very hard to debug:
# Disable swap immediately
swapoff -a
# Disable permanently — comment out the swap line in /etc/fstab
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# Enable required kernel modules
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# Configure sysctl
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
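Before moving on, it can help to confirm that the settings actually took effect. The sketch below is one way to do that; the function names are my own, and the optional file arguments exist only so the checks can be pointed at test fixtures instead of the live `/proc`.

```shell
# swap_is_off [FILE]: succeed when FILE (default /proc/swaps) lists no active swap.
swap_is_off() {
    # Line 1 is a header; every further non-empty line is an active swap device.
    awk 'NR > 1 && NF { n++ } END { exit (n > 0) }' "${1:-/proc/swaps}"
}

# sysctl_is_one KEY [ROOT]: succeed when kernel parameter KEY currently reads 1.
sysctl_is_one() {
    # net.ipv4.ip_forward maps to /proc/sys/net/ipv4/ip_forward
    path="${2:-/proc/sys}/$(printf '%s' "$1" | tr . /)"
    [ -r "$path" ] && [ "$(cat "$path")" = "1" ]
}

# node_prereqs_ok: check swap plus the three sysctl keys configured above.
node_prereqs_ok() {
    swap_is_off || { echo "swap is still active" >&2; return 1; }
    for key in net.bridge.bridge-nf-call-iptables \
               net.bridge.bridge-nf-call-ip6tables \
               net.ipv4.ip_forward; do
        sysctl_is_one "$key" || { echo "$key is not 1" >&2; return 1; }
    done
}
```

Running `node_prereqs_ok && echo "node is ready for kubeadm"` on each node gives a quick pass/fail before any Kubernetes package is installed.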
Properly Configuring the Firewall
Many guides tell you to just disable firewalld for simplicity. Don’t do that in production. Opening only the necessary ports is safer and avoids headaches down the road:
# On the control plane
firewall-cmd --permanent --add-port=6443/tcp # Kubernetes API
firewall-cmd --permanent --add-port=2379-2380/tcp # etcd
firewall-cmd --permanent --add-port=10250/tcp # Kubelet API
firewall-cmd --permanent --add-port=10259/tcp # kube-scheduler
firewall-cmd --permanent --add-port=10257/tcp # kube-controller-manager
# On worker nodes
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort Services
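# Calico: BGP, plus the encapsulation your IP pool uses. Apply to all nodes
firewall-cmd --permanent --add-port=179/tcp # BGP
firewall-cmd --permanent --add-port=4789/udp # VXLAN (the default pool in custom-resources.yaml uses VXLANCrossSubnet)
firewall-cmd --permanent --add-port=5473/tcp # Typha (deployed by the operator)
firewall-cmd --permanent --add-protocol=4 # IP-in-IP is IP protocol 4; only needed if you switch the pool to IPIP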
firewall-cmd --reload
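Since the same Kubernetes ports get opened on every new node, this is an easy step to script. The sketch below covers only the control plane and worker ports listed above and leaves the Calico rules to the commands shown earlier. The function names and the `FIREWALL_CMD` override are my own conventions, not part of firewalld.

```shell
# k8s_ports ROLE: print the firewalld port specs for a node role.
k8s_ports() {
    case "$1" in
        control-plane) echo "6443/tcp 2379-2380/tcp 10250/tcp 10259/tcp 10257/tcp" ;;
        worker)        echo "10250/tcp 30000-32767/tcp" ;;
        *)             echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# open_k8s_ports ROLE: add each port permanently, then reload firewalld.
# Set FIREWALL_CMD=echo for a dry run that only prints the arguments.
open_k8s_ports() {
    ports=$(k8s_ports "$1") || return 1
    for port in $ports; do
        ${FIREWALL_CMD:-firewall-cmd} --permanent --add-port="$port" || return 1
    done
    ${FIREWALL_CMD:-firewall-cmd} --reload
}
```

On a worker, that would be `open_k8s_ports worker`; with `FIREWALL_CMD=echo` set first, it prints what it would have done instead of touching the firewall.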
Installing Containerd
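# config-manager is provided by dnf-plugins-core, which minimal installs may lack
dnf install -y dnf-plugins-core
# Add Docker repo (Containerd is included in this repo)
dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo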
# Install Containerd
dnf install -y containerd.io
# Generate default config
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
# Enable SystemdCgroup — required for Kubernetes
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Start
systemctl enable --now containerd
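The SystemdCgroup = true step is commonly skipped. kubeadm configures kubelet to use the systemd cgroup driver by default, while the generated containerd config uses cgroupfs, so leaving the value as false means two different cgroup managers are accounting for the same resources. The cluster may still start and pods may still run, but the node becomes unstable under resource pressure. The problem tends to surface only under high load, when the OOM killer fires in unpredictable ways.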
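A quick way to catch a mismatch is to compare what each side is configured to use. This is a rough sketch with function names of my own; it assumes the containerd config generated above and the kubelet config that kubeadm writes to `/var/lib/kubelet/config.yaml`, which only exists after `kubeadm init` or `kubeadm join` has run.

```shell
# containerd_cgroup_driver [FILE]: print "systemd" or "cgroupfs" based on config.toml.
containerd_cgroup_driver() {
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' \
            "${1:-/etc/containerd/config.toml}"; then
        echo systemd
    else
        echo cgroupfs
    fi
}

# kubelet_cgroup_driver [FILE]: print the cgroupDriver value from the kubelet config.
kubelet_cgroup_driver() {
    sed -n 's/^cgroupDriver:[[:space:]]*//p' "${1:-/var/lib/kubelet/config.yaml}"
}

# cgroup_drivers_match [CONTAINERD_CONFIG] [KUBELET_CONFIG]: succeed when both agree.
cgroup_drivers_match() {
    [ "$(containerd_cgroup_driver "$1")" = "$(kubelet_cgroup_driver "$2")" ]
}
```

After joining a node, `cgroup_drivers_match || echo "cgroup driver mismatch"` flags the problem long before load does.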
Installing kubeadm, kubelet, and kubectl
# Add Kubernetes repo
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# Set SELinux permissive
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# Install
dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# Enable kubelet
systemctl enable --now kubelet
On SELinux: I use permissive instead of disabled so that violations still get logged, and you know which policies need to be written later. Disabling it entirely is faster, but for long-term production use, investing in proper policies is the better path.
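To see what permissive mode has been logging, something like the sketch below can summarize the denials per process. It assumes auditd is running and writing to its usual `/var/log/audit/audit.log`; the function name is my own.

```shell
# avc_denials_by_command [LOG]: print "COUNT COMMAND" for each process that
# triggered SELinux denials, busiest first.
avc_denials_by_command() {
    awk '/avc:.*denied/ && match($0, /comm="[^"]*"/) {
             # Skip the 6-character prefix and drop the closing quote.
             count[substr($0, RSTART + 6, RLENGTH - 7)]++
         }
         END { for (cmd in count) print count[cmd], cmd }' \
        "${1:-/var/log/audit/audit.log}" | sort -rn
}
```

The processes at the top of that list are the ones to write policies for first, before switching back to enforcing.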
Initializing the Control Plane
Run this step on the control plane node only:
kubeadm init \
--pod-network-cidr=192.168.0.0/16 \
--control-plane-endpoint="10.0.0.10:6443" \
--cri-socket=unix:///run/containerd/containerd.sock
Flag explanations:
- --pod-network-cidr=192.168.0.0/16: the pool that Calico’s custom-resources.yaml uses by default. The two values must match, so if you change one, change the other
- --control-plane-endpoint: IP of the control plane. If you plan to set up HA with multiple control planes later, use the load balancer IP here
- --cri-socket: explicitly specifies the Containerd socket, avoiding confusion if multiple runtimes are installed on the machine
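The same three settings can also live in a kubeadm configuration file, which is easier to keep in version control than a long command line. The sketch below writes one; it assumes the `kubeadm.k8s.io/v1beta3` config API, which matches the v1.30 packages used in this guide, and the function and file names are placeholders of mine.

```shell
# write_kubeadm_config FILE: write a kubeadm config equivalent to the flags above.
write_kubeadm_config() {
    cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock   # --cri-socket
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "10.0.0.10:6443"                # --control-plane-endpoint
networking:
  podSubnet: 192.168.0.0/16                           # --pod-network-cidr
EOF
}
```

With that in place, `write_kubeadm_config kubeadm-config.yaml` followed by `kubeadm init --config kubeadm-config.yaml` replaces the flag-based invocation; the file is used instead of those flags, not together with them.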
When it finishes, the end of the output will contain a kubeadm join command — copy it immediately, you’ll need it to join worker nodes. Then configure kubectl:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Installing Calico CNI
# Install Calico operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/tigera-operator.yaml
# Install Calico custom resources
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/custom-resources.yaml
# Wait until all Calico pods are Running
watch kubectl get pods -n calico-system
Wait about 2-3 minutes. Once all pods are Running, the control plane node will transition from NotReady to Ready:
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# cp-node Ready control-plane 5m v1.30.x
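When this step runs from a script rather than a terminal, `kubectl wait` can stand in for watching the output by eye. A minimal sketch, with a function name and default timeout of my own choosing:

```shell
# wait_for_cluster_ready [TIMEOUT]: block until Calico pods and all nodes are Ready.
wait_for_cluster_ready() {
    timeout="${1:-300s}"
    kubectl wait --for=condition=Ready pods --all \
        --namespace calico-system --timeout="$timeout" || return 1
    kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
}
```

One caveat: `kubectl wait` fails immediately if the namespace has no pods yet, so right after applying the manifests it may need a short pause or a retry before `wait_for_cluster_ready` succeeds.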
Joining Worker Nodes to the Cluster
Run the join command on each worker node — this command comes from the output of kubeadm init:
kubeadm join 10.0.0.10:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
The default token expires after 24 hours. If you’re joining late, generate a new one on the control plane:
kubeadm token create --print-join-command
Production Tips After the Cluster Is Running
Basic Cluster Verification
# Check all nodes
kubectl get nodes -o wide
# Check all system pods
kubectl get pods -A
# Quick test deployment
kubectl create deployment nginx-test --image=nginx --replicas=3
kubectl get pods -o wide # Check which nodes the pods landed on (workers only; kubeadm taints the control plane)
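Reading the NODE column by eye works for three pods; for anything larger, a per-node count is easier to scan. The sketch below relies on the `app=<name>` label that `kubectl create deployment` sets automatically; the function name is mine.

```shell
# pods_per_node SELECTOR: print "NODE COUNT" for pods matching a label selector.
pods_per_node() {
    kubectl get pods --selector "$1" --no-headers \
        --output custom-columns=NODE:.spec.nodeName |
        awk 'NF { count[$1]++ } END { for (node in count) print node, count[node] }' |
        sort
}
```

For the test deployment that is `pods_per_node app=nginx-test`. Pods that have not been scheduled yet show up under `<none>`. When you are done, `kubectl delete deployment nginx-test` cleans up.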
Backing Up etcd — Don’t Skip This
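etcd stores the entire state of your cluster: deployments, secrets, configmaps, and every other API object. Losing etcd without a snapshot is unrecoverable: not “losing some things,” but losing everything. The certificates under /etc/kubernetes/pki live on disk rather than in etcd, so back those up as well. I set up a daily backup cronjob: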
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /backup/etcd-$(date +%Y%m%d).db
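For the cronjob itself, I would wrap the command in a small function that also prunes old snapshots. The sketch below assumes `etcdctl` is installed on the control plane host and on the PATH (kubeadm does not install it there); the 7-day retention and the function name are arbitrary choices for illustration.

```shell
# backup_etcd [DIR] [KEEP_DAYS]: save a dated snapshot into DIR, then prune old ones.
backup_etcd() {
    dir="${1:-/backup}"
    keep_days="${2:-7}"
    snapshot="$dir/etcd-$(date +%Y%m%d).db"
    mkdir -p "$dir" || return 1
    ETCDCTL_API=3 etcdctl \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        snapshot save "$snapshot" || return 1
    # Prune only after a successful save, so a failing backup never erases history.
    find "$dir" -name 'etcd-*.db' -mtime +"$keep_days" -delete
}
```

Putting this in a script and calling the script from cron also sidesteps a classic trap: inside a crontab line, `%` is special and would have to be written as `\%` in `date +%Y%m%d`.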
Draining Nodes Before Maintenance
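Always drain a node before rebooting or updating the OS. Kubernetes evicts the pods and reschedules them on other nodes before shutdown. For workloads with more than one replica, users see nothing; a single-replica workload is briefly unavailable while its pod starts elsewhere: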
# Before maintenance
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
# After maintenance is complete
kubectl uncordon worker-1
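The easy mistake here is forgetting the uncordon afterwards, which leaves a healthy node sitting empty. One way to guard against that is a wrapper that always uncordons, whatever happens in between. This is a sketch under my own naming, with an assumed 300-second drain timeout:

```shell
# run_with_node_drained NODE COMMAND...: drain NODE, run COMMAND, then uncordon.
run_with_node_drained() {
    node="$1"
    shift
    if ! kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s; then
        echo "drain of $node failed, uncordoning" >&2
        kubectl uncordon "$node"
        return 1
    fi
    status=0
    "$@" || status=$?
    # Uncordon even when the maintenance command failed.
    kubectl uncordon "$node"
    return "$status"
}
```

Usage would look like `run_with_node_drained worker-1 ssh root@worker-1 'dnf -y update'`. For a reboot, the wrapped command should also wait for the node to come back before returning, which this sketch does not attempt.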
Always Set Resource Limits
Without requests, the scheduler has no information for placing pods sensibly; without limits, nothing caps what a container can consume. The most common outcome: one pod consumes all the memory on a node, the OOM killer fires, and takes other pods down with it. Minimum template for every Deployment:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
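For Deployments that are already running without these values, `kubectl set resources` can apply the same numbers without editing YAML. A small sketch; the function name and the `default` namespace fallback are my own choices:

```shell
# set_baseline_resources DEPLOYMENT [NAMESPACE]: apply the template values above.
set_baseline_resources() {
    kubectl set resources deployment "$1" \
        --namespace "${2:-default}" \
        --requests=cpu=100m,memory=128Mi \
        --limits=cpu=500m,memory=256Mi
}
```

For example, `set_baseline_resources nginx-test`. Keep in mind that this changes the pod template, so the Deployment rolls out new pods.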
Conclusion
Setting up a cluster for the first time takes about 1-2 hours — mostly waiting for image pulls and Calico to initialize. It gets much faster the next time, especially once you’ve scripted the repetitive steps.
Understanding what each step does matters more than setup speed. Why swap must be disabled, what SystemdCgroup affects, why Calico needs its own pod CIDR — if these aren’t clear, when the cluster has problems (and it will), your only option is to destroy and rebuild instead of being able to debug.
Three things to install right away to make the cluster truly usable: metrics-server (to enable kubectl top), Nginx Ingress Controller (to expose services via domain names), and a StorageClass if your workloads need persistent storage.

