Keepalived Configuration Guide: Building Virtual IP and High Availability on Linux – ITFROMZERO

Have you ever felt anxious when a critical service on your server suddenly stopped working? Imagine a website with thousands of visitors abruptly showing a 500 error, or a payment system processing transactions being interrupted.

In modern digital business, even a few minutes of system downtime can cause significant damage to revenue and reputation. I’ve often worried about this issue, especially for services that need to be online 24/7. How can we ensure service availability, regardless of any incidents?

The Real Problem: What if a Critical Service Suddenly Stops Working?

Imagine this: you’re managing an e-commerce website or a critical internal application for your company. Everything is running smoothly, but then one day, “out of nowhere,” the server encounters an issue. It could be due to hardware, software, or simply a network error. Customers can’t access it, employees can’t work, and pressure from the boss just keeps mounting. That feeling isn’t pleasant at all, is it?

For systems requiring High Availability (HA), the goal is to minimize downtime. A single server always carries the risk of being a Single Point of Failure (SPoF). If that server experiences issues, the entire service will grind to a halt. This is a scenario that I and many other IT professionals always want to avoid.

Root Cause Analysis: Why is a Single Server Risky?

An independent server, no matter how powerful, can still become the “Achilles’ heel” of the entire system. Common reasons for server downtime include:

Hardware Failure: Hard drive crash, faulty RAM, CPU overheating, or even a network card “dying.” These failures can occur at any time without warning.
Software/OS Errors: An incompatible update, a bug in an application, or an unexpected operating system issue can cause the server to become inoperable.
Network Issues: Broken connection, router/switch problems, or incorrect network configuration can make the server unreachable from the outside.
Human Error: Sometimes, just one incorrect configuration action by an administrator is enough to “bring down” the entire system.

When an incident occurs on the primary server, all traffic that should be directed to your service will be blocked. Customers will see the website as inaccessible, or the application as unresponsive.

Solutions: How to Keep Services “Alive”?

To counter single points of failure, the basic idea is to have redundancy. There are several approaches to achieve this:

Load Balancing: Distributes traffic across multiple servers, optimizing performance and increasing fault tolerance. If one server “dies,” the remaining servers can still handle requests. However, load balancing solutions themselves can become SPoFs if not designed with HA in mind.
Clustering: Uses multiple servers working together as a logical unit. Clustering solutions are more complex, often involving data sharing and state management.
Using Virtual IP (VIP): This is a very effective and widely adopted method. Each server keeps its own IP address, but clients are pointed at an additional “virtual” IP address (Virtual IP) that represents the service. This VIP always belongs to the active server and automatically moves to a standby server if the active one fails. Clients always access the VIP without needing to know which physical server is handling the request.

Among the solutions above, Virtual IP stands out as a simple yet extremely powerful way to achieve High Availability, especially for services that don’t require complex state sharing. And Keepalived is an excellent tool that helps us easily implement VIPs on Linux.

Keepalived: The Optimal Solution to Protect Your Services

When I need a lightweight, reliable HA solution for services like web servers, databases (with replication), or proxy servers, Keepalived is always the top choice. It helps me automatically deploy Virtual IPs, ensuring nearly uninterrupted service.

What is Keepalived?

Keepalived is open-source software for Linux, designed to provide High Availability and load balancing features. Its most prominent and widely used feature is its implementation of VRRP (Virtual Router Redundancy Protocol), which it uses to manage Virtual IPs. Keepalived can also monitor the status of services or applications on a server. If a service stops working, Keepalived can automatically switch the VIP to a standby server.

How Does VRRP Work?

VRRP is a standard protocol that allows a group of routers or servers to share a Virtual IP address. Within this group, one server will be elected as the Master, and the remaining servers will be Backups. The operating principle is simple:

Master: The Master server will hold the Virtual IP and be responsible for processing all packets sent to the VIP. It also periodically sends “advertisement” packets to the Backup servers to signal that it is still active.
Backup: The Backup servers listen for advertisement packets from the Master. If no advertisements are received for roughly three advertisement intervals (a little over 3 seconds with a 1-second interval), the highest-priority Backup promotes itself to Master and takes over the Virtual IP.
Failover: When the Master encounters an issue, the VIP will automatically switch to a Backup server. This process happens very quickly, usually within a few seconds, almost transparently to the end-user. When the original Master comes back online, it can reclaim the VIP (preemption) or remain a Backup, depending on the configuration.
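The time a Backup waits before taking over can be estimated from the VRRP version 2 timers: three advertisement intervals plus a skew time of (256 - priority) / 256 seconds. Below is a minimal sketch of that arithmetic in shell, working in whole milliseconds to stay within integer math. The function name master_down_ms is made up for this illustration, and it assumes an integer advertisement interval.

```shell
# master_down_ms ADVERT_INT PRIORITY
# Prints how long (in milliseconds) a Backup with the given priority waits
# without hearing an advertisement before it promotes itself.
# Formula (VRRPv2): 3 * advert_int + (256 - priority) / 256 seconds.
# Integer division truncates the fractional millisecond.
master_down_ms() {
    advert_int=$1
    priority=$2
    echo $(( 3000 * advert_int + (256 - priority) * 1000 / 256 ))
}

master_down_ms 1 100   # prints 3609
```

With a 1-second interval and a priority of 100, this gives about 3.6 seconds, which is in line with the "few seconds" failover described above. A Backup with a higher priority has a smaller skew, so it reacts slightly earlier than lower-priority peers.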

With this mechanism, you only need to point applications or DNS records to the Virtual IP. Whether the Master or Backup is active, the service will always be accessible via the same IP address.

Keepalived Deployment: Detailed Guide

System Preparation

I will guide you through deploying Keepalived on two Linux servers. Here, I’m using Ubuntu Server, but the steps are similar for CentOS or other distributions.

Server 1 (Master): IP 192.168.1.10
Server 2 (Backup): IP 192.168.1.11
Virtual IP: 192.168.1.100 (This is the IP your service will use)

Ensure both servers can ping each other and belong to the same subnet. If you need to quickly calculate a subnet to see the network range, broadcast, or number of usable hosts, I often use toolcraft.app/en/tools/developer/ip-subnet-calculator. You just need to enter the CIDR, and it will display full information, which is extremely convenient.
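If you prefer to check the subnet from the command line, the comparison can be sketched in plain shell arithmetic: convert each address to an integer, apply the netmask, and compare the results. The helper names ip_to_int and same_subnet are hypothetical, and the sketch only handles well-formed IPv4 addresses without leading zeros.

```shell
# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 PREFIX: succeed if both addresses share the network.
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    net1=$(( $(ip_to_int "$1") & mask ))
    net2=$(( $(ip_to_int "$2") & mask ))
    [ "$net1" -eq "$net2" ]
}

# Check the addresses used in this guide.
if same_subnet 192.168.1.10 192.168.1.100 24; then
    echo "VIP is in the same /24 as Server 1"
fi
```

Running the same check for 192.168.1.11 confirms that the Backup can also host the VIP.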

Check the current IP configuration on your server:

ip a show eth0 # or your network card name
# Example output:
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether 00:0c:29:ab:cd:ef brd ff:ff:ff:ff:ff:ff
#     inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
#        valid_lft forever preferred_lft forever

Install Keepalived

The Keepalived installation process is quite simple. You just need to use your operating system’s package manager:

On Ubuntu/Debian:

sudo apt update
sudo apt install keepalived -y

On CentOS/RHEL/Fedora:

sudo yum install epel-release -y # Optional: keepalived is normally in the base repositories; add EPEL only if the package is not found
sudo yum install keepalived -y

Keepalived Configuration

The main Keepalived configuration file is located at /etc/keepalived/keepalived.conf. We will create two different configuration files for the Master and Backup servers.

Configuration for Server 1 (Master)

Open the file /etc/keepalived/keepalived.conf and paste the following content. Pay attention to the sections that need adjustment, such as the network interface name (eth0), Virtual IP (192.168.1.100), and authentication password.

global_defs {
    router_id LVS_DEVEL # Set a unique ID for this router
    # LVS_DEVEL can be replaced with the server's hostname.
}

vrrp_instance VI_1 {
    state MASTER             # Initial state is MASTER
    interface eth0           # Network interface name to which the VIP will be bound
    virtual_router_id 51     # Unique ID for the VRRP cluster (between 1-255). Must be the same on both Master and Backup.
    priority 101             # Master's priority (higher than Backup)
    advert_int 1             # Advertisement interval (seconds)
    authentication {
        auth_type PASS       # Authentication type
        auth_pass 1111       # Authentication password (must be the same on both servers)
    }
    virtual_ipaddress {
        192.168.1.100/24     # Your Virtual IP address
    }

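    # Service health check (optional). The vrrp_script block itself must be defined
    # at the top level of the file, before this vrrp_instance, for example:
    #   vrrp_script chk_httpd {
    #       script "killall -0 httpd" # Check if the httpd process is running
    #       interval 2                # Run script every 2 seconds
    #       weight -20                # If the script fails, decrease priority by 20
    #   }
    # Only the track_script reference belongs inside vrrp_instance:
    # track_script {
    #     chk_httpd
    # }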
}

Configuration for Server 2 (Backup)

Open the file /etc/keepalived/keepalived.conf on Server 2 and paste the following content. The main differences are that the state is BACKUP and the priority is lower than the Master’s.

global_defs {
    router_id LVS_BACKUP # Set a unique ID for this router
}

vrrp_instance VI_1 {
    state BACKUP             # Initial state is BACKUP
    interface eth0           # Network interface name
    virtual_router_id 51     # Unique ID for the VRRP cluster (must match Master)
    priority 100             # Backup's priority (lower than Master)
    advert_int 1             # Advertisement interval (seconds)
    authentication {
        auth_type PASS       # Authentication type
        auth_pass 1111       # Authentication password (must match Master)
    }
    virtual_ipaddress {
        192.168.1.100/24     # Your Virtual IP address
    }

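    # Service health check (optional). The vrrp_script block itself must be defined
    # at the top level of the file, before this vrrp_instance, for example:
    #   vrrp_script chk_httpd {
    #       script "killall -0 httpd" # Check if the httpd process is running
    #       interval 2                # Run script every 2 seconds
    #       weight -20                # If the script fails, decrease priority by 20
    #   }
    # Only the track_script reference belongs inside vrrp_instance:
    # track_script {
    #     chk_httpd
    # }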
}
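Because the two files must agree on some values and differ on others, a quick consistency check before starting the service can catch copy-and-paste mistakes. The following is only a sketch: conf_value and check_pair are hypothetical helpers, and the awk lookup assumes one setting per line with a single vrrp_instance per file, as in the examples above.

```shell
# conf_value FILE KEY: print the first value given for KEY in a keepalived.conf.
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_pair MASTER_CONF BACKUP_CONF: verify the settings that must match or differ.
check_pair() {
    m=$1
    b=$2
    [ "$(conf_value "$m" virtual_router_id)" = "$(conf_value "$b" virtual_router_id)" ] \
        || { echo "virtual_router_id differs"; return 1; }
    [ "$(conf_value "$m" auth_pass)" = "$(conf_value "$b" auth_pass)" ] \
        || { echo "auth_pass differs"; return 1; }
    [ "$(conf_value "$m" priority)" -gt "$(conf_value "$b" priority)" ] \
        || { echo "Master priority is not higher than Backup priority"; return 1; }
    echo "ok"
}

# Usage, after copying the Backup's file next to the Master's (paths are examples):
#   check_pair /etc/keepalived/keepalived.conf /tmp/keepalived-backup.conf
```

This does not replace Keepalived's own parser; it only compares the three values this guide relies on.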

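Important Note: Keepalived itself does not need any kernel setting to add the Virtual IP to an interface. However, a service that binds explicitly to the VIP (for example Nginx or HAProxy listening on 192.168.1.100) cannot start on the Backup while the VIP is absent unless net.ipv4.ip_nonlocal_bind is enabled. Enabling net.ipv4.ip_forward is only required when the server has to forward packets between networks (for example, when it acts as a router or an LVS NAT director); a plain VIP setup does not need it.

If your setup needs these settings, execute the following commands on both servers: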

sudo sysctl net.ipv4.ip_nonlocal_bind=1
sudo sysctl net.ipv4.ip_forward=1
# To make these changes persistent after reboot:
echo "net.ipv4.ip_nonlocal_bind = 1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p # Apply changes from sysctl.conf immediately
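Running the tee -a commands more than once appends duplicate lines to the file. A small sketch of an idempotent alternative follows: it rewrites the file so that each key appears exactly once. The function name persist_sysctl and the drop-in file name in the usage comment are assumptions for illustration, and the key is used as a regular expression, so the dots in it match loosely.

```shell
# persist_sysctl FILE KEY VALUE
# Ensure FILE contains exactly one "KEY = VALUE" line for KEY.
persist_sysctl() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    # Copy every line except existing settings for this key
    # (grep exits non-zero when nothing is copied, which is fine here).
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$file.tmp" || true
    echo "$key = $value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Usage as root (hypothetical drop-in file), followed by a reload:
#   persist_sysctl /etc/sysctl.d/99-keepalived.conf net.ipv4.ip_nonlocal_bind 1
#   sysctl --system
```

Keeping the settings in a separate file under /etc/sysctl.d also makes them easy to remove later.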

Start and Verify

After completing the configuration on both servers, start Keepalived and check its status.

On both servers:

sudo systemctl start keepalived
sudo systemctl enable keepalived # Ensure Keepalived starts automatically with the system
sudo systemctl status keepalived # Check service status

Now, check if the Virtual IP has been assigned to the network interface on the Master machine. You should see 192.168.1.100 appear on the Master’s eth0 interface:

ip a show eth0
# Example output on Master:
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
#     link/ether 00:0c:29:ab:cd:ef brd ff:ff:ff:ff:ff:ff
#     inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
#        valid_lft forever preferred_lft forever
#     inet 192.168.1.100/24 scope global secondary eth0 # <-- This is the VIP!
#        valid_lft forever preferred_lft forever
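For scripts or monitoring, the same check can be automated instead of reading the output by eye. The sketch below parses the one-line format produced by ip -o, where the fourth field is the address with its prefix length. The function name has_addr is hypothetical.

```shell
# has_addr IP: read "ip -o -4 addr show" output on stdin and
# succeed if IP is among the listed addresses.
has_addr() {
    awk -v ip="$1" '
        { split($4, parts, "/"); if (parts[1] == ip) found = 1 }
        END { exit !found }
    '
}

# Usage on either server:
#   ip -o -4 addr show dev eth0 | has_addr 192.168.1.100 && echo "this node holds the VIP"
```

Because the function only reads standard input, it can be tried against saved output before being used on a live server.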

On the Backup machine, you will not see this VIP until the Master fails. You can monitor Keepalived logs to see it in action:

sudo journalctl -u keepalived -f # View real-time logs
# or
sudo tail -f /var/log/syslog # On Debian/Ubuntu; CentOS/RHEL typically log to /var/log/messages
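The logs are fairly chatty, so it can help to filter for state changes only. This sketch assumes your Keepalived version reports transitions with phrases such as "Entering MASTER STATE"; the text around that phrase varies between versions, so adjust the pattern if nothing matches. The function name vrrp_transitions is made up for this example.

```shell
# vrrp_transitions: keep only the log lines that report a VRRP state change.
vrrp_transitions() {
    grep -E 'Entering (MASTER|BACKUP|FAULT) STATE'
}

# Usage:
#   sudo journalctl -u keepalived --no-pager | vrrp_transitions
```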

From another machine on the same network, try pinging the Virtual IP 192.168.1.100. You should see it respond.

ping 192.168.1.100

Test Failover Functionality

This is the most crucial step to test your HA system. We will simulate a Master server failure.

On Server 1 (Master):

sudo systemctl stop keepalived # Stop the Keepalived service

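Because Keepalived was stopped cleanly, the Master announces that it is stepping down (a final advertisement with priority 0), so Server 2 (Backup) becomes the Master and takes over the Virtual IP within about a second. After a hard failure such as a power loss, the Backup first has to wait for the advertisement timeout, which takes a little over 3 seconds with the settings above. Check again with ip a show eth0 on Server 2; you will see 192.168.1.100 has appeared.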

At the same time, the ping command from the client machine to 192.168.1.100 might lose a few packets (usually 1-3 packets) during the transition, but will then continue to respond normally.
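To put a number on the lost packets, a bounded probe loop with timestamps is easier to read than raw ping output. The following is a rough sketch: watch_vip, PROBE, and INTERVAL are names invented for this example, and the default probe assumes the Linux iputils ping, where -W is a timeout in seconds.

```shell
# Command used to probe the VIP; override PROBE for dry runs.
PROBE=${PROBE:-"ping -c 1 -W 1"}

# watch_vip VIP COUNT: probe VIP COUNT times and report how many probes failed.
watch_vip() {
    vip=$1
    count=$2
    lost=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if $PROBE "$vip" > /dev/null 2>&1; then
            echo "$(date +%T) $vip up"
        else
            echo "$(date +%T) $vip DOWN"
            lost=$((lost + 1))
        fi
        i=$((i + 1))
        sleep "${INTERVAL:-1}"   # pause between probes (seconds)
    done
    echo "lost $lost of $count probes"
}

# Usage from a client machine, started just before stopping Keepalived on the Master:
#   watch_vip 192.168.1.100 30
```

The timestamps of the DOWN lines show how long the switchover took from the client's point of view.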

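When you restart Keepalived on Server 1 (Master), it will return to the Master state and reclaim the VIP because of its higher priority; this is called preemption. If you would rather avoid this second switchover, set state BACKUP on both servers and add the nopreempt option to the vrrp_instance block.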

sudo systemctl start keepalived # Restart Master

Integrate Service Health Check

Keepalived not only monitors its own status but can also monitor other services (e.g., Nginx, Apache, Database) on the server. If that service stops working, Keepalived can automatically lower the server’s priority, forcing the VIP to switch to the Backup machine.

For example, to check the status of the Nginx service, you can create a small script:

sudo nano /etc/keepalived/check_nginx.sh

Paste the following content into the file:

#!/bin/bash
if systemctl is-active --quiet nginx; then
    exit 0 # Nginx is running, script successful
else
    exit 1 # Nginx is not running, script failed
fi

Grant execute permissions to the script:

sudo chmod +x /etc/keepalived/check_nginx.sh

Then, add the vrrp_script and track_script sections to the /etc/keepalived/keepalived.conf file on both servers. Define vrrp_script at the top level of the file, before the vrrp_instance that references it; otherwise Keepalived may not find the script:

vrrp_script chk_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2 # Run script every 2 seconds
    weight -20 # If the script returns an error (exit 1), decrease priority by 20 points
    # Example: if Master has priority 101, Nginx crashes -> priority becomes 81.
    # Backup with priority 100 will take over as Master.
}

vrrp_instance VI_1 {
    # ... other configurations ...

    track_script {
        chk_nginx # Name of the script defined above
    }
}

Save the file and restart Keepalived on both servers: sudo systemctl restart keepalived. Now, if Nginx on the Master is stopped, the VIP will automatically switch to the Backup. This flexible mechanism ensures application availability, not just physical server availability.
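The priority arithmetic behind this switchover is easy to check by hand. As a sketch, the function below mirrors how a script weight adjusts a node's priority: a negative weight is subtracted while the script fails, and a positive weight is added while it succeeds. The name effective_priority is hypothetical; Keepalived does this calculation internally.

```shell
# effective_priority BASE WEIGHT SCRIPT_EXIT_CODE
# Prints the priority a node advertises for the given check result.
effective_priority() {
    base=$1
    weight=$2
    rc=$3
    if [ "$weight" -lt 0 ] && [ "$rc" -ne 0 ]; then
        echo $((base + weight))   # failing check with negative weight: penalty
    elif [ "$weight" -gt 0 ] && [ "$rc" -eq 0 ]; then
        echo $((base + weight))   # passing check with positive weight: bonus
    else
        echo "$base"
    fi
}

effective_priority 101 -20 1   # Master with Nginx down: prints 81
effective_priority 100 -20 0   # Backup with Nginx up: prints 100
```

Since 81 is lower than 100, the Backup wins the election. The weight must therefore be larger than the priority gap between the two servers, or the VIP will never move.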

Conclusion

Building highly available systems has become an essential requirement, no longer an option, for most online services. Keepalived offers a simple, effective, and reliable solution to deploy Virtual IP and HA on Linux. With Keepalived, you significantly reduce service downtime, protect your reputation, and ensure a seamless user experience. I believe that after this guide, you have enough knowledge and confidence to elevate your services to a new level of stability.