Deploying MariaDB Galera Cluster: A Multi-master High Availability Solution for Production

Database tutorial - IT technology blog

Why Is MariaDB Galera Cluster the Top Choice for Production?

For system administrators, a 2 AM call reporting a database error is a nightmare. In the traditional Master-Slave model, when the Master fails, you must spend time switching to the Slave (failover). This process not only causes downtime but also carries the risk of data loss if replication is lagging.

MariaDB Galera Cluster completely changes the game with its Multi-master architecture. You can write data to any node, and it is instantly synchronized to the remaining nodes. If one server goes down, the application continues to function normally through other nodes without manual intervention. This is the fastest way to achieve true High Availability (HA).

Unlike asynchronous replication, Galera uses a certification-based, virtually synchronous mechanism, so there is no replication lag between servers. Once a transaction is committed, its writeset is guaranteed to have been delivered to every node in the cluster.

Infrastructure and Resource Preparation

To avoid “split-brain” (a condition where nodes lose communication with each other and separate partitions risk accepting conflicting writes), always use an odd number of nodes. I recommend starting with at least 3 nodes. Suppose we have 3 Ubuntu 22.04 servers:

  • Node 1: 192.168.1.10
  • Node 2: 192.168.1.11
  • Node 3: 192.168.1.12

Step 1: Installing MariaDB

Install MariaDB server and the Galera library on all 3 servers using the following commands:

sudo apt update
sudo apt install -y mariadb-server mariadb-client galera-4

Next, secure the database using the default MariaDB script:

sudo mysql_secure_installation

Step 2: Firewall Configuration

Galera Cluster requires 4 distinct ports to maintain connectivity and data synchronization. If you are using UFW, open these ports on every node:

  • 3306: MySQL/MariaDB client traffic.
  • 4567: Galera Cluster replication traffic (TCP and UDP).
  • 4568: Incremental State Transfer (IST).
  • 4444: State Snapshot Transfer (SST).

sudo ufw allow 3306,4567,4568,4444/tcp
sudo ufw allow 4567/udp

Configuring Galera Cluster

Stop the MariaDB service on all 3 nodes before modifying the configuration files:

sudo systemctl stop mariadb

Create a new configuration file at /etc/mysql/conf.d/galera.cnf on Node 1. Use the following template:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

# Cluster Configuration
wsrep_cluster_name="prod_cluster_01"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"

# Node Identity Configuration
wsrep_node_address="192.168.1.10"
wsrep_node_name="node1"
wsrep_sst_method=rsync

Copy this file to Node 2 and Node 3. Note: You must update wsrep_node_address and wsrep_node_name to match the IP and name of each respective server.
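
For example, on Node 2 only the node identity lines change; the rest of the file stays identical:

```ini
# /etc/mysql/conf.d/galera.cnf on Node 2 — only these two lines differ
wsrep_node_address="192.168.1.11"
wsrep_node_name="node2"
```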

Pro tip: The rsync method is very stable for small databases. However, if your data exceeds 100GB, consider using mariabackup. It allows nodes to join the cluster without locking tables for too long.

Bootstrapping the Cluster

Do not start MariaDB normally on all machines; the cluster needs a seed node to form around. On Node 1, run the bootstrap command:

sudo galera_new_cluster

Once Node 1 is ready, start MariaDB on Node 2 and Node 3 sequentially:

sudo systemctl start mariadb

These nodes will automatically connect to Node 1 based on the IP list in the configuration file to perform the initial data synchronization.
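
One related caution for later: if the entire cluster is ever shut down, bootstrap again only from the node that Galera marks as safe, or you may lose the most recent transactions. A minimal sketch of the check (the sample file below is fabricated for illustration; the real file lives in the MariaDB data directory, typically /var/lib/mysql/grastate.dat):

```shell
# After a full cluster shutdown, bootstrap ONLY from the node whose
# grastate.dat contains "safe_to_bootstrap: 1".
# Sample grastate.dat contents, fabricated here for illustration:
cat > /tmp/grastate.sample <<'EOF'
# GALERA saved state
version: 2.1
uuid:    6e9b4a2e-1111-2222-3333-444444444444
seqno:   -1
safe_to_bootstrap: 1
EOF

grep -q 'safe_to_bootstrap: 1' /tmp/grastate.sample \
  && echo "safe to bootstrap from this node"
```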

Verifying Operational Status

To confirm the system is running smoothly, access MariaDB and check the current number of nodes:

mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

If the result is 3, your system is ready. Try creating a database on Node 1 and check Node 3 to see the power of instant synchronization.
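
That check is easy to script into a health probe. A minimal sketch: the helper reads "variable<TAB>value" pairs such as those produced by `mysql -u root -p -Nse "SHOW STATUS LIKE 'wsrep_%'"`; since it only parses stdin, it is demonstrated here on sample output rather than a live cluster:

```shell
# Minimal Galera health check: reads "variable<TAB>value" lines on stdin,
# succeeds only if the cluster has the expected size AND the node is Synced.
galera_healthy() {
  awk -v want_size="$1" '
    $1 == "wsrep_cluster_size"        { size  = $2 }
    $1 == "wsrep_local_state_comment" { state = $2 }
    END { exit (size == want_size && state == "Synced") ? 0 : 1 }'
}

# Demo on sample output from a healthy 3-node cluster:
printf 'wsrep_cluster_size\t3\nwsrep_local_state_comment\tSynced\n' | \
  galera_healthy 3 && echo "cluster healthy"
# → cluster healthy
```

In production you would pipe the real query output into the helper and wire its exit code into your monitoring.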

Data Processing Tips

When migrating from an old system, you often have to handle large amounts of data from CSV or dump files. To quickly normalize data structures before importing, I often use a CSV to JSON converter tool. This allows for data integrity checks directly in the browser, saving significant time compared to writing raw processing scripts.
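
For simple, well-formed files the conversion itself can even be done in the shell. A naive sketch (no support for quoted fields or embedded commas; real-world CSV needs a proper parser or a dedicated tool):

```shell
# Naive CSV-to-JSON converter: first line is the header, each following
# line becomes one JSON object. Does NOT handle quoted fields.
csv_to_json() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i; next }
    {
      printf "%s{", (NR > 2 ? "," : "[")
      for (i = 1; i <= NF; i++)
        printf "%s\"%s\":\"%s\"", (i > 1 ? "," : ""), h[i], $i
      printf "}"
    }
    END { print "]" }'
}

printf 'id,name\n1,alice\n2,bob\n' | csv_to_json
# → [{"id":"1","name":"alice"},{"id":"2","name":"bob"}]
```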

Critical Notes to Avoid Downtime

  1. Quorum: If 2 out of 3 nodes fail simultaneously, the remaining node will automatically disconnect (Non-primary) to prevent data inconsistency. Therefore, prioritize an odd number of nodes like 3, 5, or 7.
  2. Synced Status: Always monitor the wsrep_local_state_comment status variable. The healthy value is Synced.
  3. Traffic Management: Even though it’s multi-master, writing to the same row from multiple nodes simultaneously can easily cause deadlocks. The best solution is to place ProxySQL or HAProxy in front to route write traffic to a single fixed node, while the other nodes serve as backups and handle read traffic.
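
A minimal HAProxy sketch of that write-to-one-node pattern (the bind port, server names, and plain TCP checks are illustrative assumptions; production setups typically use a Galera-aware health check):

```
# /etc/haproxy/haproxy.cfg fragment (sketch): all writes go to node1;
# node2 and node3 are backups that take over only if node1 fails.
listen galera_write
    bind 0.0.0.0:3307
    mode tcp
    option tcpka
    server node1 192.168.1.10:3306 check
    server node2 192.168.1.11:3306 check backup
    server node3 192.168.1.12:3306 check backup
```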

Deploying Galera Cluster is not difficult; the challenge lies in the meticulousness of the network configuration. I hope this guide helps you feel more confident in building a resilient database infrastructure for your projects.
