Using Debezium for Change Data Capture (CDC) from MySQL and PostgreSQL: Real-Time Sync to Kafka Without Touching Your Application – ITFROMZERO


2 AM and the Classic Data Sync Problem

The system was running fine — then the data team came knocking: “We need to stream every database change to Kafka for real-time analytics.” Sounds simple enough, until you look at a legacy codebase with dozens of services writing to MySQL. Rewriting each service to emit events was simply not an option.

I ran into exactly this situation. No code changes allowed, no downtime permitted, but real-time sync was required. That’s when I found Debezium — and it handled the whole problem without touching a single line of application code.

What Debezium Is and Why It Doesn’t Need Code Changes

Debezium is an open-source CDC (Change Data Capture) platform that runs as a Kafka Connect connector. Instead of hooking into the application layer, it reads directly from the database’s transaction log:

MySQL: binlog
PostgreSQL: WAL (Write-Ahead Log) via logical replication

Whenever a transaction containing an INSERT, UPDATE, or DELETE commits, Debezium captures each row change and pushes it to the Kafka topic for that table. The original application has no idea this is happening, so no code changes are required.

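Debezium doesn’t just tell you “this row was modified”. Each event carries the row's state before and after the change, so you get both the old and new values at once. This is extremely useful for audit logs and data reconciliation. On MySQL this relies on binlog_row_image=FULL, as configured below. On PostgreSQL, a complete before image for updates and deletes requires ALTER TABLE <table> REPLICA IDENTITY FULL; otherwise before holds at most the primary key columns. In practice, latency from transaction commit to the event appearing on Kafka is typically under one second, though it depends on load and connector settings.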
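To make the before/after idea concrete, here is a minimal Python sketch that takes the relevant part of one update event and lists which columns changed, which is the core of an audit-log consumer. The sample payload is hand-written for illustration (only op, before and after are shown), and the helper name changed_columns is hypothetical, not part of Debezium.

```python
from typing import Any, Optional

Row = Optional[dict[str, Any]]


def changed_columns(before: Row, after: Row) -> dict[str, tuple[Any, Any]]:
    """Return {column: (old, new)} for every column whose value differs.

    A missing side (before is None for inserts, after is None for deletes)
    is treated as an empty row, so every column counts as changed.
    """
    before = before or {}
    after = after or {}
    diff = {}
    for col in sorted(set(before) | set(after)):
        old, new = before.get(col), after.get(col)
        if old != new:
            diff[col] = (old, new)
    return diff


# Hand-written example of the relevant part of an update event's payload
payload = {
    "op": "u",
    "before": {"id": 1, "product": "Laptop", "quantity": 2},
    "after": {"id": 1, "product": "Laptop", "quantity": 3},
}
print(changed_columns(payload["before"], payload["after"]))
# {'quantity': (2, 3)}
```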

Hands-On: Setting Up Debezium + Kafka + MySQL with Docker Compose

Step 1: Prepare the Docker Compose File

Create a docker-compose.yml with all the required services:

version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  mysql:
    image: mysql:8.0
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: inventory
      MYSQL_USER: debezium
      MYSQL_PASSWORD: dbzpass
    command: --server-id=1 --log-bin=mysql-bin --binlog-format=ROW --binlog-row-image=FULL

  connect:
    image: debezium/connect:2.5
    depends_on: [kafka, mysql]
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: debezium_connect_configs
      OFFSET_STORAGE_TOPIC: debezium_connect_offsets
      STATUS_STORAGE_TOPIC: debezium_connect_statuses

docker compose up -d
# Wait about 30 seconds for services to start
docker compose ps
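Rather than relying on a fixed 30-second wait, a small Python helper can poll until the published ports accept TCP connections. This is a minimal sketch using only the standard library; the function name wait_for_port is hypothetical, and the ports you would check are the ones mapped in the Compose file above (9092, 3306, 8083).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: the service is not up yet, try again
            time.sleep(interval_s)
    return False
```

For example, call wait_for_port("localhost", 8083) before registering connectors. An open port only shows that the process is listening, not that it has finished initializing, so checking the connector status afterwards is still worthwhile.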

Step 2: Configure MySQL for Binlog

MySQL requires a user with sufficient privileges for Debezium to read the binlog. Connect to the MySQL container and grant the necessary permissions:

docker compose exec mysql mysql -uroot -prootpass

GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;

-- Verify binlog is enabled
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

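The output must show log_bin = ON and binlog_format = ROW. If the format is not ROW, the binlog records SQL statements instead of row images, and the Debezium MySQL connector refuses to start, with an error asking you to switch to binlog_format=ROW. Likewise, keep binlog_row_image = FULL (as set in the Compose file) so that events carry every column.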
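If you want to automate this check (for example in a provisioning step), a minimal Python sketch could parse batch-mode mysql output and flag settings Debezium will not accept. It assumes you captured the output of a command such as mysql -N -B -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin','binlog_format','binlog_row_image')", which prints one tab-separated name/value pair per line; the function name check_binlog_settings is hypothetical.

```python
# Settings the setup above relies on: binlog on, ROW format, FULL row images
REQUIRED = {"log_bin": "ON", "binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_binlog_settings(show_variables_output: str) -> list[str]:
    """Return human-readable problems; an empty list means all settings are OK."""
    actual = {}
    for line in show_variables_output.splitlines():
        if not line.strip():
            continue
        # mysql -N -B prints "name<TAB>value" per row
        name, _, value = line.partition("\t")
        actual[name.strip()] = value.strip()

    problems = []
    for name, expected in REQUIRED.items():
        value = actual.get(name)
        if value is None:
            problems.append(f"{name}: not reported by the server")
        elif value.upper() != expected:
            problems.append(f"{name}: expected {expected}, got {value}")
    return problems


sample = "binlog_format\tMIXED\nbinlog_row_image\tFULL\nlog_bin\tON\n"
for problem in check_binlog_settings(sample):
    print(problem)
# binlog_format: expected ROW, got MIXED
```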

Step 3: Register the MySQL Connector

The Debezium connector is registered via the Kafka Connect REST API:

curl -X POST http://localhost:8083/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "mysql-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "dbzpass",
      "database.server.id": "184054",
      "topic.prefix": "myapp",
      "database.include.list": "inventory",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
      "schema.history.internal.kafka.topic": "schema-changes.inventory"
    }
  }'
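The same registration can be scripted. Below is a minimal Python sketch using only the standard library's urllib.request, assuming Kafka Connect is reachable at http://localhost:8083 as in the Compose file above; the helper names build_register_request and register are hypothetical. Building the request is kept separate from sending it so the payload can be inspected without a running cluster.

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumption: port mapped in docker-compose.yml


def build_register_request(name: str, config: dict[str, str]) -> urllib.request.Request:
    """Build a POST /connectors request with the same JSON body as the curl call."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(name: str, config: dict[str, str]) -> dict:
    """Send the request and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. 409 if a
    connector with this name already exists.
    """
    with urllib.request.urlopen(build_register_request(name, config)) as resp:
        return json.load(resp)


mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbzpass",
    "database.server.id": "184054",
    "topic.prefix": "myapp",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory",
}
```

With the stack running, register("mysql-connector", mysql_config) performs the same call as the curl command.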

Verify the connector is running:

curl http://localhost:8083/connectors/mysql-connector/status
# "state": "RUNNING" means success
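The connector-level state alone can be misleading: a connector can report RUNNING while its task has FAILED. Here is a minimal Python sketch of a stricter check, assuming the usual shape of the status response (a connector object plus a tasks list, each with a state field). The function name connector_healthy and the sample response are hypothetical.

```python
import json


def connector_healthy(status: dict) -> bool:
    """True only if the connector and every one of its tasks are RUNNING."""
    if status.get("connector", {}).get("state") != "RUNNING":
        return False
    tasks = status.get("tasks", [])
    # An empty task list usually means the connector is still being assigned
    return bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)


# Hand-written sample in the shape returned by GET /connectors/<name>/status
sample = json.loads("""
{"name": "mysql-connector",
 "connector": {"state": "RUNNING", "worker_id": "172.18.0.5:8083"},
 "tasks": [{"id": 0, "state": "FAILED", "worker_id": "172.18.0.5:8083",
            "trace": "..."}],
 "type": "source"}
""")
print(connector_healthy(sample))  # False: the task failed
```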

Step 4: Write Data and Observe Kafka Events

Create a table and insert data into MySQL:

USE inventory;
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  product VARCHAR(100),
  quantity INT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO orders (product, quantity) VALUES ('Laptop', 2);
UPDATE orders SET quantity = 3 WHERE id = 1;
DELETE FROM orders WHERE id = 1;

Consume events from the Kafka topic:

docker compose exec kafka \
  kafka-console-consumer \
  --bootstrap-server localhost:29092 \
  --topic myapp.inventory.orders \
  --from-beginning

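Each of the three operations above produces its own JSON message. With the default JSON converter in the debezium/connect image, schemas are embedded, so the change data sits under a payload key next to a schema description. The op field indicates the operation type: c (create/insert), u (update), d (delete), plus r for rows read during an initial snapshot. Each message also includes before and after: for an INSERT, before is null; for a DELETE, after is null. By default, the delete event is followed by a tombstone (a message with a null value, printed as null by the console consumer) so that log compaction can eventually remove the key.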
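A consumer usually only needs op, before and after. Below is a minimal Python sketch that parses one message value as printed by kafka-console-consumer. It assumes the JSON converter with schemas enabled (so the event sits under a payload key) but also accepts a bare event, and it treats the literal null printed for a tombstone as "no event". The names Change and parse_change are hypothetical, and the sample lines are abbreviated (real events also carry fields such as source and ts_ms).

```python
import json
from typing import Any, NamedTuple, Optional


class Change(NamedTuple):
    op: str                            # "c", "u", "d", or "r" for snapshot reads
    before: Optional[dict[str, Any]]
    after: Optional[dict[str, Any]]


def parse_change(raw: str) -> Optional[Change]:
    """Turn one console-consumer line into a Change, or None for a tombstone."""
    raw = raw.strip()
    if not raw or raw == "null":
        return None  # tombstone that follows a delete event
    event = json.loads(raw)
    # With schemas enabled the envelope is {"schema": ..., "payload": {...}}
    payload = event.get("payload", event)
    if payload is None:
        return None
    return Change(payload["op"], payload.get("before"), payload.get("after"))


lines = [
    '{"schema": {}, "payload": {"op": "c", "before": null, '
    '"after": {"id": 1, "product": "Laptop", "quantity": 2}}}',
    '{"schema": {}, "payload": {"op": "d", '
    '"before": {"id": 1, "product": "Laptop", "quantity": 3}, "after": null}}',
    "null",
]
for line in lines:
    print(parse_change(line))
```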

Configuring PostgreSQL CDC with Debezium

PostgreSQL requires one extra step compared to MySQL: enabling logical replication. In postgresql.conf:

wal_level = logical
max_replication_slots = 4
max_wal_senders = 4

If you are using Docker, add a postgres service under services: in docker-compose.yml and pass the parameters on the command line instead of mounting a config file:

  postgres:
    image: postgres:15
    command: postgres -c wal_level=logical -c max_replication_slots=4 -c max_wal_senders=4
    environment:
      POSTGRES_PASSWORD: pgpass
      POSTGRES_DB: mydb

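Create a user with replication privileges. Run this as the postgres superuser, for example via docker compose exec postgres psql -U postgres -d mydb:

CREATE USER debezium REPLICATION LOGIN PASSWORD 'dbzpass';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;
-- Also cover tables created later in this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO debezium;
-- pgoutput needs a publication; FOR ALL TABLES requires a superuser,
-- so create it up front under Debezium's default name
CREATE PUBLICATION dbz_publication FOR ALL TABLES;

Then register the PostgreSQL connector through the same REST API: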

curl -X POST http://localhost:8083/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "postgres-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "debezium",
      "database.password": "dbzpass",
      "database.dbname": "mydb",
      "topic.prefix": "pgapp",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_slot"
    }
  }'

Important warning for PostgreSQL: Debezium creates a replication slot on Postgres, and that slot holds WAL data until it is consumed. If Debezium is stopped for several hours on a write-heavy system, WAL can accumulate tens of gigabytes and fill up the disk. Always monitor replication slot lag:

SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots;

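If the lag keeps growing into the gigabyte range, investigate immediately. The usual cause is a connector that is stopped or failing and therefore no longer confirming its position to PostgreSQL; slow downstream Kafka consumers do not hold the slot back. A less obvious cause is a captured table that rarely changes while the rest of the server writes heavily, leaving the connector nothing to confirm. Setting heartbeat.interval.ms on the connector (and, if needed, heartbeat.action.query) lets it confirm progress periodically. On PostgreSQL 13 and later, max_slot_wal_keep_size can also cap how much WAL a slot may retain, at the cost of invalidating the slot once the cap is exceeded.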
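pg_wal_lsn_diff simply subtracts two LSNs. An LSN is printed as two hexadecimal numbers, the high and low 32 bits of a 64-bit WAL byte position, so the same lag can be computed client-side from pg_current_wal_lsn() and a slot's restart_lsn. Below is a minimal Python sketch of that arithmetic plus a threshold check; the 2 GiB threshold and the function names are assumptions for illustration, not Debezium or PostgreSQL defaults.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '0/16B3748' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Same value as pg_wal_lsn_diff(current_lsn, restart_lsn)."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


ALERT_BYTES = 2 * 1024**3  # assumption: alert at 2 GiB of retained WAL


def check_slot(slot_name: str, current_lsn: str, restart_lsn: str) -> str:
    """Format a one-line status for a replication slot."""
    lag = slot_lag_bytes(current_lsn, restart_lsn)
    level = "ALERT" if lag >= ALERT_BYTES else "ok"
    return f"{level}: {slot_name} retains {lag / 1024**3:.2f} GiB of WAL"


print(check_slot("debezium_slot", "3/A0000000", "3/20000000"))
# ALERT: debezium_slot retains 2.00 GiB of WAL
```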

Troubleshooting Common Errors

MySQL: “Access denied; you need the SUPER privilege”

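This error typically shows up during the initial snapshot. First confirm that every grant from Step 2 was applied (SHOW GRANTS FOR 'debezium'@'%';). If it still appears on MySQL 8.0, the snapshot may need the additional BACKUP_ADMIN privilege, which you can add with: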

GRANT BACKUP_ADMIN ON *.* TO 'debezium'@'%';

Connector Stuck in UNASSIGNED

This usually means Kafka Connect hasn’t finished starting up. Delete the connector and re-register it after 30 seconds:

curl -X DELETE http://localhost:8083/connectors/mysql-connector
# Wait, then POST again
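Instead of a fixed 30-second wait, you can poll the status endpoint until the connector settles. Below is a minimal Python sketch; fetch_status is a hypothetical callable you supply (for example one that GETs /connectors/mysql-connector/status and decodes the JSON), which keeps the polling logic testable without a cluster. The function name wait_for_running is also hypothetical.

```python
import time
from typing import Callable


def wait_for_running(fetch_status: Callable[[], dict],
                     timeout_s: float = 120.0,
                     interval_s: float = 5.0) -> bool:
    """Poll fetch_status() until the connector and all tasks are RUNNING.

    Returns True on success and False if timeout_s elapses first. Returns
    False early if anything reports FAILED, since waiting will not fix that.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        states = [status.get("connector", {}).get("state")]
        states += [t.get("state") for t in status.get("tasks", [])]
        if "FAILED" in states:
            return False
        # Require at least one task; an empty list means not assigned yet
        if status.get("tasks") and all(s == "RUNNING" for s in states):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, fetch_status could be lambda: json.load(urllib.request.urlopen("http://localhost:8083/connectors/mysql-connector/status")). If it returns False because of a FAILED task, the trace field in the status response is the place to look before deleting and re-registering.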

Schema History Topic Missing

If Debezium reports it can’t read the schema history, check that the topic exists and set retention to unlimited for this topic — it’s the most critical topic in the setup, and losing it prevents the connector from starting:

docker compose exec kafka kafka-configs --bootstrap-server kafka:29092 \
  --entity-type topics \
  --entity-name schema-changes.inventory \
  --alter --add-config retention.ms=-1


Conclusion

Debezium solves the CDC problem in the least invasive way possible: no changes to application code, no database triggers — just reading what the database has already written to its transaction log. MySQL uses binlog ROW format; PostgreSQL uses logical replication with the pgoutput plugin.

Four things to keep in mind before going to production:

Monitor replication slot lag in PostgreSQL — this is the most common cause of disk fill-up; lag in the gigabyte range is a signal that needs immediate attention
Set the schema history topic retention to unlimited (retention.ms=-1)
Under high traffic, give Debezium a dedicated Kafka Connect cluster instead of sharing one with your application workloads
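Test your failover scenario: stop Debezium for an hour, bring it back up, and confirm it resumes from its last committed offset and catches up correctly. On MySQL, make sure binlog retention (binlog_expire_logs_seconds) comfortably exceeds any outage you expect, or the connector will not be able to resume from its saved position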

Once you’re comfortable with this flow, CDC becomes the foundation for a wide range of use cases: event sourcing, data lake ingestion, cache invalidation, microservice synchronization — all without touching your legacy application. That’s the real value.