2 AM and the Classic Data Sync Problem
The system was running fine — then the data team came knocking: “We need to stream every database change to Kafka for real-time analytics.” Sounds simple enough, until you look at a legacy codebase with dozens of services writing to MySQL. Rewriting each service to emit events was simply not an option.
I ran into exactly this situation. No code changes allowed, no downtime permitted, but real-time sync was required. That’s when I found Debezium — and it handled the whole problem without touching a single line of application code.
What Debezium Is and Why It Doesn’t Need Code Changes
Debezium is an open-source CDC (Change Data Capture) platform that runs as a Kafka Connect connector. Instead of hooking into the application layer, it reads directly from the database’s transaction log:
- MySQL: binlog
- PostgreSQL: WAL (Write-Ahead Log) via logical replication
Whenever an INSERT, UPDATE, or DELETE occurs, Debezium captures that event and pushes it to the corresponding Kafka topic. The original application has no idea this is happening. No code changes required.
Debezium doesn’t just tell you “this row was modified” — it emits the full before/after state, giving you both the old and new values simultaneously. This is extremely useful for audit logs or data reconciliation. In practice, latency is typically under one second from the time a transaction commits to when the event appears on Kafka.
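To make the before/after idea concrete, here is a minimal Python sketch that diffs the two states of one change event, which is roughly what an audit log would store. The sample event is made up for illustration, and the `payload` wrapper assumes the JSON converter is running with schemas enabled; `changed_fields` is a hypothetical helper, not part of Debezium.

```python
import json

# Illustrative event shaped like a Debezium change event value.
# The values are invented; the "payload" wrapper is an assumption
# (it appears when the JSON converter includes schemas).
RAW_EVENT = """
{
  "payload": {
    "before": {"id": 1, "product": "Laptop", "quantity": 2},
    "after":  {"id": 1, "product": "Laptop", "quantity": 3},
    "op": "u",
    "ts_ms": 1700000000000
  }
}
"""


def changed_fields(event: dict) -> dict:
    """Return {column: (old, new)} for every column whose value differs."""
    # Unwrap the envelope whether or not the schema/payload wrapper is present.
    payload = event.get("payload", event)
    before = payload.get("before") or {}  # null for inserts
    after = payload.get("after") or {}    # null for deletes
    columns = sorted(set(before) | set(after))
    return {
        col: (before.get(col), after.get(col))
        for col in columns
        if before.get(col) != after.get(col)
    }


if __name__ == "__main__":
    print(changed_fields(json.loads(RAW_EVENT)))
```

For an insert or a delete, one side is null, so every column shows up as changed against `None`.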
Hands-On: Setting Up Debezium + Kafka + MySQL with Docker Compose
Step 1: Prepare the Docker Compose File
Create a docker-compose.yml with all the required services:
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  mysql:
    image: mysql:8.0
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: inventory
      MYSQL_USER: debezium
      MYSQL_PASSWORD: dbzpass
    command: --server-id=1 --log-bin=mysql-bin --binlog-format=ROW --binlog-row-image=FULL
  connect:
    image: debezium/connect:2.5
    depends_on: [kafka, mysql]
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: debezium_connect_configs
      OFFSET_STORAGE_TOPIC: debezium_connect_offsets
      STATUS_STORAGE_TOPIC: debezium_connect_statuses
docker compose up -d
# Wait about 30 seconds for services to start
docker compose ps
Step 2: Configure MySQL for Binlog
MySQL requires a user with sufficient privileges for Debezium to read the binlog. Connect to the MySQL container and grant the necessary permissions:
docker exec -it <mysql-container> mysql -uroot -prootpass
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
ON *.* TO 'debezium'@'%';
FLUSH PRIVILEGES;
-- Verify binlog is enabled
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
The output must show log_bin = ON and binlog_format = ROW. If the format is not ROW, the binlog contains SQL statements rather than full row data, so there is nothing to build the before/after fields from. The connector checks this at startup and fails with an error rather than emitting incomplete events.
Step 3: Register the MySQL Connector
The Debezium connector is registered via the Kafka Connect REST API:
curl -X POST http://localhost:8083/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "mysql-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "dbzpass",
      "database.server.id": "184054",
      "topic.prefix": "myapp",
      "database.include.list": "inventory",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
      "schema.history.internal.kafka.topic": "schema-changes.inventory"
    }
  }'
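If you script the registration, one option is the Kafka Connect REST endpoint PUT /connectors/{name}/config, which creates the connector when it is missing and updates it when it exists, so re-running the script does not hit a "connector already exists" conflict. The sketch below only builds the request; `build_upsert_request` and `upsert_connector` are hypothetical helper names, and the URL assumes the port mapping from the compose file above.

```python
import json
import urllib.request

# Assumption: Kafka Connect is reachable on the port published in docker-compose.yml.
CONNECT_URL = "http://localhost:8083"

# Same settings as the curl example, without the {"name": ..., "config": ...} wrapper.
MYSQL_CONFIG = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbzpass",
    "database.server.id": "184054",
    "topic.prefix": "myapp",
    "database.include.list": "inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:29092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory",
}


def build_upsert_request(name: str, config: dict) -> urllib.request.Request:
    """Build a create-or-update request for one connector."""
    # PUT takes the bare config map as its body, unlike POST /connectors.
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{CONNECT_URL}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def upsert_connector(name: str, config: dict) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_upsert_request(name, config)) as response:
        return response.status
```

Calling `upsert_connector("mysql-connector", MYSQL_CONFIG)` against the running stack should have the same effect as the curl command, and it can be repeated safely after a config change.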
Verify the connector is running:
curl http://localhost:8083/connectors/mysql-connector/status
# "state": "RUNNING" means success
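One thing worth knowing about the status response: it reports a state for the connector and a separate state for each task, and a connector can show RUNNING while its task has FAILED. Here is a small Python sketch that checks both. The sample response is invented for illustration (including the worker id and the trace text), and `unhealthy_parts` is a hypothetical helper.

```python
import json

# Illustrative response from GET /connectors/mysql-connector/status.
# Worker id and trace are made-up values.
SAMPLE_STATUS = json.loads("""
{
  "name": "mysql-connector",
  "connector": {"state": "RUNNING", "worker_id": "connect:8083"},
  "tasks": [
    {"id": 0, "state": "FAILED", "worker_id": "connect:8083",
     "trace": "java.lang.RuntimeException: example"}
  ],
  "type": "source"
}
""")


def unhealthy_parts(status: dict) -> list:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    connector_state = status.get("connector", {}).get("state")
    if connector_state != "RUNNING":
        problems.append(f"connector is {connector_state}")
    tasks = status.get("tasks", [])
    if not tasks:
        # A connector without tasks is not capturing anything yet.
        problems.append("no tasks assigned")
    for task in tasks:
        if task.get("state") != "RUNNING":
            problems.append(f"task {task.get('id')} is {task.get('state')}")
    return problems


if __name__ == "__main__":
    print(unhealthy_parts(SAMPLE_STATUS))
```

When a task has failed, its `trace` field in the real response usually holds the stack trace that explains why.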
Step 4: Write Data and Observe Kafka Events
Create a table and insert data into MySQL:
USE inventory;
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  product VARCHAR(100),
  quantity INT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO orders (product, quantity) VALUES ('Laptop', 2);
UPDATE orders SET quantity = 3 WHERE id = 1;
DELETE FROM orders WHERE id = 1;
Consume events from the Kafka topic:
docker exec -it <kafka-container> \
  kafka-console-consumer \
  --bootstrap-server localhost:29092 \
  --topic myapp.inventory.orders \
  --from-beginning
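Each of the three operations above produces a separate JSON message. The op field indicates the operation type: c (create/insert), u (update), d (delete). Each message also includes before and after: for DELETE, after is null; for INSERT, before is null. By default the delete event is followed by a tombstone, a message with the same key and a null value, which the console consumer prints as null.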
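A consumer typically branches on the op field. The sketch below replays the three operations from this step against an in-memory copy of the orders table, which is the core of a cache-sync or cache-invalidation consumer. The events are hand-written to match the SQL above (the created_at column is left out for brevity), the trailing `None` stands in for the tombstone that Debezium emits after a delete by default, and `apply_event` is a hypothetical helper.

```python
from typing import Optional


def apply_event(replica: dict, value: Optional[dict]) -> None:
    """Apply one change event to an in-memory copy of the table, keyed by id."""
    if value is None:
        return  # tombstone after a delete: nothing left to apply
    # Unwrap the envelope whether or not the schema/payload wrapper is present.
    payload = value.get("payload", value)
    op = payload["op"]
    if op in ("c", "u", "r"):  # "r" marks rows read during the initial snapshot
        row = payload["after"]
        replica[row["id"]] = row
    elif op == "d":
        replica.pop(payload["before"]["id"], None)
    # Any other op code is ignored in this sketch.


# Hand-written events mirroring the INSERT, UPDATE and DELETE above.
EVENTS = [
    {"op": "c", "before": None,
     "after": {"id": 1, "product": "Laptop", "quantity": 2}},
    {"op": "u", "before": {"id": 1, "product": "Laptop", "quantity": 2},
     "after": {"id": 1, "product": "Laptop", "quantity": 3}},
    {"op": "d", "before": {"id": 1, "product": "Laptop", "quantity": 3},
     "after": None},
    None,
]

if __name__ == "__main__":
    table = {}
    for event in EVENTS:
        apply_event(table, event)
        print(table)
```

Keying the replica by primary key keeps the handler idempotent, which matters because Kafka Connect delivers events at least once and a restart can replay a few of them.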
Configuring PostgreSQL CDC with Debezium
PostgreSQL requires one extra step compared to MySQL: enabling logical replication. In postgresql.conf:
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
If using Docker, pass the parameters directly via the command to avoid mounting a config file:
  postgres:
    image: postgres:15
    command: postgres -c wal_level=logical -c max_replication_slots=4
    environment:
      POSTGRES_PASSWORD: pgpass
      POSTGRES_DB: mydb
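Create a user with replication privileges. With the pgoutput plugin, Debezium also needs a publication, and creating one FOR ALL TABLES requires superuser rights, so create it up front as a superuser (dbz_publication is the connector's default publication name):
CREATE USER debezium REPLICATION LOGIN PASSWORD 'dbzpass';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;
CREATE PUBLICATION dbz_publication FOR ALL TABLES;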
Register the PostgreSQL connector:
curl -X POST http://localhost:8083/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "postgres-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "debezium",
      "database.password": "dbzpass",
      "database.dbname": "mydb",
      "topic.prefix": "pgapp",
      "plugin.name": "pgoutput",
      "slot.name": "debezium_slot"
    }
  }'
Important warning for PostgreSQL: Debezium creates a replication slot on Postgres, and that slot holds WAL data until it is consumed. If Debezium is stopped for several hours on a write-heavy system, WAL can accumulate tens of gigabytes and fill up the disk. Always monitor replication slot lag:
SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots;
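If the lag column exceeds a few gigabytes, investigate immediately: either Debezium is stopped or stuck, or it cannot publish to Kafka fast enough. Slow consumers further downstream do not hold back the slot, because the slot advances once Debezium has written the events to Kafka and committed its offsets.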
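If you want to alert on this from a script instead of eyeballing pg_size_pretty output, the lag is just the difference between two LSNs. An LSN such as 16/B374D848 is a 64-bit byte position written as two hexadecimal halves, so the arithmetic that pg_wal_lsn_diff performs can be sketched in a few lines of Python. The 2 GiB threshold is an arbitrary example value, and the function names are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the slot is still holding back."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


# Hypothetical alert threshold: 2 GiB. Tune it to your disk size and write rate.
LAG_ALERT_BYTES = 2 * 1024**3


def needs_attention(current_lsn: str, restart_lsn: str,
                    threshold: int = LAG_ALERT_BYTES) -> bool:
    """True when the slot retains more WAL than the threshold allows."""
    return slot_lag_bytes(current_lsn, restart_lsn) > threshold


if __name__ == "__main__":
    print(slot_lag_bytes("0/3000000", "0/1000000"))
    print(needs_attention("1/0", "0/0"))
```

In practice you would feed it pg_current_wal_lsn() and restart_lsn selected as text from pg_replication_slots, or simply select the numeric diff in SQL and compare it to the threshold.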
Troubleshooting Common Errors
MySQL: “Access denied; you need the SUPER privilege”
This occurs when MySQL 8.0 requires the additional BACKUP_ADMIN privilege for consistent snapshots. Add it with:
GRANT BACKUP_ADMIN ON *.* TO 'debezium'@'%';
Connector Stuck in UNASSIGNED
This usually means Kafka Connect hasn’t finished starting up. Delete the connector and re-register it after 30 seconds:
curl -X DELETE http://localhost:8083/connectors/mysql-connector
# Wait, then POST again
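Rather than waiting a fixed 30 seconds, a script can poll the REST API until Kafka Connect answers and only then register the connector. The sketch below is one way to do that in Python; `wait_until` and `connect_is_up` are hypothetical helpers, and the URL assumes the port mapping from the compose file.

```python
import time
import urllib.request
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 2.0, sleep=time.sleep) -> bool:
    """Call probe() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and HTTP errors (urllib's URLError is an
            # OSError) are expected while the worker is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def connect_is_up(url: str = "http://localhost:8083/connectors") -> bool:
    """Probe: the worker is ready once it can list connectors."""
    with urllib.request.urlopen(url, timeout=3) as response:
        return response.status == 200
```

A setup script would call `wait_until(connect_is_up)` and POST the connector config only when it returns True. The `sleep` parameter exists so the loop can be tested without real delays.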
Schema History Topic Missing
If Debezium reports it can’t read the schema history, check that the topic exists and set retention to unlimited for this topic — it’s the most critical topic in the setup, and losing it prevents the connector from starting:
docker exec -it <kafka-container> \
  kafka-configs --bootstrap-server localhost:29092 \
  --entity-type topics \
  --entity-name schema-changes.inventory \
  --alter --add-config retention.ms=-1
Conclusion
Debezium solves the CDC problem in the least invasive way possible: no changes to application code, no database triggers — just reading what the database has already written to its transaction log. MySQL uses binlog ROW format; PostgreSQL uses logical replication with the pgoutput plugin.
Four things to keep in mind before going to production:
- Monitor replication slot lag in PostgreSQL. This is the most common cause of disk fill-up, and lag in the gigabyte range is a signal that needs immediate attention
- Set the schema history topic retention to unlimited (retention.ms=-1)
- Use a dedicated Kafka Connect cluster under high traffic, and avoid sharing it with your application Kafka
- Test your failover scenario: stop Debezium for an hour, bring it back up, and confirm it catches up correctly from its last checkpoint
Once you’re comfortable with this flow, CDC becomes the foundation for a wide range of use cases: event sourcing, data lake ingestion, cache invalidation, microservice synchronization — all without touching your legacy application. That’s the real value.

