Greenboot on Fedora IoT and CoreOS: Automatic Health Checks and Rollback on Failed Updates – ITFROMZERO

Last month I updated the kernel on a Raspberry Pi running Fedora IoT — and the machine wouldn’t boot anymore. No monitor, no keyboard, just a small black box sitting in the corner of the room. I had to SSH in from another machine into rescue mode to fix it. That was the first time I truly understood why Greenboot exists.

Background & Why You Need Greenboot

Fedora IoT and CoreOS both use an immutable filesystem based on rpm-ostree — each update creates a new “deployment” while the old one stays intact. In theory, if an update goes wrong, you just boot into the old deployment. But on a headless device with no monitor and no keyboard, who’s going to detect that failure and automatically roll back?

That’s what Greenboot is for.

At its core, Greenboot is a lightweight framework that integrates with systemd. It runs health check scripts right after each boot; if any script in the required.d directory exits non-zero, Greenboot marks the current boot as failed and the device reboots to try again. After 3 consecutive failed boots, Greenboot triggers an rpm-ostree rollback to the previous deployment, with no human intervention required.
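The retry-then-rollback behaviour described above can be modelled in a few lines of shell. This is a toy simulation for intuition only, not Greenboot's own code: the function name simulate_boots and its arguments are hypothetical, and the limit of 3 is simply the number quoted in the text.

```shell
#!/bin/sh
# Toy model of the retry logic described above. NOT Greenboot's own code:
# simulate_boots and its arguments are hypothetical names for illustration.
#
# Usage: simulate_boots MAX_ATTEMPTS RESULT...
#   each RESULT is "pass" or "fail" (the outcome of the required.d checks)
simulate_boots() {
    max_attempts=$1
    shift
    failures=0
    boot=0
    for result in "$@"; do
        boot=$((boot + 1))
        if [ "$result" = "pass" ]; then
            echo "boot $boot: healthy, keeping current deployment"
            return 0
        fi
        failures=$((failures + 1))
        if [ "$failures" -ge "$max_attempts" ]; then
            echo "boot $boot: $failures consecutive failures, rolling back"
            return 0
        fi
        echo "boot $boot: failed, retrying"
    done
}
```

Calling simulate_boots 3 fail fail fail walks through two retries and ends with the rollback line, while simulate_boots 3 fail pass stops at the second boot with the deployment kept.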

I’ve been using Fedora as my primary development machine for 2 years. A failed update on a laptop is annoying, but fixable. With a headless device plugged into a corner of the room or a server sitting in a data center hundreds of kilometers away, the situation is entirely different. Greenboot is especially necessary when:

The device is remote with no physical access (VPS, edge device, Raspberry Pi)
The system requires high uptime — 30 minutes of downtime waiting for a manual fix is too costly
You’re running rpm-ostreed-automatic or zincati for unattended automatic updates

Installing Greenboot

On Fedora IoT, Greenboot is usually pre-installed. On CoreOS it may not be part of the base image, so check first:

rpm -q greenboot
# greenboot-0.15.0-4.fc40.noarch

If it’s not installed, since the filesystem is immutable you need to use rpm-ostree instead of dnf:

# Fedora IoT / CoreOS
sudo rpm-ostree install greenboot greenboot-default-health-checks

# Reboot to apply
sudo systemctl reboot

On regular Fedora Workstation/Server (the checks and hooks run there too, but automatic rollback relies on rpm-ostree deployments):

sudo dnf install greenboot greenboot-default-health-checks

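After installing, enable and start the 2 main services. The rollback path also relies on companion units shipped with the package (for example greenboot-grub2-set-counter and redboot-auto-reboot; names can vary between Greenboot versions). On Fedora IoT these are typically enabled already, and systemctl list-unit-files 'greenboot*' 'redboot*' shows their state: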

sudo systemctl enable --now greenboot-healthcheck.service
sudo systemctl enable --now greenboot-status.service

# Check status
sudo systemctl status greenboot-healthcheck.service

Detailed Configuration

Greenboot reads health check scripts from two main directories:

/etc/greenboot/check/required.d/: scripts that must pass. A failure marks the boot as unhealthy
/etc/greenboot/check/wanted.d/: optional scripts. A failure only logs a warning and does not trigger rollback

Depending on the health check result, Greenboot also runs scripts from two hook directories:

/etc/greenboot/green.d/: runs when all required checks pass
/etc/greenboot/red.d/: runs when a required check fails, before the system reboots to retry or roll back
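To make the relationship between the four directories concrete, here is a simplified model of the flow. It is a sketch under assumptions, not Greenboot's implementation: run_dir, greenboot_flow and the GREENBOOT_ROOT override are hypothetical names, and the real tool is driven by systemd units and may differ in ordering and early-exit behaviour.

```shell
#!/bin/sh
# Simplified model of the check/hook flow. NOT Greenboot's implementation:
# run_dir, greenboot_flow and GREENBOOT_ROOT are hypothetical names.

# Run every executable *.sh in a directory; return 1 if any of them failed.
run_dir() {
    dir=$1
    failed=0
    for script in "$dir"/*.sh; do
        [ -x "$script" ] || continue      # also skips an unexpanded glob
        if "$script"; then
            echo "PASS $script"
        else
            echo "FAIL $script"
            failed=1
        fi
    done
    return "$failed"
}

greenboot_flow() {
    root=${GREENBOOT_ROOT:-/etc/greenboot}
    required_ok=1
    run_dir "$root/check/required.d" || required_ok=0
    # wanted.d failures are only reported, never fatal
    run_dir "$root/check/wanted.d" || echo "WARN: a wanted check failed"
    if [ "$required_ok" -eq 1 ]; then
        echo "STATUS green"
        run_dir "$root/green.d" || true
        return 0
    fi
    echo "STATUS red"
    run_dir "$root/red.d" || true
    return 1
}
```

Pointing GREENBOOT_ROOT at a scratch directory with the same layout lets you try the flow without touching /etc/greenboot.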

Network Connectivity Check Script

Exit code 0 = healthy, anything else = fail. This rule is simple but important to keep in mind — Greenboot doesn’t read output text, it only looks at the script’s exit code. I always start with a network check because it’s the most commonly affected thing after a kernel update:

sudo nano /etc/greenboot/check/required.d/01-check-network.sh

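#!/bin/bash
# Check network connectivity after boot.
# Greenboot only looks at the exit code: 0 = healthy, non-zero = failed.

TIMEOUT=30        # total time budget in seconds
TARGET="8.8.8.8"  # address to ping

echo "Checking network connectivity..."

# SECONDS is a bash builtin that counts seconds since the script started,
# so the reported time stays accurate even when each ping waits for a reply
while (( SECONDS < TIMEOUT )); do
    if ping -c 1 -W 1 "$TARGET" &>/dev/null; then
        echo "Network OK after ${SECONDS}s"
        exit 0
    fi
    sleep 1
done

echo "ERROR: Network unreachable after ${TIMEOUT}s"
exit 1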

sudo chmod +x /etc/greenboot/check/required.d/01-check-network.sh

The numeric prefix in the filename (01-, 10-, 99-) controls execution order. Put the network check at 01- because many other checks depend on the network — there’s no point testing an API endpoint if the network isn’t up yet.
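One pitfall with numeric prefixes is that the ordering is lexical, not numeric, so an unpadded 9- sorts after 10-. A minimal sketch to preview the order, assuming the scripts run in the same lexical order the shell uses for glob expansion (list_checks is a hypothetical helper, not part of Greenboot):

```shell
#!/bin/sh
# Print the check scripts of a directory in lexical (glob) order.
# Assumption: this matches the order in which the checks are executed.
# list_checks is a hypothetical helper name.
list_checks() {
    dir=${1:-/etc/greenboot/check/required.d}
    for script in "$dir"/*.sh; do
        [ -e "$script" ] || continue   # directory is empty or missing
        basename "$script"
    done
}
```

With 01-network.sh, 10-services.sh and 9-late.sh in the directory, 9-late.sh is listed last, which is why zero-padded two-digit prefixes are the safer habit.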

Critical Service Check Script

sudo nano /etc/greenboot/check/required.d/10-check-services.sh

#!/bin/bash
# Check that required services are running

REQUIRED_SERVICES=("sshd" "NetworkManager")

for svc in "${REQUIRED_SERVICES[@]}"; do
    if ! systemctl is-active --quiet "$svc"; then
        echo "ERROR: Service $svc is not running"
        systemctl status "$svc" --no-pager
        exit 1
    fi
    echo "OK: $svc is active"
done

exit 0

sudo chmod +x /etc/greenboot/check/required.d/10-check-services.sh
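Both scripts so far live in required.d. For conditions that deserve a warning but not a rollback, the same exit-code convention works in wanted.d. The following is a sketch of such an optional check; the function name disk_usage_ok, the 90% default threshold and the choice of disk usage as the thing to watch are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical wanted.d check: warn when a filesystem is nearly full.
# disk_usage_ok and the 90% default threshold are assumptions for illustration.
disk_usage_ok() {
    mount_point=${1:-/}
    max_percent=${2:-90}
    # POSIX "df -P": line 2, column 5 holds the capacity, e.g. "42%"
    used=$(df -P "$mount_point" | awk 'NR == 2 { sub("%", "", $5); print $5 }')
    if [ "$used" -gt "$max_percent" ]; then
        echo "WARN: $mount_point is ${used}% full (limit ${max_percent}%)"
        return 1
    fi
    echo "OK: $mount_point is ${used}% full"
}
```

Saved as, say, /etc/greenboot/check/wanted.d/20-check-disk.sh with a final line disk_usage_ok / 90 and made executable, a nearly full disk would show up in the log without pushing the device into a rollback.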

Sending a Telegram Notification on Successful Boot

With remote devices, knowing exactly when the machine has finished booting is incredibly useful: no more manually opening a terminal and pinging. Hooks in green.d run after all health checks pass, which makes them the ideal place to send an “I’m alive” signal:

sudo nano /etc/greenboot/green.d/99-notify-success.sh

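#!/bin/bash
# green.d hook: announce a healthy boot via the Telegram Bot API
HOSTNAME=$(hostname)
TOKEN="your-bot-token"
CHAT_ID="your-chat-id"

MESSAGE="✅ ${HOSTNAME} booted successfully at $(date '+%Y-%m-%d %H:%M')"

# --data-urlencode escapes spaces, emoji and "&" in the message;
# --max-time keeps the hook from hanging if the API is unreachable
curl -s --max-time 10 -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=${MESSAGE}" &>/dev/null

# A notification problem should never change the hook's result
exit 0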
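Hard-coding the bot token in the hook means anyone who can read the script can read the token. One possible alternative is to load it from a separate file that only root can read. This is a sketch, not something Greenboot provides: the path /etc/greenboot/telegram.conf and the function load_telegram_config are hypothetical.

```shell
#!/bin/sh
# Load Telegram credentials from a root-only file instead of hard-coding them.
# Assumptions: the path /etc/greenboot/telegram.conf and the function name
# load_telegram_config are hypothetical; the file holds two shell assignments:
#   TOKEN="your-bot-token"
#   CHAT_ID="your-chat-id"
load_telegram_config() {
    conf=${1:-/etc/greenboot/telegram.conf}
    if [ ! -r "$conf" ]; then
        echo "Cannot read $conf" >&2
        return 1
    fi
    TOKEN=""
    CHAT_ID=""
    . "$conf"
    if [ -z "$TOKEN" ] || [ -z "$CHAT_ID" ]; then
        echo "TOKEN or CHAT_ID missing in $conf" >&2
        return 1
    fi
}
```

The file can be created with sudo install -m 600 /dev/null /etc/greenboot/telegram.conf and then edited; a hook would call load_telegram_config || exit 0 before building the message, so a missing config skips the notification instead of failing the hook.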

Sending an Alert Before Rollback

sudo nano /etc/greenboot/red.d/99-notify-failure.sh

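#!/bin/bash
# red.d hook: report a failed health check before the system reboots
HOSTNAME=$(hostname)
TOKEN="your-bot-token"
CHAT_ID="your-chat-id"
# Read the counter from the GRUB environment (same source as grub2-editenv - list)
BOOT_COUNTER=$(grub2-editenv - list 2>/dev/null | sed -n 's/^boot_counter=//p')
BOOT_COUNTER=${BOOT_COUNTER:-unknown}

MESSAGE="⚠️ ${HOSTNAME}: Boot health check FAILED (boot_counter=${BOOT_COUNTER}). Rebooting to retry or roll back..."

# If the failure is the network itself, this message cannot be delivered;
# --max-time stops curl from delaying the reboot
curl -s --max-time 10 -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=${MESSAGE}" &>/dev/null

exit 0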

Testing & Monitoring

Running Health Checks Manually Without Rebooting

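# Re-run all health checks immediately. Use restart: the unit may still count as
# active from boot, in which case start would do nothing. Note that a failing
# required check can trigger the red.d path, which may include a reboot.
sudo systemctl restart greenboot-healthcheck.service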

# View results
sudo systemctl status greenboot-healthcheck.service

# Detailed log for each script
sudo journalctl -u greenboot-healthcheck.service -n 50 --no-pager
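When a check is still being written, it can be safer to call the scripts directly instead of going through the systemd unit, so that no hooks fire and nothing reboots. A minimal sketch, assuming the checks are plain executables that can be run by hand (dry_run_checks is a hypothetical helper, not a Greenboot command):

```shell
#!/bin/sh
# Run each check script directly and report its exit code and duration.
# dry_run_checks is a hypothetical helper; it bypasses systemd entirely, so
# no green.d/red.d hooks run. Use root for results that match a real boot.
dry_run_checks() {
    dir=${1:-/etc/greenboot/check/required.d}
    overall=0
    for script in "$dir"/*.sh; do
        [ -x "$script" ] || continue
        start=$(date +%s)
        if "$script" >/dev/null 2>&1; then rc=0; else rc=$?; fi
        end=$(date +%s)
        echo "$(basename "$script") exit=$rc time=$((end - start))s"
        [ "$rc" -eq 0 ] || overall=1
    done
    return "$overall"
}
```

The per-script timing is useful for spotting checks that eat into the boot, such as a network check that waits out its full timeout.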

Viewing Deployment Status and Boot Counter

# Current boot counter (set when an update is staged, decremented on each failed boot,
# cleared after a healthy boot; no output means it is not set)
sudo grub2-editenv - list | grep boot_counter

# View all deployments
rpm-ostree status

# Sample output (simplified; the labels in parentheses are annotations, not rpm-ostree output):
# ● fedora:fedora/40/x86_64/iot
#                    Version: 40.20240601.0 (current)
#   fedora:fedora/40/x86_64/iot
#                    Version: 40.20240501.0 (rollback target)
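For use inside scripts, the grep above can be replaced by a small parser that returns just the value and signals whether the variable exists. This is a sketch that only assumes the key=value line format printed by grub2-editenv - list; grubenv_get is a hypothetical helper name.

```shell
#!/bin/sh
# Extract one variable from "grub2-editenv - list" output read on stdin.
# grubenv_get is a hypothetical helper name.
grubenv_get() {
    key=$1
    # Lines look like "boot_counter=2"; print everything after the first "="
    # and exit non-zero when the key is absent.
    awk -F= -v k="$key" '
        $1 == k { sub(/^[^=]*=/, ""); print; found = 1 }
        END { exit !found }'
}
```

A typical call would be sudo grub2-editenv - list | grubenv_get boot_counter || echo "boot_counter is not set".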

Testing Rollback with a Simulated Failure Script

Before trusting Greenboot with a real device, I always test it manually. The quickest way is to create an intentionally failing script, reboot 3 times, and verify that the system rolls back correctly:

# Create an intentionally failing script
sudo bash -c 'cat > /etc/greenboot/check/required.d/99-test-failure.sh << EOF
#!/bin/bash
echo "Simulated failure for testing"
exit 1
EOF'
sudo chmod +x /etc/greenboot/check/required.d/99-test-failure.sh

# Reboot and observe
sudo systemctl reboot

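After 3 consecutive failed boots, the system automatically rolls back to the old deployment. If it never reboots or rolls back, check that boot_counter is set: Greenboot normally arms it only when an update has been staged, so the test is most realistic right after sudo rpm-ostree upgrade and before the first reboot. Remember to remove the test script when you’re done with sudo rm /etc/greenboot/check/required.d/99-test-failure.sh.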
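To confirm that a rollback really happened, it helps to record the booted version before the test and compare it afterwards. A minimal sketch, assuming the text layout of rpm-ostree status shown earlier, where the booted deployment is the one marked with ● and is followed by a Version: line (booted_version is a hypothetical helper):

```shell
#!/bin/sh
# Print the version of the booted deployment from "rpm-ostree status" text
# read on stdin. Assumes the booted entry starts with the "●" marker and is
# followed by a "Version:" line. booted_version is a hypothetical helper.
booted_version() {
    awk '/^●/ { booted = 1; next }
         booted && $1 == "Version:" { print $2; exit }'
}
```

Before the test, run rpm-ostree status | booted_version > /var/tmp/version-before (a location under /var is assumed to survive the deployment switch), then run the same pipeline after the device settles and compare the two values.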

Integrating with Zincati on CoreOS

On CoreOS, Zincati handles downloading and applying updates automatically. Paired with Greenboot, you get a complete unattended loop: updates happen at 2 AM, and by morning either the machine is running the new version stably, or it has already rolled back to the old one.

sudo mkdir -p /etc/zincati/config.d
sudo nano /etc/zincati/config.d/55-updates-strategy.toml

[updates]
strategy = "periodic"

[[updates.periodic.window]]
days = [ "Mon", "Wed", "Fri" ]
start_time = "02:00"
length_minutes = 60

Monday, Wednesday, Friday at 2 AM, avoiding weekends so someone is available if anything goes wrong. Zincati interprets these times as UTC by default, so shift the window if you mean local time. If the update causes a failure, Greenboot detects it on the next boot and completes the rollback before sunrise.
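Working out the UTC equivalent of a local maintenance window by hand is error-prone, because the weekday can shift along with the hour. A minimal sketch, assuming Zincati reads window times as UTC (its documented default; verify for your version) and that GNU date is available, as it is on Fedora. local_to_utc is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a local wall-clock time to the UTC weekday and time.
# Assumptions: GNU date; local_to_utc is a hypothetical helper; time zone
# names such as "Europe/Berlin" require the tzdata package.
local_to_utc() {
    tz=$1          # e.g. Europe/Berlin
    when=$2        # e.g. "2024-06-03 02:00"
    # GNU date accepts a TZ="..." prefix inside the date string;
    # -u prints the result in UTC.
    LC_ALL=C date -u -d "TZ=\"$tz\" $when" '+%a %H:%M'
}
```

For a zone seven hours ahead of UTC, a Monday 02:00 local window falls on Sunday evening in UTC, so both start_time and the days list in the TOML file would need adjusting.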

Real-time Log Monitoring

# Follow all Greenboot service logs
sudo journalctl -fu "greenboot*"

# View only the last hour
sudo journalctl -u "greenboot*" --since "1 hour ago" --no-pager

After setting this up, I let Fedora IoT on the Raspberry Pi update itself for months without touching it. One kernel update broke the WiFi driver: Greenboot caught it on the first boot (the network check script failed), rolled back automatically, and I got a Telegram notification early in the morning. No extended downtime, and no more crawling into rescue mode like I used to.