Automating File Monitoring with Python Watchdog: Stop Polling and Start Listening

Python tutorial - IT technology blog

The Problems with Manual Checks and “Patchwork” Solutions

Back when I first started in IT, I was assigned a rather tricky task. Whenever a client uploaded a design file to the server via FTP, I had to immediately copy it to a backup folder and notify the team. It sounds simple, but clients could upload at any time—whether it was 2 AM or right in the middle of my dinner.

Initially, I remoted into the server every 15 minutes to check manually. This was both exhausting and prone to errors. Later, I wrote a Python script using a while True loop combined with time.sleep(60) to scan the file list. However, a long wait time caused delays, while scanning every second made the server’s CPU spike to 20-30% just from constant disk reading.

After a lot of trial and error, I realized that waiting for the file system to notify me of changes would be far more efficient than constantly asking it.

Why is Polling a Bad Solution?

In programming, this periodic scanning technique is called polling. You are essentially asking the operating system over and over: “Is there a new file yet?” This approach has two fatal flaws:

  • I/O-Intensive: Listing and checking thousands of files on every pass is expensive. It keeps the disk busy and slows down other applications, even when nothing has changed.
  • Latency: If you scan every 5 minutes, a file might sit waiting for 4 minutes and 59 seconds before being processed. For systems requiring immediate response, this is unacceptable.
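To make both flaws concrete, here is a minimal sketch of what a polling script looks like. The function names (snapshot, diff_snapshots, poll) and the 60-second default are illustrative assumptions, not the original script from the story above.

```python
import os
import time


def snapshot(directory):
    """Return {path: mtime} for every regular file directly inside directory."""
    with os.scandir(directory) as entries:
        return {e.path: e.stat().st_mtime for e in entries if e.is_file()}


def diff_snapshots(old, new):
    """Compare two snapshots and report created, deleted, and modified paths."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return created, deleted, modified


def poll(directory, interval=60):
    """Rescan directory every `interval` seconds, forever."""
    previous = snapshot(directory)
    while True:
        time.sleep(interval)           # worst-case latency equals the interval
        current = snapshot(directory)  # every pass touches every file again
        created, deleted, modified = diff_snapshots(previous, current)
        for path in created:
            print(f"New file detected: {path}")
        previous = current
```

Shortening the interval reduces latency but multiplies the number of full scans, which is exactly the trade-off described above.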

Modern operating systems all provide event-driven mechanisms: inotify on Linux, FSEvents on macOS, and ReadDirectoryChangesW on Windows. Instead of asking, our program registers with the OS to receive a signal as soon as a file creation, deletion, or modification event occurs.

Watchdog – A Professional File Monitoring Tool

Python’s Watchdog library is a convenient way to use these event-driven mechanisms. It is cross-platform and uses very little CPU while idle, because it waits for OS notifications instead of rescanning the disk.

Watchdog operates based on two core components:

  1. Observer: A background thread that “listens” for signals from the operating system.
  2. Event Handler: A set of functions that handle the logic when specific events occur (such as on_modified or on_created).

Implementing a Basic Monitoring Tool

First, install the library via pip using the command:

pip install watchdog

Below is the boilerplate code I often use. It monitors a directory and prints a notification as soon as a change is detected.

import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            print(f"File {event.src_path} was just updated.")

    def on_created(self, event):
        if not event.is_directory:
            print(f"New file detected: {event.src_path}")

if __name__ == "__main__":
    path = "./my_folder" 
    event_handler = MyHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    
    print(f"System is monitoring directory: {path}...")
    observer.start()
    
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

A small note: The recursive=True parameter tells the observer to monitor subdirectories as well. If you only need to watch the root folder and want to save resources, change it to False.

Application: Automatically Sorting Downloads

Let’s upgrade the script to solve a real-world problem: Automatically cleaning up the Downloads folder. Every time a .jpg or .pdf file is downloaded, the system will automatically move them to their respective folders.

import os
import shutil
import time  # needed for the time.sleep(1) call in the handler
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CleanupHandler(FileSystemEventHandler):
    def on_created(self, event):
        if event.is_directory:
            return
        
        filename = os.path.basename(event.src_path)
        ext = os.path.splitext(filename)[1].lower()
        
        dest_map = {
            ".jpg": "./Images", ".png": "./Images",
            ".pdf": "./Documents", ".docx": "./Documents"
        }
        
        destination = dest_map.get(ext)
        if destination:
            os.makedirs(destination, exist_ok=True)
            # Wait 1s to ensure the file has finished writing to disk
            time.sleep(1)
            shutil.move(event.src_path, os.path.join(destination, filename))
            print(f"Moved {filename} to {destination}")

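The trick here is the time.sleep(1) call. While a file is still being downloaded, the OS can fire the created event before the data is fully written, so pausing briefly helps avoid “File in use” errors when you move it. Keep in mind that a fixed one-second delay is only a heuristic: a large file may need longer, and because the handler runs in the observer's thread, other events wait while it sleeps.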
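As an alternative to a fixed delay, one common approach is to wait until the file size stops changing. The helper below is a sketch under that assumption. The name wait_until_stable and its parameters are hypothetical and not part of Watchdog, and a stalled download would also look "stable", so treat it as a heuristic rather than a guarantee.

```python
import os
import time


def wait_until_stable(path, checks=3, interval=0.5, timeout=30.0):
    """Return True once the file size has matched the previous reading
    `checks` times in a row.

    Returns False if the file disappears or the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            return False  # file was removed or is not accessible
        if size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed, so start counting again
            last_size = size
        time.sleep(interval)
    return False
```

Inside on_created, the time.sleep(1) line could then be replaced by a check such as: if wait_until_stable(event.src_path), move the file, otherwise skip it.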

Important Notes for Real-World Deployment

To ensure your script runs stably in a production environment, keep these 3 points in mind:

  1. Avoid Event Loops: If your script writes logs to the same directory it is monitoring, Watchdog will catch that log-writing event and trigger again. This creates an infinite loop that can hang the script.
  2. Exception Handling: Always wrap file operations in try-except. If a file is locked by another program (like Excel), shutil.move raises an exception. Left unhandled inside a handler, it can kill the observer thread, and monitoring stops without any obvious sign.
  3. Maintaining Uptime: On Linux, set up the script as a systemd service. This ensures the script automatically restarts if the server reboots or encounters an unexpected error.
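Points 1 and 2 can be sketched with two small helpers. Both names (is_ignored, safe_move), the ".log" suffix, and the retry settings are assumptions made for illustration. Adjust them to whatever your own script writes and to how long files tend to stay locked in your environment.

```python
import os
import shutil
import time


def is_ignored(path, ignored_suffixes=(".log",)):
    """Return True for files the script itself writes, such as its own log."""
    return path.lower().endswith(tuple(ignored_suffixes))


def safe_move(src, dest_dir, retries=3, delay=1.0):
    """Try to move src into dest_dir; return the new path, or None on failure."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, os.path.basename(src))
    for attempt in range(1, retries + 1):
        try:
            shutil.move(src, target)
            return target
        except FileNotFoundError:
            return None  # the file vanished before we could move it
        except OSError as exc:  # includes PermissionError for locked files
            print(f"Attempt {attempt} failed for {src}: {exc}")
            time.sleep(delay)
    return None
```

In a handler, you would return early when is_ignored(event.src_path) is true and call safe_move(event.src_path, destination) instead of shutil.move directly, so a single locked file cannot take down the watcher.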

Using Watchdog saves me at least 30-40 minutes a day on trivial checking tasks. If you have repetitive workflows involving files, try writing a monitoring script today.
