Torturing Your Python Code with Hypothesis: Finding Logic Bugs Automatically with Property-based Testing – ITFROMZERO

Why Do Standard Unit Tests Still Miss Bugs?

As developers, we’ve all likely experienced that feeling: code is finished, all unit tests pass, and we’re feeling extremely confident. But just a few hours after deploying to production, bugs start popping up everywhere. Why is that?

The core issue is that we only test what we can “anticipate.” Vulnerabilities hiding in our blind spots silently wait for the right moment to surface.

Take a function that adds two numbers, for example. Normally, we might write assert add(1, 2) == 3. This is called Example-based Testing. However, it’s easy to overlook bizarre edge cases: extremely large floats that overflow to infinity or silently lose precision, NaN values, empty strings, or unusual Unicode characters. An attacker or a “creative” user could easily crash the system with these inputs.

This is where Property-based Testing (PBT) comes to the rescue. Instead of racking your brain for examples, you simply define the “properties” of the function. The computer then takes over, automatically generating thousands of devious inputs to try and break your code.

Property-based Testing and the Hypothesis Library

Don’t Test Examples, Test Properties

Instead of asserting “1 + 2 must equal 3,” you command: “For any two integers, the result of the addition must always be the same regardless of order (a + b = b + a).” This is a property. Your job is to define the rules; the tool’s job is to find the loopholes.
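
As a minimal sketch of the difference, assume a hypothetical add function (it is not defined anywhere else in this article). The first test checks one hand-picked example, while the second states the commutative property and lets Hypothesis choose the integers:

```python
from hypothesis import given, strategies as st


def add(a, b):
    # Hypothetical function under test, used only for illustration.
    return a + b


def test_add_example():
    # Example-based: checks exactly one hand-picked input.
    assert add(1, 2) == 3


@given(st.integers(), st.integers())
def test_add_is_commutative(a, b):
    # Property-based: must hold for every pair of integers Hypothesis generates.
    assert add(a, b) == add(b, a)
```

Both tests are collected by pytest in the usual way; the only difference is that the second one receives its arguments from Hypothesis.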

Hypothesis: A Sharp “Bug Detector” for Python

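In the Python ecosystem, Hypothesis is the de facto standard library for implementing PBT (it is a third-party package, not part of the Python standard library). It doesn’t just generate random data blindly; it is designed to seek out edge cases. Hypothesis also remembers previous failures: when it finds a failing input, it saves that example in a local database and replays it first in future runs, so a bug you have already hit keeps being rechecked until it is fixed.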

Getting Hands-on with Hypothesis

Installation is quick and easy via pip:

pip install hypothesis pytest

Let’s apply it to a common list sorting function:

# code_can_test.py
def sort_list(numbers):
    return sorted(numbers)

# test_code.py
from collections import Counter

from hypothesis import given, strategies as st
from code_can_test import sort_list

@given(st.lists(st.integers()))
def test_sort_list_properties(numbers):
    result = sort_list(numbers)
    # 1. List length must not change
    assert len(result) == len(numbers)
    # 2. Each element must be less than or equal to the one that follows it
    assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    # 3. No elements should be lost or added compared to the original
    #    (compared as multisets, so the check does not rely on sorting itself)
    assert Counter(result) == Counter(numbers)

By default, the @given decorator runs this test up to 100 times with all sorts of lists: empty ones, single-element lists, lists with duplicates, and lists containing very large positive or negative integers. If any input causes an assert to fail, Hypothesis stops generating new inputs, shrinks the failing one, and reports it.

Shrinking Feature: Reducing Bugs for Easier Debugging

This is the most valuable feature. Imagine Hypothesis finds a list of 500 numbers that triggers a logic error. Sifting through that to find the cause would be a nightmare.

Hypothesis automatically “shrinks” the failing input. It attempts to remove elements or replace large numbers with smaller ones (like 0 or 1) to find the most minimal example that still causes the failure. The result might be as concise as: “Function fails when the list is exactly [1, 0].” This saves you an entire afternoon of scouring logs.
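
To watch shrinking in isolation, here is a small sketch built on hypothesis.find, which searches for an input satisfying a condition and then shrinks it. The buggy_sort function is a hypothetical implementation invented for this illustration; its bug is that set() silently drops duplicates.

```python
from hypothesis import find, strategies as st


def buggy_sort(numbers):
    # Hypothetical buggy implementation: set() silently drops duplicates.
    return sorted(set(numbers))


# find() looks for a list that exposes the bug, then shrinks it
# toward the simplest input that still satisfies the condition.
minimal = find(
    st.lists(st.integers()),
    lambda xs: len(buggy_sort(xs)) != len(xs),
)
print(minimal)  # a list that contains at least one repeated value
```

In practice the reported list is usually tiny (for this bug, typically just two equal integers), though the exact output is up to Hypothesis's shrinker.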

Leveraging Advanced Strategies

Hypothesis provides a rich set of data generators (strategies), ranging from emails and IP addresses to complex objects.
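
As a rough sketch of how these generators compose, the example below combines st.emails() and st.ip_addresses() into a complex object with st.builds(). The LoginEvent class is a hypothetical record type made up for this illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

from hypothesis import given, strategies as st


@dataclass
class LoginEvent:
    # Hypothetical record type, used only for illustration.
    email: str
    ip: IPv4Address


# st.builds() calls LoginEvent(...) with values drawn from the given strategies.
login_events = st.builds(
    LoginEvent,
    email=st.emails(),
    ip=st.ip_addresses(v=4),
)


@given(login_events)
def test_login_event_fields(event):
    # Every generated event carries an email-shaped string and an IPv4 address.
    assert "@" in event.email
    assert event.ip.version == 4
```

The same pattern extends to your own domain classes: describe each field with a strategy and let st.builds() assemble the object.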

With regular expressions in particular, finding edge cases manually is extremely difficult. In my experience, it helps to quickly test patterns in the browser at toolcraft.app/en/tools/developer/regex-tester before putting them into code. Once the pattern works well, use Hypothesis to “bombard” it with data to ensure no cases are missed.

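from hypothesis import given, strategies as st

# process_text is the function under test; it is assumed to be imported from your own module.

# 'Lu' and 'Ll' are the Unicode categories for uppercase and lowercase letters.
# Newer Hypothesis releases name this argument `categories`; check the version you have installed.
@given(st.text(alphabet=st.characters(whitelist_categories=('Lu', 'Ll')), min_size=1))
def test_process_text(s):
    # Ensure the text processing function never returns None for any string of letters
    result = process_text(s)
    assert result is not None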
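
For the regex case mentioned above, Hypothesis can also generate strings directly from a pattern with st.from_regex. The following is only a sketch: VERSION_PATTERN and parse_version are hypothetical names invented for this illustration, standing in for your own pattern and parser.

```python
import re

from hypothesis import given, strategies as st

# Hypothetical pattern: three dot-separated groups of 1 to 3 ASCII digits.
VERSION_PATTERN = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"


def parse_version(text):
    # Hypothetical function under test: returns a tuple of three ints.
    if re.fullmatch(VERSION_PATTERN, text) is None:
        raise ValueError(f"invalid version: {text!r}")
    return tuple(int(part) for part in text.split("."))


# fullmatch=True makes the whole generated string match the pattern.
@given(st.from_regex(VERSION_PATTERN, fullmatch=True))
def test_parse_version_accepts_every_match(text):
    major, minor, patch = parse_version(text)
    assert all(0 <= n <= 999 for n in (major, minor, patch))
```

The character class [0-9] is used instead of \d on purpose: in Python, \d also matches non-ASCII digits in str patterns, which may or may not be what your parser expects.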

A Few Tips for More Effective Use

Shift your mindset: Don’t force yourself to think about specific values. Focus on constraints and the relationship between input and output.
Balance speed: Since each test runs many times, PBT is slower than traditional unit tests. Prioritize applying it to critical business logic or sensitive data filters.
Works with Pytest: Just run pytest, and Hypothesis will be activated automatically without complex configuration.
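
One way to act on the speed tip is to tune the number of generated inputs per test with the settings decorator. The sketch below uses two made-up tests; the values 500 and 20 are arbitrary choices for illustration, not recommendations.

```python
from hypothesis import given, settings, strategies as st


@settings(max_examples=500)  # more thorough than the default of 100
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(numbers):
    # Sorting an already sorted list must not change it.
    once = sorted(numbers)
    assert sorted(once) == once


@settings(max_examples=20)  # cheaper run for a less critical check
@given(st.text())
def test_strip_never_lengthens(s):
    # Stripping whitespace can only keep or reduce the length.
    assert len(s.strip()) <= len(s)
```

Raising max_examples on critical logic and lowering it elsewhere keeps the overall suite time under control.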

Conclusion

Hypothesis isn’t designed to completely replace Unit Testing, but it serves as an incredibly robust defensive layer for your code. It helps bring to light bugs that we wouldn’t even dream of.

If you’re working on financial systems, handling Big Data, or simply want to sleep better after a deployment, try integrating Hypothesis today. It might be challenging to identify “properties” at first, but once you get the hang of it, you’ll see it’s an exceptionally high-return investment for software quality.