Python Mutation Tests

I have recently come across a concept that filled a huge hole in my software development skills. Let’s imagine that you follow TDD principles, write well testable code, but as soon as you merge your changes to the master branch you are left wondering: “have I tested all the cases?” Tests might be green now, but what if they remain green even if they should not? Ladies and gentlemen, welcome mutation tests with Python examples.

Someone looking at the mirror saying: 
"Mirror mirror on the wall, how much testing to test it all?"
Source; https://pedrorijo.com/blog/intro-mutation/

What is a mutation test

First, let’s get some context. Imagine you’re writing an application that calculates discounts for clients. On the entry we give it client tier (1 or 2) and order total. Clients of tier 1 get 0% discount, clients for tier 2 get 10% discount. The calculator code is simple:

def calculate(tier: int, total: int) -> float:
    if tier == 2:
        return total * 0.9
    return total * 1.0

Of course as good developers, we followed TDD and created proper tests in advance:

from calculator import calculate


def test_tier_2_discount():
    assert calculate(2, 100) == 90.0

As expected the tests pass:

$ pytest
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/gonczor/Projects/mutants
collected 1 item                                                                                                                                                                                          

tests/test_calculator.py .                                                                                                                                                                          [100%]

============================================================================================ 1 passed in 0.00s ============================================================================================

Manual tests also pass:

$ python src/main.py
Enter client tier: 2
Enter order total: 200
The discounted price is: 180.0

However, good tests are meant to save us from undetected bugs when the code changes in the future. Does our code meet this requirement? Of course not. We have only 1 test and 2 cases. If someone accidentally mutates our code and decides to return total * 1.1 for tier 1 clients, we will not detect this. This code may seem trivial, but it’s easy to miss certain cases in real world examples.

Python’s mutatest library

Luckily we have a tool that can help us detect this – mutatest library. The idea behind this tool is to generate mutated code and then run existing tests against it. If the test fails, we’re safe. It means that certain case is covered and we can sleep well. Alternatively, mutants can get unnoticed (survive). This indicates that our unit (or integration or whatever) tests are are likely to be insufficient.

We only need to pass source path we want to check with optional exclusions: mutatest -s src -e src/main.py. At the end we get a report that looks like:

--------
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Pow'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Div'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Add'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.FloorDiv'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Mod'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Sub'>

2023-01-11 12:59:15,739: Timedout mutations:

2023-01-11 12:59:15,739: Surviving mutations:

SURVIVED
--------
 - src/calculator.py: (l: 2, c: 4) - mutation from If_Statement to If_True
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.GtE'>
 - src/calculator.py: (l: 4, c: 11) - mutation from <class 'ast.Mult'> to <class 'ast.FloorDiv'>

So additional cases we want to cover are:

  1. tier 1 should return the same amount converted to float
  2. non existing tier should raise some error

As a result of the second requirement we need to update the calculate() function:

def calculate(tier: int, total: int) -> float:
    if tier == 2:
        return total * 0.9
    elif tier == 1:
        return float(total)
    else:
        raise ValueError("Wrong client tier.")

Improved tests:

from pytest import raises, mark

from calculator import calculate


def test_tier_1_discount():
    assert calculate(1, 100) == 100.0


def test_tier_2_discount():
    assert calculate(2, 100) == 90.0


@mark.parametrize("tier", [0, 3])
def test_non_existing_tier(tier: int):
    with raises(ValueError):
        calculate(tier, 100)

Let’s now run pytest:

$ pytest
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/gonczor/Projects/mutants
collected 4 items                                                                                                                                                                                         

tests/test_calculator.py ....                                                                                                                                                                       [100%]

============================================================================================ 4 passed in 0.01s ============================================================================================

And mutatest:

DETECTED
--------
 - src/calculator.py: (l: 2, c: 4) - mutation from If_Statement to If_True
 - src/calculator.py: (l: 2, c: 4) - mutation from If_Statement to If_False
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.Lt'>
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.LtE'>
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.Gt'>
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.NotEq'>
 - src/calculator.py: (l: 2, c: 7) - mutation from <class 'ast.Eq'> to <class 'ast.GtE'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Add'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Mod'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Sub'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Div'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.FloorDiv'>
 - src/calculator.py: (l: 3, c: 15) - mutation from <class 'ast.Mult'> to <class 'ast.Pow'>
 - src/calculator.py: (l: 4, c: 4) - mutation from If_Statement to If_False
 - src/calculator.py: (l: 4, c: 4) - mutation from If_Statement to If_True
 - src/calculator.py: (l: 4, c: 9) - mutation from <class 'ast.Eq'> to <class 'ast.Gt'>
 - src/calculator.py: (l: 4, c: 9) - mutation from <class 'ast.Eq'> to <class 'ast.LtE'>
 - src/calculator.py: (l: 4, c: 9) - mutation from <class 'ast.Eq'> to <class 'ast.GtE'>
 - src/calculator.py: (l: 4, c: 9) - mutation from <class 'ast.Eq'> to <class 'ast.NotEq'>
 - src/calculator.py: (l: 4, c: 9) - mutation from <class 'ast.Eq'> to <class 'ast.Lt'>

2023-01-11 13:10:44,875: Timedout mutations:

2023-01-11 13:10:44,875: Surviving mutations:

Hurray! No mutants survived! Our simple 4 cases covered all possibilities. Any change (the introduction of new tier, change in calculation rates) now need to be introduced with explicit change in the tests code.

Summary

To sum up, with mutatest we can write proper Python mutation tests that will help us detect cases, where code changes would go unnoticed and therefore break our system. You can find more on mutation testing here: