Python Testing with pytest and unittest

Testing in Python is crucial for ensuring code reliability, with two main frameworks: unittest and pytest. They provide structured approaches for verifying functionality, catching bugs, and enhancing code quality.

Tests are code that verifies your other code works correctly. They catch regressions when you refactor, document expected behavior more reliably than comments, and force you to think about edge cases before they become production bugs. Python has two main testing frameworks: unittest, which ships with the standard library and follows a class-based JUnit style, and pytest, which uses plain functions, plain assertions, and is the community standard for most new projects today. Both run the same tests; this article covers both in full.

The assert Statement

Everything in Python testing is built on top of one keyword: assert.

assert 1 + 1 == 2 # passes silently
assert 1 + 1 == 3 # raises AssertionError
assert 1 + 1 == 3, "math broken" # AssertionError with a message

If the expression is truthy, assert does nothing and execution continues. If it is falsy, it raises AssertionError and stops the program. The optional message after the comma appears in the error output, which makes failures easier to diagnose when the expression alone is not self-explanatory.

pytest Basics

Anatomy of a test

pytest’s discovery rules are simple: test files must be named test_*.py or *_test.py, test functions must start with test_, and test classes must start with Test with no __init__ method. Everything else in those files is ignored.

def is_even(num):
if num == 0:
raise ValueError
return num % 2 == 0
def test_numbers():
assert is_even(2) is True
assert is_even(3) is False

pytest auto-discovers test_numbers because of its prefix, runs the two assertions, and reports a pass if both hold. The is True / is False form is an identity check, the strictest option. == True / == False also works but will match truthy or falsy values rather than the exact boolean.

Running tests

pytest # run everything in the current directory
pytest test_helpers.py # specific file
pytest test_helpers.py -k numbers # only tests whose name contains "numbers"
pytest -v # verbose: show each test name and result
pytest -x # stop on the first failure
pytest --tb=short # shorter traceback output
pytest -s # show print() output (no stdout capture)
pytest --maxfail=3 # stop after 3 failures
pytest -q # quiet output

Testing for expected exceptions

When a function should raise on bad input, use pytest.raises as a context manager. If the expected exception is not raised inside the block, the test fails.

import pytest
def test_zero():
with pytest.raises(ValueError):
is_even(0)

To also inspect the exception message, bind the captured exception with as:

with pytest.raises(ValueError) as exc_info:
is_even(0)
assert "must" in str(exc_info.value)

This is useful when a function raises the same exception type for different reasons and you want to confirm it raised for the right one.

Markers

Markers are decorators that change how pytest treats a test.

@pytest.mark.xfail

Marks a test as expected to fail. Useful for known bugs or unfinished features. If the test fails, pytest reports it as XFAILand moves on without raising an alarm. If it unexpectedly passes, pytest flags it as XPASS, which is a hint that the bug may have been fixed.

@pytest.mark.xfail
def test_fails():
assert is_even(2) == False # intentionally wrong

@pytest.mark.skipif

Skips a test when a condition is true. Skipped tests show as SKIPPED, not failures.

The modern syntax takes a direct boolean and a required reason argument:

from datetime import datetime
day_of_week = datetime.now().isoweekday()
@pytest.mark.skipif(day_of_week == 6, reason="Skip on Saturday")
def test_deduplication():
assert deduplicate([1, 2, 3]) == [1, 2, 3]
assert deduplicate([1, 2, 3, 1]) == [1, 2, 3]

An older form accepts a string expression that pytest evaluates, but the direct boolean form is cleaner and more readable.

Marker reference

MarkerEffect
@pytest.mark.skipAlways skip
@pytest.mark.skipif(condition, reason=)Skip when condition is true
@pytest.mark.xfailExpected to fail
@pytest.mark.parametrizeRun with multiple input sets
@pytest.mark.slowCustom marker (define behavior in config)

Fixtures

A fixture is reusable setup code. Tests declare what they need by parameter name, and pytest provides it automatically.

Basic fixture

import pytest
@pytest.fixture
def sample_data():
return [i for i in range(10)]
def test_elements(sample_data):
assert 9 in sample_data
assert 10 not in sample_data

pytest sees that test_elements has a parameter named sample_data, finds the matching fixture, runs it, and passes the result in. The fixture runs once per test function by default, so each test gets a fresh copy.

Fixtures depending on other fixtures

Fixtures can declare other fixtures as parameters, and pytest resolves the entire dependency chain automatically.

@pytest.fixture
def seq_length():
return 10
@pytest.fixture
def number_list(seq_length):
return [i for i in range(seq_length)]
def test_contents(number_list):
assert 9 in number_list
assert 10 not in number_list

test_contents only requests number_list, but pytest figures out that number_list depends on seq_length and runs that first.

autouse=True

A fixture with autouse=True runs for every test in scope without being explicitly requested. Useful for setup that applies everywhere, such as resetting a database before each test.

@pytest.fixture
def base_list():
return []
@pytest.fixture(autouse=True)
def populate_list(base_list):
base_list.extend([i for i in range(10)])
def test_elements(base_list):
assert 1 in base_list
assert 9 in base_list

Use autouse sparingly. When a fixture runs without being requested, the setup is invisible in the test body, which makes the test harder to understand in isolation.

Setup and teardown with yield

yield splits a fixture into setup and teardown. Everything before yield runs before the test; everything after yield runs after the test, even if the test fails.

@pytest.fixture
def sample_data():
data = [i for i in range(10)]
yield data
data.clear()
del data
def test_elements(sample_data):
assert 9 in sample_data
assert 10 not in sample_data

This is equivalent to wrapping the test in a try/finally block. The resource is created, handed to the test via yield, and then guaranteed to be cleaned up regardless of the test outcome.

A practical version with a DataFrame:

import pytest
import pandas as pd
@pytest.fixture
def records_df():
df = pd.read_csv('/usr/local/share/records.csv')
yield df
df.drop(df.index, inplace=True)
del df
def test_type(records_df):
assert type(records_df) == pd.DataFrame
def test_shape(records_df):
assert records_df.shape[0] > 0

Each test that requests records_df gets a freshly loaded DataFrame. The cleanup after yield ensures rows are cleared and the variable is released.

Fixture scope

By default, fixtures rebuild for every test function. The scope parameter changes this.

@pytest.fixture(scope="module")
def expensive_resource():
return load_huge_dataset()
ScopeRunsUse case
function (default)Once per testIsolated, safe
classOnce per test classShared across class methods
moduleOnce per fileExpensive setup reused in a file
sessionOnce per pytest runDatabase startup, external services

The trade-off: broader scope is faster but tests share state, so a mutation in one test can leak into the next.

Types of Tests

Unit tests

A unit test pokes one function with one specific input and checks one specific output. The minimum viable set of inputs is the “happy path, edge case, error case” triple.

def factorial(n):
if n == 0:
return 1
elif type(n) == int:
return n * factorial(n - 1)
else:
return -1
def test_regular():
assert factorial(5) == 120
def test_zero():
assert factorial(0) == 1
def test_str():
assert factorial('5') == -1

test_regular covers the normal path, test_zero covers the boundary where factorials have a special case, and test_str covers bad input. These three categories catch most bugs and document how the function is supposed to behave for anyone reading the test file later.

Integration tests

Integration tests wire two or more components together and verify they cooperate correctly. Where a unit test isolates a function, an integration test checks the seams between functions.

import pandas as pd
import pytest
@pytest.fixture
def raw_df():
return pd.read_csv('devices.csv')
def group_sum(data, group_col, agg_col):
return data.groupby(group_col)[agg_col].sum()
def test_agg_feature(raw_df):
result = group_sum(raw_df, 'Brand', 'Cost')
assert type(result) == pd.Series
assert result.shape[0] > 0
assert result.dtype in (int, float)

The fixture loads data (one component) and the test calls the aggregation function (another component). If either piece breaks, or if the loader produces column names the aggregator does not recognize, this test catches it.

Verifying data loading

The cheapest sanity check you can write for any data-loading code is two assertions: did the loader return the right type, and is the result not empty?

def test_load(raw_df):
assert type(raw_df) == pd.DataFrame
assert raw_df.shape[0] > 0

These two lines take seconds to write and prevent hours of downstream debugging when a silent empty DataFrame propagates through a pipeline.

Performance Benchmarking with pytest-benchmark

pytest-benchmark is a plugin that times functions and produces a statistical summary across many runs. Install with pip install pytest-benchmark.

Comparing data structures

def create_list():
return [i for i in range(1000)]
def create_set():
return {i for i in range(1000)}
def find(it, el=50):
return el in it
def test_list(benchmark):
benchmark(find, it=create_list())
def test_set(benchmark):
benchmark(find, it=create_set())

The benchmark fixture, provided by the plugin, calls the given function many times and records mean, median, and standard deviation. For membership testing, test_set will be significantly faster because in on a set is O(1) while in on a list is O(n).

When you want to time a code block defined inline, the decorator form works:

def test_list_iteration(benchmark):
@benchmark
def iterate():
for el in [i for i in range(1000)]:
pass
def test_set_iteration(benchmark):
@benchmark
def iterate():
for el in {i for i in range(1000)}:
pass

Iteration over list and set is roughly equivalent in speed because both require visiting every element. The membership test is where they diverge.

Benchmarking file reads

DF_PATH = '/usr/local/share/workforce.csv'
def test_reading_speed(benchmark):
benchmark(pd.read_csv, DF_PATH)

This one-liner times how long CSV loading takes. Run it before and after upgrading pandas or refactoring a data loader, and you will immediately see whether performance changed.

unittest: The Stdlib Classic

unittest is class-based and built into Python with no installation required. Tests live as methods inside a class that inherits from unittest.TestCase.

import unittest
def factorial(number):
if number < 0:
raise ValueError('Factorial is not defined for negative values')
result = 1
while number > 1:
result = result * number
number = number - 1
return result
class TestFactorial(unittest.TestCase):
def test_positives(self):
self.assertEqual(factorial(5), 120)
def test_zero(self):
self.assertEqual(factorial(0), 1)
def test_negatives(self):
with self.assertRaises(ValueError):
factorial(-1)

Instead of plain assert, you call self.assert* methods. The intent is explicit from the method name: assertEqual means you are checking equality, assertRaises means you are checking for an exception. The verbose style adds clarity but at the cost of more typing.

Common assertion methods

MethodChecks
assertEqual(a, b)a == b
assertNotEqual(a, b)a != b
assertTrue(x)bool(x) is True
assertFalse(x)bool(x) is False
assertIsNone(x)x is None
assertIsNotNone(x)x is not None
assertIn(a, b)a in b
assertNotIn(a, b)a not in b
assertIsInstance(x, t)isinstance(x, t)
assertRaises(E)block raises E
assertAlmostEqual(a, b)a ≈ b (for floats)

Boolean assertion example

import math
def is_prime(num):
if num == 1:
return False
up_limit = int(math.sqrt(num)) + 1
for i in range(2, up_limit):
if num % i == 0:
return False
return True
class TestSuite(unittest.TestCase):
def test_is_prime(self):
self.assertFalse(is_prime(1))

assertFalse(x) is equivalent to assert not x in pytest. The dedicated method name makes the intent immediately clear without any mental translation.

Running unittest from the command line

python -m unittest test_helpers.py # run a specific file
python -m unittest -v # verbose output
python -m unittest -f # stop on first failure
python -m unittest -k palindrome # keyword filter (Python 3.7+)
python -m unittest discover # auto-discover all test files

setUp and tearDown

setUp runs before every test method in the class. tearDown runs after every test method, even if the test failed. Together they are unittest’s equivalent of pytest’s yield fixtures.

class TestWord(unittest.TestCase):
def setUp(self):
self.word = 'banana'
def test_the_word(self):
self.assertIn('b', self.word)
self.assertNotIn('B', self.word)
self.assertNotIn('y', self.word)
def tearDown(self):
del self.word

State goes on self in setUp, is available to every test method, and gets cleaned up in tearDown.

A more practical example with a list operation:

def is_palindrome(string):
return string == string[::-1]
def word_list():
return ['level', 'step', 'peep', 'toot']
class TestPalindrome(unittest.TestCase):
def setUp(self):
self.data = word_list()
def test_palindromes(self):
expected = [True, False, True, True]
results = list(map(is_palindrome, self.data))
self.assertEqual(results, expected)
def tearDown(self):
self.data.clear()

'step' is the only non-palindrome in the list, so the expected result is [True, False, True, True]setUp loads the words, the test checks them, and tearDown clears the list.

pytest vs unittest

Featurepytestunittest
StyleFunctionsClasses
AssertionsPlain assertself.assertEqual, etc.
Fixtures@pytest.fixture with yieldsetUp / tearDown
Parametrization@pytest.mark.parametrizesubTest (verbose)
PluginsLarge ecosystemLimited
Built-inNo (pip install pytest)Yes
OutputRich, colored, detailedPlain text

For new projects, use pytest. It is more concise, its failure output is far more helpful, and it can run unittest tests without modification, so you do not have to migrate an existing suite all at once.

A Complete Test Suite

Here is what a production data pipeline test file looks like when the patterns from this article are combined.

import pytest
import pandas as pd
DF_PATH = '/usr/local/share/workforce.csv'
@pytest.fixture
def load_df():
return pd.read_csv(DF_PATH)
def summarize_by_year(df):
return df.groupby('year').agg({'compensation': 'describe'})['compensation']
def test_load(load_df):
assert isinstance(load_df, pd.DataFrame)
assert load_df.shape[0] > 0
def test_grouped(load_df):
stats_by_year = summarize_by_year(load_df)
assert stats_by_year.isna().sum().sum() == 0
def test_year_summary(load_df):
stats_by_year = summarize_by_year(load_df)
median_2022 = stats_by_year.loc[2022, '50%']
assert isinstance(median_2022, float)
assert median_2022 > 0
def test_reading_speed(benchmark):
benchmark(pd.read_csv, DF_PATH)

Each test isolates one concern. test_load is a sanity check: did the file load into the right type and is it non-empty? test_grouped is a correctness check: does the aggregation produce a clean result with no missing values? test_year_summary is a feature-level check: does a specific year’s median look like a valid number? test_reading_speed is a performance regression guard: if loading slows down after a library upgrade, the benchmark will show it.

If the CSV format changes, test_load fails. If the aggregation logic breaks, test_grouped fails. If data disappears for a particular year, test_year_summary fails. If pandas gets slower after an upgrade, the benchmark catches it. Each layer of the test suite is watching for a different category of breakage.

One note on isinstance vs typeisinstance(x, T) is preferred over type(x) == T because it handles subclasses correctly. A pandas DataFrame subclass would pass isinstance(x, pd.DataFrame) but fail type(x) == pd.DataFrame.

The Testing Pyramid

Most tests should be unit tests because they are fast (milliseconds), isolated, and easy to debug when they fail. Integration tests are slower and cover the interaction between components. End-to-end tests are the slowest and fewest, covering complete user-facing workflows.

        /\
       /  \
      /E2E \       few; full system, slow
     /------\
    /  Integ \     some; multi-component tests
   /----------\
  /    Unit    \   many; one function, fast
 /--------------\


A rough split for most projects: 70% unit tests, 25% integration tests, 5% end-to-end tests.

For any given function, the minimum test coverage is: one happy-path input, one edge-case input (zero, empty, boundary), and one error-case input (wrong type, out of range). Add side-effect checks if the function writes to disk or a database, and a benchmark if it sits on a performance-critical path.

Quick Reference

Taskpytestunittest
Define a testdef test_x():class TestX(unittest.TestCase): + def test_x(self):
Assert equalassert a == bself.assertEqual(a, b)
Assert truthyassert xself.assertTrue(x)
Assert raiseswith pytest.raises(E):with self.assertRaises(E):
Setup per test@pytest.fixturedef setUp(self):
Teardown per testyield then cleanup codedef tearDown(self):
Skip always@pytest.mark.skip@unittest.skip
Skip conditionally@pytest.mark.skipif(cond, reason=)@unittest.skipIf(cond, reason)
Expected failure@pytest.mark.xfail@unittest.expectedFailure
Run all testspytestpython -m unittest discover
Verbosepytest -vpython -m unittest -v
Keyword filterpytest -k "name"python -m unittest -k "name"
Stop on first failurepytest -xpython -m unittest -f
Benchmarkbenchmark(func, *args)not available

See you soon.

View Comments (2)

Leave a Reply

Prev Next

Subscribe to My Newsletter

Subscribe to my email newsletter to get the latest posts delivered right to your email. Pure inspiration, zero spam.

Discover more from Datalad - Data Science and ML

Subscribe now to keep reading and get access to the full archive.

Continue reading