Tests are code that verifies your other code works correctly. They catch regressions when you refactor, document expected behavior more reliably than comments, and force you to think about edge cases before they become production bugs. Python has two main testing frameworks: unittest, which ships with the standard library and follows a class-based JUnit style, and pytest, which uses plain functions, plain assertions, and is the community standard for most new projects today. Both run the same tests; this article covers both in full.
The assert Statement
Everything in Python testing is built on top of one keyword: assert.
assert 1 + 1 == 2 # passes silentlyassert 1 + 1 == 3 # raises AssertionErrorassert 1 + 1 == 3, "math broken" # AssertionError with a message
If the expression is truthy, assert does nothing and execution continues. If it is falsy, it raises AssertionError and stops the program. The optional message after the comma appears in the error output, which makes failures easier to diagnose when the expression alone is not self-explanatory.
pytest Basics
Anatomy of a test
pytest’s discovery rules are simple: test files must be named test_*.py or *_test.py, test functions must start with test_, and test classes must start with Test with no __init__ method. Everything else in those files is ignored.
def is_even(num): if num == 0: raise ValueError return num % 2 == 0def test_numbers(): assert is_even(2) is True assert is_even(3) is False
pytest auto-discovers test_numbers because of its prefix, runs the two assertions, and reports a pass if both hold. The is True / is False form is an identity check, the strictest option. == True / == False also works but will match truthy or falsy values rather than the exact boolean.
Running tests
pytest # run everything in the current directorypytest test_helpers.py # specific filepytest test_helpers.py -k numbers # only tests whose name contains "numbers"pytest -v # verbose: show each test name and resultpytest -x # stop on the first failurepytest --tb=short # shorter traceback outputpytest -s # show print() output (no stdout capture)pytest --maxfail=3 # stop after 3 failurespytest -q # quiet output
Testing for expected exceptions
When a function should raise on bad input, use pytest.raises as a context manager. If the expected exception is not raised inside the block, the test fails.
import pytestdef test_zero(): with pytest.raises(ValueError): is_even(0)
To also inspect the exception message, bind the captured exception with as:
with pytest.raises(ValueError) as exc_info: is_even(0)assert "must" in str(exc_info.value)
This is useful when a function raises the same exception type for different reasons and you want to confirm it raised for the right one.
Markers
Markers are decorators that change how pytest treats a test.
@pytest.mark.xfail
Marks a test as expected to fail. Useful for known bugs or unfinished features. If the test fails, pytest reports it as XFAILand moves on without raising an alarm. If it unexpectedly passes, pytest flags it as XPASS, which is a hint that the bug may have been fixed.
pytest.mark.xfaildef test_fails(): assert is_even(2) == False # intentionally wrong
@pytest.mark.skipif
Skips a test when a condition is true. Skipped tests show as SKIPPED, not failures.
The modern syntax takes a direct boolean and a required reason argument:
from datetime import datetimeday_of_week = datetime.now().isoweekday()pytest.mark.skipif(day_of_week == 6, reason="Skip on Saturday")def test_deduplication(): assert deduplicate([1, 2, 3]) == [1, 2, 3] assert deduplicate([1, 2, 3, 1]) == [1, 2, 3]
An older form accepts a string expression that pytest evaluates, but the direct boolean form is cleaner and more readable.
Marker reference
| Marker | Effect |
|---|---|
@pytest.mark.skip | Always skip |
@pytest.mark.skipif(condition, reason=) | Skip when condition is true |
@pytest.mark.xfail | Expected to fail |
@pytest.mark.parametrize | Run with multiple input sets |
@pytest.mark.slow | Custom marker (define behavior in config) |
Fixtures
A fixture is reusable setup code. Tests declare what they need by parameter name, and pytest provides it automatically.
Basic fixture
import pytestpytest.fixturedef sample_data(): return [i for i in range(10)]def test_elements(sample_data): assert 9 in sample_data assert 10 not in sample_data
pytest sees that test_elements has a parameter named sample_data, finds the matching fixture, runs it, and passes the result in. The fixture runs once per test function by default, so each test gets a fresh copy.
Fixtures depending on other fixtures
Fixtures can declare other fixtures as parameters, and pytest resolves the entire dependency chain automatically.
pytest.fixturedef seq_length(): return 10pytest.fixturedef number_list(seq_length): return [i for i in range(seq_length)]def test_contents(number_list): assert 9 in number_list assert 10 not in number_list
test_contents only requests number_list, but pytest figures out that number_list depends on seq_length and runs that first.
autouse=True
A fixture with autouse=True runs for every test in scope without being explicitly requested. Useful for setup that applies everywhere, such as resetting a database before each test.
pytest.fixturedef base_list(): return []pytest.fixture(autouse=True)def populate_list(base_list): base_list.extend([i for i in range(10)])def test_elements(base_list): assert 1 in base_list assert 9 in base_list
Use autouse sparingly. When a fixture runs without being requested, the setup is invisible in the test body, which makes the test harder to understand in isolation.
Setup and teardown with yield
yield splits a fixture into setup and teardown. Everything before yield runs before the test; everything after yield runs after the test, even if the test fails.
pytest.fixturedef sample_data(): data = [i for i in range(10)] yield data data.clear() del datadef test_elements(sample_data): assert 9 in sample_data assert 10 not in sample_data
This is equivalent to wrapping the test in a try/finally block. The resource is created, handed to the test via yield, and then guaranteed to be cleaned up regardless of the test outcome.
A practical version with a DataFrame:
import pytestimport pandas as pdpytest.fixturedef records_df(): df = pd.read_csv('/usr/local/share/records.csv') yield df df.drop(df.index, inplace=True) del dfdef test_type(records_df): assert type(records_df) == pd.DataFramedef test_shape(records_df): assert records_df.shape[0] > 0
Each test that requests records_df gets a freshly loaded DataFrame. The cleanup after yield ensures rows are cleared and the variable is released.
Fixture scope
By default, fixtures rebuild for every test function. The scope parameter changes this.
pytest.fixture(scope="module")def expensive_resource(): return load_huge_dataset()
| Scope | Runs | Use case |
|---|---|---|
function (default) | Once per test | Isolated, safe |
class | Once per test class | Shared across class methods |
module | Once per file | Expensive setup reused in a file |
session | Once per pytest run | Database startup, external services |
The trade-off: broader scope is faster but tests share state, so a mutation in one test can leak into the next.
Types of Tests
Unit tests
A unit test pokes one function with one specific input and checks one specific output. The minimum viable set of inputs is the “happy path, edge case, error case” triple.
def factorial(n): if n == 0: return 1 elif type(n) == int: return n * factorial(n - 1) else: return -1def test_regular(): assert factorial(5) == 120def test_zero(): assert factorial(0) == 1def test_str(): assert factorial('5') == -1
test_regular covers the normal path, test_zero covers the boundary where factorials have a special case, and test_str covers bad input. These three categories catch most bugs and document how the function is supposed to behave for anyone reading the test file later.
Integration tests
Integration tests wire two or more components together and verify they cooperate correctly. Where a unit test isolates a function, an integration test checks the seams between functions.
import pandas as pdimport pytestpytest.fixturedef raw_df(): return pd.read_csv('devices.csv')def group_sum(data, group_col, agg_col): return data.groupby(group_col)[agg_col].sum()def test_agg_feature(raw_df): result = group_sum(raw_df, 'Brand', 'Cost') assert type(result) == pd.Series assert result.shape[0] > 0 assert result.dtype in (int, float)
The fixture loads data (one component) and the test calls the aggregation function (another component). If either piece breaks, or if the loader produces column names the aggregator does not recognize, this test catches it.
Verifying data loading
The cheapest sanity check you can write for any data-loading code is two assertions: did the loader return the right type, and is the result not empty?
def test_load(raw_df): assert type(raw_df) == pd.DataFrame assert raw_df.shape[0] > 0
These two lines take seconds to write and prevent hours of downstream debugging when a silent empty DataFrame propagates through a pipeline.
Performance Benchmarking with pytest-benchmark
pytest-benchmark is a plugin that times functions and produces a statistical summary across many runs. Install with pip install pytest-benchmark.
Comparing data structures
def create_list(): return [i for i in range(1000)]def create_set(): return {i for i in range(1000)}def find(it, el=50): return el in itdef test_list(benchmark): benchmark(find, it=create_list())def test_set(benchmark): benchmark(find, it=create_set())
The benchmark fixture, provided by the plugin, calls the given function many times and records mean, median, and standard deviation. For membership testing, test_set will be significantly faster because in on a set is O(1) while in on a list is O(n).
When you want to time a code block defined inline, the decorator form works:
def test_list_iteration(benchmark): benchmark def iterate(): for el in [i for i in range(1000)]: passdef test_set_iteration(benchmark): benchmark def iterate(): for el in {i for i in range(1000)}: pass
Iteration over list and set is roughly equivalent in speed because both require visiting every element. The membership test is where they diverge.
Benchmarking file reads
DF_PATH = '/usr/local/share/workforce.csv'def test_reading_speed(benchmark): benchmark(pd.read_csv, DF_PATH)
This one-liner times how long CSV loading takes. Run it before and after upgrading pandas or refactoring a data loader, and you will immediately see whether performance changed.
unittest: The Stdlib Classic
unittest is class-based and built into Python with no installation required. Tests live as methods inside a class that inherits from unittest.TestCase.
import unittestdef factorial(number): if number < 0: raise ValueError('Factorial is not defined for negative values') result = 1 while number > 1: result = result * number number = number - 1 return resultclass TestFactorial(unittest.TestCase): def test_positives(self): self.assertEqual(factorial(5), 120) def test_zero(self): self.assertEqual(factorial(0), 1) def test_negatives(self): with self.assertRaises(ValueError): factorial(-1)
Instead of plain assert, you call self.assert* methods. The intent is explicit from the method name: assertEqual means you are checking equality, assertRaises means you are checking for an exception. The verbose style adds clarity but at the cost of more typing.
Common assertion methods
| Method | Checks |
|---|---|
assertEqual(a, b) | a == b |
assertNotEqual(a, b) | a != b |
assertTrue(x) | bool(x) is True |
assertFalse(x) | bool(x) is False |
assertIsNone(x) | x is None |
assertIsNotNone(x) | x is not None |
assertIn(a, b) | a in b |
assertNotIn(a, b) | a not in b |
assertIsInstance(x, t) | isinstance(x, t) |
assertRaises(E) | block raises E |
assertAlmostEqual(a, b) | a ≈ b (for floats) |
Boolean assertion example
import mathdef is_prime(num): if num == 1: return False up_limit = int(math.sqrt(num)) + 1 for i in range(2, up_limit): if num % i == 0: return False return Trueclass TestSuite(unittest.TestCase): def test_is_prime(self): self.assertFalse(is_prime(1))
assertFalse(x) is equivalent to assert not x in pytest. The dedicated method name makes the intent immediately clear without any mental translation.
Running unittest from the command line
python -m unittest test_helpers.py # run a specific filepython -m unittest -v # verbose outputpython -m unittest -f # stop on first failurepython -m unittest -k palindrome # keyword filter (Python 3.7+)python -m unittest discover # auto-discover all test files
setUp and tearDown
setUp runs before every test method in the class. tearDown runs after every test method, even if the test failed. Together they are unittest’s equivalent of pytest’s yield fixtures.
class TestWord(unittest.TestCase): def setUp(self): self.word = 'banana' def test_the_word(self): self.assertIn('b', self.word) self.assertNotIn('B', self.word) self.assertNotIn('y', self.word) def tearDown(self): del self.word
State goes on self in setUp, is available to every test method, and gets cleaned up in tearDown.
A more practical example with a list operation:
def is_palindrome(string): return string == string[::-1]def word_list(): return ['level', 'step', 'peep', 'toot']class TestPalindrome(unittest.TestCase): def setUp(self): self.data = word_list() def test_palindromes(self): expected = [True, False, True, True] results = list(map(is_palindrome, self.data)) self.assertEqual(results, expected) def tearDown(self): self.data.clear()
'step' is the only non-palindrome in the list, so the expected result is [True, False, True, True]. setUp loads the words, the test checks them, and tearDown clears the list.
pytest vs unittest
| Feature | pytest | unittest |
|---|---|---|
| Style | Functions | Classes |
| Assertions | Plain assert | self.assertEqual, etc. |
| Fixtures | @pytest.fixture with yield | setUp / tearDown |
| Parametrization | @pytest.mark.parametrize | subTest (verbose) |
| Plugins | Large ecosystem | Limited |
| Built-in | No (pip install pytest) | Yes |
| Output | Rich, colored, detailed | Plain text |
For new projects, use pytest. It is more concise, its failure output is far more helpful, and it can run unittest tests without modification, so you do not have to migrate an existing suite all at once.
A Complete Test Suite
Here is what a production data pipeline test file looks like when the patterns from this article are combined.
import pytestimport pandas as pdDF_PATH = '/usr/local/share/workforce.csv'pytest.fixturedef load_df(): return pd.read_csv(DF_PATH)def summarize_by_year(df): return df.groupby('year').agg({'compensation': 'describe'})['compensation']def test_load(load_df): assert isinstance(load_df, pd.DataFrame) assert load_df.shape[0] > 0def test_grouped(load_df): stats_by_year = summarize_by_year(load_df) assert stats_by_year.isna().sum().sum() == 0def test_year_summary(load_df): stats_by_year = summarize_by_year(load_df) median_2022 = stats_by_year.loc[2022, '50%'] assert isinstance(median_2022, float) assert median_2022 > 0def test_reading_speed(benchmark): benchmark(pd.read_csv, DF_PATH)
Each test isolates one concern. test_load is a sanity check: did the file load into the right type and is it non-empty? test_grouped is a correctness check: does the aggregation produce a clean result with no missing values? test_year_summary is a feature-level check: does a specific year’s median look like a valid number? test_reading_speed is a performance regression guard: if loading slows down after a library upgrade, the benchmark will show it.
If the CSV format changes, test_load fails. If the aggregation logic breaks, test_grouped fails. If data disappears for a particular year, test_year_summary fails. If pandas gets slower after an upgrade, the benchmark catches it. Each layer of the test suite is watching for a different category of breakage.
One note on isinstance vs type: isinstance(x, T) is preferred over type(x) == T because it handles subclasses correctly. A pandas DataFrame subclass would pass isinstance(x, pd.DataFrame) but fail type(x) == pd.DataFrame.
The Testing Pyramid
Most tests should be unit tests because they are fast (milliseconds), isolated, and easy to debug when they fail. Integration tests are slower and cover the interaction between components. End-to-end tests are the slowest and fewest, covering complete user-facing workflows.
/\
/ \
/E2E \ few; full system, slow
/------\
/ Integ \ some; multi-component tests
/----------\
/ Unit \ many; one function, fast
/--------------\
A rough split for most projects: 70% unit tests, 25% integration tests, 5% end-to-end tests.
For any given function, the minimum test coverage is: one happy-path input, one edge-case input (zero, empty, boundary), and one error-case input (wrong type, out of range). Add side-effect checks if the function writes to disk or a database, and a benchmark if it sits on a performance-critical path.
Quick Reference
| Task | pytest | unittest |
|---|---|---|
| Define a test | def test_x(): | class TestX(unittest.TestCase): + def test_x(self): |
| Assert equal | assert a == b | self.assertEqual(a, b) |
| Assert truthy | assert x | self.assertTrue(x) |
| Assert raises | with pytest.raises(E): | with self.assertRaises(E): |
| Setup per test | @pytest.fixture | def setUp(self): |
| Teardown per test | yield then cleanup code | def tearDown(self): |
| Skip always | @pytest.mark.skip | @unittest.skip |
| Skip conditionally | @pytest.mark.skipif(cond, reason=) | @unittest.skipIf(cond, reason) |
| Expected failure | @pytest.mark.xfail | @unittest.expectedFailure |
| Run all tests | pytest | python -m unittest discover |
| Verbose | pytest -v | python -m unittest -v |
| Keyword filter | pytest -k "name" | python -m unittest -k "name" |
| Stop on first failure | pytest -x | python -m unittest -f |
| Benchmark | benchmark(func, *args) | not available |
See you soon.
[…] Python Testing with pytest and unittest […]
[…] Python Testing with pytest and unittest […]