Turning Your Python Code into a Real Package

This guide explains how to turn scripts into structured Python packages. It covers the steps that make a package easy to share and install with pip: documentation, testing, and publication.

You have written a useful function. Maybe several. Right now they live in a single .py file, and sharing them means emailing the file around or copying it from project to project. The moment you improve something, everyone else is stuck on the old version. Turning that code into a proper package fixes all of this. A package is the difference between a recipe scrawled on an index card and a published cookbook with a table of contents, a version number, and a publisher, so that anyone in the world can install the latest copy with a single command.

This guide walks through that whole journey: structuring a package, importing within it, documenting it, making it installable, testing it, checking its style, and finally publishing it so others can pip install your work.

From a Script to a Package

A script is a single .py file you run directly. A package is a folder containing a special file called __init__.py plus your modules. That __init__.py is what tells Python “this folder is an importable package, treat it as one unit.” The file can be completely empty; its mere presence is the signal.

A minimal package looks like this:

my_project/
├── reviewscan/ ← the package (folder)
│   ├── __init__.py ← makes it a package
│   └── reviewscan.py ← your module (the .py with your functions)
└── product-feedback.txt ← a data file

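Think of a folder of loose .py files as a pile of papers. __init__.py is the claim ticket saying “this is a real package.” (Since Python 3.3, a folder without one can still be imported as a so-called namespace package. That mechanism is meant for packages split across several directories, though, and setuptools’ find_packages() skips such folders, so for an ordinary package always include __init__.py.)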

Real packages grow beyond one module and organise themselves the way a library organises floors and departments:

unitbridge/ ← root package
├── __init__.py
├── utils.py ← shared utilities (top level)
├── length/ ← sub-package
│   ├── __init__.py
│   ├── core.py ← internal functions
│   └── api.py ← public-facing functions
└── weight/ ← sub-package
    ├── __init__.py
    ├── core.py
    └── api.py

Here unitbridge is the library building, and length/ and weight/ are floors. Each floor has a back office, core.py, where the messy internal logic lives, and a front desk, api.py, where users come to make requests. utils.py at the top is a shared supply closet both floors can use. This separation matters because it lets you change the internal logic freely without breaking what users see at the front desk. The core.py and api.py names are a convention this guide follows, not something Python enforces.

Using a package is a matter of importing the function you want:

from reviewscan.reviewscan import tally_words
positive = tally_words('product-feedback.txt', ['good', 'great'])
negative = tally_words('product-feedback.txt', ['bad', 'awful'])
print("{} positive words.".format(positive))
print("{} negative words.".format(negative))

Read that import left to right: the first reviewscan is the package folder, the dot means “go inside,” the second reviewscan is the module file inside it, and import tally_words pulls out just that one function. It looks repetitive only because the package and module happen to share a name. In a package like unitbridge, where the names differ, the structure reads more clearly.

Importing Within Your Package

Inside a package, modules constantly need to borrow from one another, and there is a right way to do it.

When api.py needs functions from core.py in the same folder, you might expect to write from core import .... Resist that. Always write the full path from the top of the package:

# In unitbridge/length/api.py
"""User-facing functions."""
from unitbridge.length.core import (
    VALID_UNITS,
    inches_to_feet,
    inches_to_yards,
)

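The reason is that in Python 3 a bare from core import ... is looked up on sys.path, not inside the current package. It only works when the package folder itself happens to be on sys.path, for example when you launch a script from inside that folder. The full path is unambiguous and works no matter where the code is run from; the short path breaks the moment someone runs your code from a different directory. (Explicit relative imports such as from .core import ... are also reliable, but this guide sticks to full paths.)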

The same rule applies when a sub-package module needs something from higher up, like the shared utils.py:

# In unitbridge/length/api.py
"""User-facing functions."""
from unitbridge.utils import validate_units
from unitbridge.length.core import VALID_UNITS, inches_to_feet, inches_to_yards


def convert(x, from_unit, to_unit):
    # Validate units first
    validate_units(from_unit, to_unit, VALID_UNITS)

    # Convert input to inches first (the base unit)
    if from_unit == "in":
        inches = x
    elif from_unit == "ft":
        inches = inches_to_feet(x, reverse=True)  # reverse=True → feet to inches
    elif from_unit == "yd":
        inches = inches_to_yards(x, reverse=True)

    # Convert inches to the desired output unit
    if to_unit == "in":
        value = inches
    elif to_unit == "ft":
        value = inches_to_feet(inches)
    elif to_unit == "yd":
        value = inches_to_yards(inches)
    return value

This function uses a neat pattern worth calling out: the hub trick. Rather than writing a separate function for every pair of units (in to ft, in to yd, ft to in, ft to yd, and so on), it converts everything to inches first, then from inches to the target. Inches is the central hub, like an airport you transfer through: a foot-to-yard conversion actually happens in two hops, ft to in, then in to yd. The result is far less code to maintain. Adding miles later would mean writing only two conversions (in to mi and mi to in) instead of six, one to and one from each existing unit.
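The same hub idea can be pushed one step further by replacing the if/elif chains with a table of factors. This is a minimal sketch, not the unitbridge code; convert_via_hub and INCHES_PER_UNIT are hypothetical names.

```python
"""Hub-and-spoke unit conversion driven by a factor table (sketch)."""

# How many inches one of each unit is worth; inches is the hub.
INCHES_PER_UNIT = {
    "in": 1.0,
    "ft": 12.0,
    "yd": 36.0,
}


def convert_via_hub(x, from_unit, to_unit):
    """Convert x by going from_unit -> inches -> to_unit."""
    if from_unit not in INCHES_PER_UNIT or to_unit not in INCHES_PER_UNIT:
        raise ValueError(f"Unsupported unit: {from_unit!r} or {to_unit!r}")
    inches = x * INCHES_PER_UNIT[from_unit]   # hop 1: into the hub
    return inches / INCHES_PER_UNIT[to_unit]  # hop 2: out of the hub


# Adding miles would need just one entry: INCHES_PER_UNIT["mi"] = 63360.0
print(convert_via_hub(500, "yd", "ft"))  # 1500.0
```

With this layout a new unit costs one dictionary entry, at the price of only supporting conversions that are a fixed multiplication (temperature scales with offsets would not fit).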

Exposing Functions Through __init__.py

__init__.py is more than a “this is a package” sign. It is also the storefront window. Whatever you import into it becomes visible directly on the package:

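# In unitbridge/length/__init__.py
from unitbridge.length.api import convert

# In unitbridge/__init__.py: import the sub-package so that
# "import unitbridge" also makes unitbridge.length available
from unitbridge import length

Without these lines, a user has to know your internal file layout, import unitbridge.length.api explicitly, and write unitbridge.length.api.convert(...). With them, they can write the much friendlier unitbridge.length.convert(...):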

import unitbridge
result = unitbridge.length.convert(10, 'in', 'yd')
print(result) # 0.277...

This decouples the path users rely on from your internal file structure, so you can rearrange the internals later without breaking anyone’s code.
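To see the storefront effect end to end, the sketch below writes a tiny throwaway package to a temporary folder and imports it. The package name minibridge and its file contents are hypothetical stand-ins, kept deliberately small; they mirror the unitbridge layout and its full-path imports but are not its real code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents for a tiny package laid out like unitbridge.
FILES = {
    "minibridge/__init__.py": "from minibridge import length\n",
    "minibridge/length/__init__.py": "from minibridge.length.api import convert\n",
    "minibridge/length/core.py": 'INCHES_PER_UNIT = {"in": 1.0, "ft": 12.0, "yd": 36.0}\n',
    "minibridge/length/api.py": (
        "from minibridge.length.core import INCHES_PER_UNIT\n"
        "\n"
        "\n"
        "def convert(x, from_unit, to_unit):\n"
        "    return x * INCHES_PER_UNIT[from_unit] / INCHES_PER_UNIT[to_unit]\n"
    ),
}


def write_package(root):
    """Write every file in FILES under the folder `root`."""
    for rel_path, source in FILES.items():
        path = Path(root, rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


with tempfile.TemporaryDirectory() as tmp:
    write_package(tmp)
    sys.path.insert(0, tmp)  # make the throwaway package importable
    importlib.invalidate_caches()
    try:
        minibridge = importlib.import_module("minibridge")
        # Short path, available because both __init__.py files import it
        short_result = minibridge.length.convert(10, "in", "yd")
        # Both paths lead to the very same function object
        same_function = minibridge.length.convert is minibridge.length.api.convert
    finally:
        sys.path.remove(tmp)

print(short_result)   # 0.2777...
print(same_function)  # True
```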

Documenting Your Package

Code without documentation is like an appliance with no manual: it might work, but nobody will figure out how. Python has a built-in mechanism for this called the docstring, a triple-quoted string placed as the first statement inside a function. Python stores it on the function’s __doc__ attribute, so anyone can read it with help() or see it in their editor’s tooltip.
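A quick way to confirm the docstring really is attached: it lives on the function's __doc__ attribute, and the standard library's inspect.getdoc returns the same text with any indentation normalised. The function here is a trimmed copy of inches_to_feet, shortened only for this demo.

```python
import inspect


def inches_to_feet(x, reverse=False):
    """Convert lengths between inches and feet."""
    return x * 12.0 if reverse else x / 12.0


# The docstring is stored on the function object itself...
print(inches_to_feet.__doc__)         # Convert lengths between inches and feet.
# ...and inspect.getdoc returns it with indentation cleaned up
print(inspect.getdoc(inches_to_feet))  # Convert lengths between inches and feet.
```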

One widely used convention for organising a docstring is the NumPy style: a one-line summary, a Parameters section listing each argument with its type, and a Returns section describing the output:

INCHES_PER_FOOT = 12.0
INCHES_PER_YARD = INCHES_PER_FOOT * 3.0  # 3 feet in a yard
VALID_UNITS = ("in", "ft", "yd")


def inches_to_feet(x, reverse=False):
    """Convert lengths between inches and feet.

    Parameters
    ----------
    x : numpy.ndarray
        Lengths in inches.
    reverse : bool, optional
        If True, converts from feet to inches instead of inches to feet.
        (Default value = False)

    Returns
    -------
    numpy.ndarray
        Lengths in feet, or in inches when reverse is True.
    """
    if reverse:
        return x * INCHES_PER_FOOT  # feet → inches (multiply)
    else:
        return x / INCHES_PER_FOOT  # inches → feet (divide)

Tools like Sphinx (with its napoleon extension, which understands NumPy-style sections) can read these docstrings and auto-generate a polished documentation website, so in a sense the documentation writes itself.

One warning worth absorbing from this example: the original version of these notes described the x parameter as “Lengths in feet,” but the function actually divides x by 12 to produce feet when reverse=False, which means x is in inches. The docstring above corrects that to “Lengths in inches.” It is a small thing, but it is a perfect illustration of how docstrings quietly drift out of sync with the code they describe if you are not careful.

Documentation lives at several levels. A whole module can carry a docstring as the very first line of the file, before any import or definition:

"""Conversions between inches and larger imperial length units."""
INCHES_PER_FOOT = 12.0
...

And api.py might open with a docstring that signals its role as the public interface:

"""User-facing functions."""
from unitbridge.length.core import ...

The pattern repeats at every level: a package docstring in __init__.py, a module docstring at the top of each file, a function docstring inside each definition, and inline comments next to individual lines of code. Each level tells the reader what they are looking at before they have to read the code itself.
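Python only treats a string as a docstring when it is the first statement in the module, class, or function, which is why the module docstring must come before any import. The standard library's ast.get_docstring makes this easy to check; the two source strings below are made-up examples.

```python
import ast

# Two made-up module sources: one starts with a docstring, one does not
DOCSTRING_FIRST = (
    '"""Conversions between inches and larger imperial length units."""\n'
    "INCHES_PER_FOOT = 12.0\n"
)
IMPORT_FIRST = (
    "import math\n"
    '"""This string comes too late to be the module docstring."""\n'
    "INCHES_PER_FOOT = 12.0\n"
)


def module_docstring(source):
    """Return the docstring of the module in `source`, or None."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(DOCSTRING_FIRST))
# Conversions between inches and larger imperial length units.
print(module_docstring(IMPORT_FIRST))
# None
```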

Making It Installable with setup.py

To turn your folder into something pip can install, you add a setup.py file. It is the shipping label for your package, telling pip everything it needs: the name, the version, the author, and which folders to include:

from setuptools import setup, find_packages

setup(
    author="Your Name",
    description="A package for converting imperial lengths and weights.",
    name="unitbridge",
    packages=find_packages(include=["unitbridge", "unitbridge.*"]),
    version="0.1.0",
)

find_packages() walks your project and automatically finds every folder containing an __init__.py. The include= filter matters, because without it find_packages would also sweep up a tests folder (if it has an __init__.py) or anything else you do not want shipped. Together, the patterns "unitbridge" and "unitbridge.*" mean “unitbridge and any sub-package of it.”

The version number follows the major.minor.patch convention. You bump the patch number for bug fixes, the minor number for new features that do not break existing code, and the major number when you change something that will break existing users’ code. A 0.x.x version signals the package is still pre-1.0 and things may change.
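The bump rules are mechanical enough to write down as code. bump_version is a hypothetical helper, not part of setuptools; the detail it makes explicit is that bumping one number resets every number to its right to zero.

```python
def bump_version(version, part):
    """Return `version` (major.minor.patch) with one part incremented.

    Bumping a part resets every part to its right to zero.
    """
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor' or 'patch', not {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1 (bug fix)
print(bump_version("0.1.1", "minor"))  # 0.2.0 (new, backwards-compatible feature)
print(bump_version("0.2.0", "major"))  # 1.0.0 (breaking change)
```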

Most packages depend on other libraries, and setup.py is where you declare them:

from setuptools import setup, find_packages

setup(
    author="Your Name",
    description="A package for converting imperial lengths and weights.",
    name="unitbridge",
    packages=find_packages(include=["unitbridge", "unitbridge.*"]),
    version="0.1.0",
    install_requires=[
        'numpy>=1.10',  # minimum version
        'pandas',  # any version
    ],
    python_requires="==3.9.*",  # only Python 3.9.x
)

When someone runs pip install unitbridge, install_requires triggers a chain reaction: pip installs numpy and pandas as well. The version operators read like plain language: >=1.10 means “1.10 or newer,” and ==3.9.* means “any 3.9.x release, but not 3.8 or 3.10.” The art is to pin neither too strictly, which would stop your package working with newer numpy releases, nor too loosely, which risks a future release breaking your code.
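The detail that makes these specifiers work is that versions compare number by number, not as text: as plain strings, "1.9" would sort after "1.10". Below is a deliberately tiny checker for just the two operators used above. satisfies and parse_version are hypothetical helpers; real tools rely on the packaging library, which implements the full PEP 440 rules.

```python
def parse_version(text):
    """Turn '1.10' or '3.9.18' into a tuple of ints, e.g. (1, 10)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check `version` against a single '>=X.Y' or '==X.Y.*' specifier.

    A tiny subset of PEP 440: no pre-releases, and trailing zeros are not
    normalised (1.10 and 1.10.0 are treated as different).
    """
    v = parse_version(version)
    if spec.startswith(">="):
        # Tuples compare element by element, so (1, 9) < (1, 10)
        return v >= parse_version(spec[2:])
    if spec.startswith("==") and spec.endswith(".*"):
        prefix = parse_version(spec[2:-2])
        return v[:len(prefix)] == prefix  # only the leading parts must match
    raise ValueError(f"unsupported specifier: {spec!r}")


print(satisfies("1.24.0", ">=1.10"))   # True
print(satisfies("1.9.3", ">=1.10"))    # False, even though "1.9.3" > "1.10" as text
print(satisfies("3.10.1", "==3.9.*"))  # False
```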

Installing and Managing Your Package

While developing, you install your package in editable mode:

pip install -e .

Normally pip install copies your code into Python’s site-packages folder, so edits to your source are not picked up until you reinstall, which is painful when you are actively working. The -e flag instead links site-packages back to your source folder, so Python reads your live code directly. Edit a function, save, run again, and your changes are already there. The . simply means “install the project described by the setup.py in this directory.”

There is an important distinction between two kinds of dependency. Users of your package only need the libraries your code actually calls at runtime, which belong in install_requires. Developers working on the package additionally need tools like pytest and flake8, and forcing every user to install those would be needless bloat. Those development-only tools go in a separate requirements.txt file, which you can generate from your current environment:

pip freeze > requirements.txt

pip freeze snapshots every installed package with its exact version, and the > redirects that list into a file rather than printing it. Another developer can then recreate your exact environment with pip install -r requirements.txt. The file ends up looking like this:

numpy==1.24.0
pandas==2.0.1
pytest==7.2.0
flake8==6.0.0

The Supporting Files

Two extra files round out a professional package. The first is the README, the front page shown on GitHub and on PyPI. Many people decide whether to use your package after a quick skim of it, so lead with what it is, why it exists, and a tiny working example:

# unitbridge
A package for converting between imperial unit lengths and weights.
### Features
- Convert lengths between miles, yards, feet and inches.
- Convert weights between hundredweight, stone, pounds and ounces.
### Usage
```python
import unitbridge
# Convert 500 yards to feet
unitbridge.length.convert(500, from_unit='yd', to_unit='ft') # returns 1500.0
# Convert 100 ounces to pounds
unitbridge.weight.convert(100, from_unit='oz', to_unit='lb') # returns 6.25
```

Think of the README as a trailer, not a manual. If people are interested, they will dig into the full documentation.

The second file is MANIFEST.in, which lists non-code files to include when you distribute the package:

include README.md
include LICENSE

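setuptools puts your Python modules into a distribution automatically, but other files are only included if something asks for them. Recent setuptools versions pick up common README and LICENSE file names on their own, yet MANIFEST.in remains the explicit way to list extra files for the source distribution, and it guards against them being silently left out. The LICENSE file is especially important: without one, default copyright applies, and nobody else has legal permission to copy, modify, or redistribute your code.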

Building and Publishing

With everything in place, you build distributable archives:

python3 setup.py sdist bdist_wheel

There are two formats. The source distribution, sdist, is your raw source files compressed into an archive, which pip downloads and builds on the user’s machine. The wheel, bdist_wheel, is a pre-built package ready to drop straight into site-packages, which installs much faster. Shipping both is best practice. The command writes both into a dist/ folder: a .tar.gz source archive and a .whl wheel.

A note for new projects in 2026: the more modern build command is python -m build (install it first with pip install build), which builds in a clean, isolated environment and produces the same two files. Running setup.py directly still works but is deprecated by setuptools, so python -m build is the current recommendation.

Finally, you upload to PyPI, the central registry where pip install looks for packages:

twine upload dist/*

twine handles authentication and avoids the security pitfalls of older upload methods. Once your files are up, anyone in the world can install your package. A strong tip: practise on TestPyPI first with twine upload --repository testpypi dist/*. It is a sandbox version of the real index, so you can rehearse without accidentally publishing a broken release to the world.

Testing with pytest

A package without tests is a package that breaks silently. The convention is to mirror your source structure inside a tests/ folder that lives outside the package, so it does not ship to users:

unitbridge/ ← project root
├── unitbridge/ ← the package
│   └── length/
│       └── core.py ← source
└── tests/ ← outside the package
    └── length/
        └── test_core.py ← tests

A test is just a function that calls your code and uses assert to check the result:

from unitbridge.length.core import inches_to_feet


def test_inches_to_feet():
    # Normal direction: 12 inches → 1 foot
    assert inches_to_feet(12) == 1.0
    # Reverse direction: 2.5 feet → 30 inches
    assert inches_to_feet(2.5, reverse=True) == 30.0

assert says “this had better be true; if it is not, fail loudly.” If inches_to_feet(12) ever stops returning 1.0 because someone broke the function, the test fails immediately, rather than three months later when a user reports the bug. Each assertion is one specific promise about how the code behaves.

One caveat about floating-point numbers: the assertion inches_to_feet(2.5, reverse=True) == 30.0 happens to work because 2.5 times 12 is exactly representable in binary. For general floating-point comparisons, prefer pytest.approx, as in assert result == pytest.approx(30.0); otherwise tiny rounding differences such as 30.0000000001 will fail a test even though the maths is correct.
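To see why the tolerance matters, the sketch below repeats inches_to_feet locally and writes two test functions in pytest style. It uses math.isclose from the standard library so it runs even without pytest installed; inside a real test suite, result == pytest.approx(1.2) plays the same role, with a looser default tolerance.

```python
import math

INCHES_PER_FOOT = 12.0


def inches_to_feet(x, reverse=False):
    """Same conversion as in core.py, repeated so this example runs alone."""
    return x * INCHES_PER_FOOT if reverse else x / INCHES_PER_FOOT


def test_reverse_is_exact_for_2_5():
    # 2.5 * 12 is exactly representable, so == is safe here
    assert inches_to_feet(2.5, reverse=True) == 30.0


def test_reverse_needs_tolerance_for_0_1():
    result = inches_to_feet(0.1, reverse=True)
    # 0.1 has no exact binary form: result is 1.2000000000000002, not 1.2
    assert result != 1.2
    # A relative-tolerance comparison passes, as pytest.approx would
    assert math.isclose(result, 1.2)
```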

For pytest to discover your tests, the file must be named test_*.py or *_test.py, the function name must start with test_, and you check expectations with assert. Running them is then trivial:

pytest # run all tests
pytest tests/ # run tests in a specific folder
pytest -v # verbose output, showing each test name

Typing pytest walks the directory tree, finds every test file and function, runs them all, and prints a dot for each passing test and an F for each failure, followed by a traceback for every failure. No configuration is needed for the basic case.

Testing Across Python Versions with tox

Your code might run perfectly on the Python version on your machine and crash on the version a user has. tox solves this by creating a separate, isolated environment for each Python version you specify, installing your package and its test dependencies in each, and running the tests across all of them. You configure it in a tox.ini file:

[tox]
envlist = py39, py310, py311, py312

[testenv]
deps = pytest
commands = pytest

It is worth noting that the original version of these notes listed py27 among the environments. In 2026, Python 2.7 is long dead, having reached end of life in 2020, so nobody should be testing against it. A realistic modern environment list is the one above. Running tox then sets up each environment, runs the tests in each, and reports pass or fail per version.

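Where tox tests against multiple versions, python_requires in setup.py turns what you learned into a rule pip enforces. It tells pip to refuse installation on any Python that does not match, so a user on an unsupported version gets a clear error instead of a mysterious runtime crash. Keep the two in step: if tox shows the tests passing on 3.9 through 3.12, a matching declaration is python_requires=">=3.9", whereas the stricter "==3.9.*" from the earlier setup.py example would also turn away 3.10, 3.11, and 3.12.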

Checking Style with flake8

flake8 checks your code against PEP8, Python’s official style guide. PEP8 is a community agreement on small decisions like four-space indentation, blank lines between functions, maximum line length, and naming conventions. None of these change what the code does, but they make every Python codebase look consistent, so you do not waste mental energy adapting to a new style on every project. Here is code that satisfies it:

"""Main module."""
def absolute_value(num):
"""Return the absolute value of the number."""
if num >= 0:
return num
else:
return -num

flake8 is a critic, not a reformatter; it lists every place your code breaks a rule, by line number, but does not fix anything itself (that is the job of a tool like black). One small correction to the original notes: they said PEP8 requires a blank line before a function definition, but PEP8 actually requires two blank lines before a top-level function, which is what the example shows. You run it against a file or a whole package:

flake8 my_module.py # check one file
flake8 unitbridge/ # recursively check the whole package

Occasionally a PEP8 rule would genuinely make code harder to read, and you can suppress it on a specific line:

import numpy as np


def calculate_hypotenuse(side1, side2):
    """Calculate the length of the hypotenuse."""
    l = np.sqrt(side1**2 + side2**2)  # noqa: E741
    return l

flake8 normally objects to l as a variable name, because a lowercase L is easily confused with the digit 1, which is rule E741. The # noqa: E741 comment is a polite permission slip saying “I know about this rule and am choosing to ignore it on this line.” Use it sparingly, since overusing it defeats the purpose of having a linter at all.
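Often the cleaner fix is to rename the variable so there is nothing to suppress. A sketch of that alternative follows; it swaps numpy for the standard library's math.hypot purely to keep the example dependency-free.

```python
import math


def calculate_hypotenuse(side1, side2):
    """Calculate the length of the hypotenuse."""
    hypotenuse = math.hypot(side1, side2)  # descriptive name: no E741, no noqa
    return hypotenuse


print(calculate_hypotenuse(3, 4))  # 5.0
```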

The Full Workflow

Pulling it all together, developing a package follows a clear sequence. You write your code as functions, create the package folder with its __init__.py, and organise it into internal core.py and public api.py modules. You wire up imports using full paths from the root, and expose the friendly functions through __init__.py. You document everything in NumPy style, write setup.py with the name, version, packages, and dependencies, and install the package locally in editable mode by running pip install -e . from the project folder. You add a README and MANIFEST.in, write tests in a mirrored tests/ folder, and run them with pytest. You check style with flake8, confirm compatibility across versions with tox, build with python -m build, and finally publish with twine upload dist/*.

Conclusion

A Python package is what turns code that works on your machine into code the world can install and rely on. The structure is simple at heart: a folder with an __init__.py, internal logic separated from a public interface, and full import paths that never break. Around that core sits the supporting cast that makes a package trustworthy: docstrings so people can understand it, a setup.py so pip can install it, tests so changes cannot silently break it, tox and python_requires so it works on the right Python versions, and flake8 so it stays clean. Get into the habit of building these in, and sharing your work becomes a single command rather than an email and an apology.

See you soon.


Discover more from Datalad - Data Science and ML
