The Hugging Face article explains the landscape: a hub of pretrained models and datasets, the pipeline function that turns them into one-liners, and the Auto classes underneath when you need control. This workbook makes each idea runnable. You will classify sentiment, generate text, search the Hub from code, run zero-shot labels, summarise, load and filter a dataset, look inside a tokenizer, assemble a pipeline by hand, answer questions over a passage, and finish by asking questions of a PDF. Most of the models used here are small enough to run comfortably on an ordinary laptop CPU; the two BART-large checkpoints in Examples 4 and 5 are larger downloads, at over a gigabyte each, and noticeably slower.
1. Your first pipeline: sentiment analysis
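The pipeline function hides every step (tokenisation, the model's forward pass, and post-processing of the output) behind one call. Give it a task name and it picks a default model, logging a reminder to name a model explicitly once you move beyond experiments.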
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")

    reviews = [
        "The delivery was quick and the keyboard feels fantastic.",
        "Two weeks late and the box arrived crushed."
    ]

    for review in reviews:
        result = classifier(review)[0]
        print(f"{result['label']} ({result['score']:.3f}) - {review}")

    # POSITIVE (1.000) - The delivery was quick and the keyboard feels fantastic.
    # NEGATIVE (0.999) - Two weeks late and the box arrived crushed.
An import and one call to pipeline, and you have a working sentiment model. The returned score is the probability the model assigns to the label it chose, and printing it alongside the label is a good habit: a 0.55 is a shrug, a 0.999 is a verdict.
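One way to act on that habit is to treat low-scoring predictions as undecided instead of trusting the label. The sketch below works on dictionaries shaped like the pipeline's output, so it runs without downloading a model. The `triage` helper, the 0.75 cut-off, and the sample scores are assumptions made up for illustration; none of them come from transformers.

```python
def triage(result, threshold=0.75):
    """Return the predicted label, or 'UNCERTAIN' when the score is below the threshold."""
    if result["score"] >= threshold:
        return result["label"]
    return "UNCERTAIN"


# Hand-written stand-ins for classifier(review)[0]; the scores are invented.
sample_results = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.55},
]

decisions = [triage(r) for r in sample_results]
print(decisions)  # ['POSITIVE', 'UNCERTAIN']
```

The right threshold depends on what a wrong label costs you, so treat 0.75 as a starting point to tune against real reviews.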
2. Text generation with GPT-2
Swap the task name and the same interface generates text. max_new_tokens caps the length and num_return_sequences asks for several continuations of the same prompt.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    outputs = generator(
        "The most useful skill for a data analyst is",
        max_new_tokens=30,
        num_return_sequences=2
    )

    for i, out in enumerate(outputs, 1):
        print(f"--- Continuation {i} ---")
        print(out["generated_text"])
GPT-2 is small and dated, which is exactly why it is ideal for learning: it downloads fast and runs anywhere. The structure you practise here, prompt in, generated_text out, is identical for far larger models.
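By default, generated_text contains the prompt followed by the continuation, which can be surprising the first time you print it. Here is a minimal sketch of stripping the prompt back off. The `continuation_only` helper and the sample output are hypothetical; the pipeline also accepts return_full_text=False, which is worth checking in the documentation for your installed version.

```python
def continuation_only(prompt, generated_text):
    """Strip the prompt from the front of generated_text when it is present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


prompt = "The most useful skill for a data analyst is"

# Hand-written stand-in for one item of generator(...); the continuation is invented.
fake_output = {"generated_text": prompt + " asking the right question."}

new_text = continuation_only(prompt, fake_output["generated_text"])
print(new_text)  # asking the right question.
```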
3. Searching the Hub from code
The Hub is not just a website. The huggingface_hub library lets you search its models programmatically, which is how you find candidates without leaving your script.
    from huggingface_hub import HfApi

    api = HfApi()

    models = api.list_models(
        task="summarization",  # filter by task
        sort="downloads",      # most-used first
        limit=5
    )

    for model in models:
        print(f"{model.id} | downloads: {model.downloads}")
The result is the five most-downloaded summarisation models with their ids, and any id can be dropped straight into a pipeline(model=...) call. Download counts are a useful proxy for “battle-tested” when choosing between similar models.
4. Zero-shot classification
Zero-shot models classify text into categories they were never trained on. You supply the candidate labels at call time, which makes this the fastest way to prototype a classifier with no training data at all.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    ticket = "My invoice shows the wrong VAT amount for last month."
    departments = ["billing", "technical support", "sales", "complaints"]

    result = classifier(ticket, candidate_labels=departments)

    for label, score in zip(result["labels"], result["scores"]):
        print(f"{label}: {score:.3f}")

    # billing: 0.812
    # complaints: 0.102 ...
The model ranks every label by how well it fits the text. Change the departments list and rerun; there is no retraining step, which is what “zero-shot” means and why it is so useful for routing, tagging, and triage prototypes.
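It helps to see roughly how the ranking comes about. With a natural language inference model such as bart-large-mnli, each candidate label is written into a hypothesis sentence, the model scores whether the text entails that sentence, and the entailment scores are normalised across labels. The sketch below imitates those two steps in plain Python. The helper names and the logits are invented for illustration, and the template string is what current transformers releases appear to use by default, so treat it as an assumption.

```python
import math


def build_nli_pairs(text, labels, template="This example is {}."):
    """Pair the text (premise) with one hypothesis sentence per candidate label."""
    return [(text, template.format(label)) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort the labels best-first."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ticket = "My invoice shows the wrong VAT amount for last month."
departments = ["billing", "technical support", "sales", "complaints"]

pairs = build_nli_pairs(ticket, departments)
print(pairs[0][1])  # This example is billing.

# Invented entailment logits, one per label; a real model would produce these.
made_up_logits = [3.0, 0.5, -1.0, 1.0]
ranking = rank_labels(departments, made_up_logits)
print([label for label, _ in ranking])
# ['billing', 'complaints', 'technical support', 'sales']
```

Because the labels are only strings slotted into a sentence, rewording a label (say, "invoice problem" instead of "billing") can shift the scores, which is worth trying before you conclude a category does not work.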
5. Summarisation with length constraints
Summarisation pipelines accept min_new_tokens and max_new_tokens to box in the output length. This one condenses a paragraph to a couple of sentences.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    report = """Quarterly traffic grew 18 percent, driven mainly by organic search after the
    site migration completed in April. Paid channels stayed flat, with social
    declining slightly as budgets moved to search. Conversion rate improved from
    2.1 to 2.6 percent, which the team attributes to the new checkout flow.
    Average order value was unchanged. The main risk flagged for next quarter is
    tracking loss from the upcoming consent banner changes."""

    summary = summarizer(report, min_new_tokens=20, max_new_tokens=50)
    print(summary[0]["summary_text"])
BART-large-CNN is an abstractive summariser, meaning it writes new sentences rather than copying existing ones. The length constraints matter: without them, summaries drift long or truncate awkwardly, and tuning those two numbers is most of the craft.
6. Loading and filtering a dataset
The datasets library gives you the same one-liner convenience for data. Load a dataset from the Hub, then shape it with .filter and .select.
    from datasets import load_dataset

    reviews = load_dataset("imdb", split="test")
    print(reviews)  # 25,000 labelled movie reviews

    # keep only short reviews
    short = reviews.filter(lambda row: len(row["text"]) < 500)
    print(f"Short reviews: {short.num_rows}")

    # take the first three for a look
    sample = short.select(range(3))
    for row in sample:
        print(row["label"], row["text"][:80], "...")
filter takes a function that returns True for rows to keep, and select slices by position. Both return a new dataset rather than modifying the original, so you can chain them into a small preprocessing recipe.
7. Inside the tokenizer
Models never see words; they see token ids. AutoTokenizer loads the exact tokenizer a model was trained with, and inspecting its output demystifies what a pipeline does before the model runs.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    text = "Tokenizers split text into pieces."
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.encode(text)

    print(tokens)  # ['token', '##izer', '##s', 'split', 'text', 'into', 'pieces', '.']
    print(ids)     # [101, 19204, 17629, 2015, 3975, 3793, 2046, 4109, 1012, 102]
    print(tokenizer.decode(ids))  # [CLS] tokenizers split text into pieces. [SEP]
Notice that "Tokenizers" was lowercased and became three pieces, with ## marking a piece that continues the previous one, and that the special [CLS] and [SEP] markers were added around the sentence. Many of the odd model behaviours you will end up debugging (truncation, weird splits, length limits) start here.
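The split itself follows a simple rule. BERT-style WordPiece tokenizers work through each word greedily, taking the longest vocabulary entry that matches at the current position and prefixing later pieces with ##. The toy version below shows that rule on a single lowercase word. The seven-entry vocabulary is hypothetical, chosen to reproduce the split above; the real vocabulary has roughly 30,000 entries, and the real tokenizer also handles casing, punctuation, and special tokens.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into WordPiece tokens."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it appears in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not the real distilbert-base-uncased vocabulary.
toy_vocab = {"token", "##izer", "##s", "split", "text", "into", "pieces"}

print(wordpiece("tokenizers", toy_vocab))  # ['token', '##izer', '##s']
print(wordpiece("split", toy_vocab))       # ['split']
print(wordpiece("zebra", toy_vocab))       # ['[UNK]']
```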
8. Assembling a pipeline by hand with Auto classes
A pipeline is just a tokenizer plus a model plus some post-processing. Building one manually shows exactly what the convenience wrapper does, and it is the pattern you need when you want logits or custom behaviour.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer("A genuinely delightful little film.", return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1)[0]
    labels = model.config.id2label

    for i, p in enumerate(probs):
        print(f"{labels[i]}: {p:.3f}")

    # NEGATIVE: 0.000
    # POSITIVE: 1.000
Tokenise, run the model, softmax the logits, map ids to labels: that is the whole pipeline, spelled out. model.config.id2label is the lookup that turns position 1 into “POSITIVE”, and having the raw probabilities lets you set your own thresholds.
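The softmax and threshold steps are small enough to write out without torch, which makes the arithmetic visible. In the sketch below the logits are made up (a real model would supply them), the id2label dictionary mirrors the one described above, and the `decide` helper and its 0.95 threshold are assumptions for illustration.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(probs, id2label, threshold=0.95):
    """Return the top label, or 'REVIEW' when its probability is under the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best] if probs[best] >= threshold else "REVIEW"


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
made_up_logits = [-1.2, 1.0]  # invented values standing in for model(**inputs).logits[0]

probs = softmax(made_up_logits)
for i, p in enumerate(probs):
    print(f"{id2label[i]}: {p:.3f}")
# NEGATIVE: 0.100
# POSITIVE: 0.900

print(decide(probs, id2label))                 # REVIEW
print(decide(probs, id2label, threshold=0.5))  # POSITIVE
```

A pipeline would simply report POSITIVE with a score of 0.900 here; working from the probabilities is what lets you send borderline cases to a person instead.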
9. Question answering over a context
Extractive QA models find the answer span inside a passage you provide. Give the pipeline a question and a context, and it returns the exact substring plus a confidence.
    from transformers import pipeline

    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    context = """The onboarding project kicked off on 3 February and is led by Priya Shah.
    The first milestone, the data audit, is due on 28 March, with the full
    rollout planned for July. Budget approval sits with the finance committee."""

    questions = [
        "Who leads the onboarding project?",
        "When is the data audit due?",
    ]

    for q in questions:
        result = qa(question=q, context=context)
        print(f"{q} -> {result['answer']} ({result['score']:.2f})")

    # Who leads the onboarding project? -> Priya Shah (0.98)
    # When is the data audit due? -> 28 March (0.97)
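The model does not generate an answer; it points at one, returning a span lifted verbatim from the context. That makes extractive QA easy to verify, since the answer is always traceable to the source text. Traceable is not the same as correct, though: this model returns some span even when the passage does not contain the answer, so keep an eye on the score.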
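That traceability can be checked in code, because the result dictionary carries start and end character offsets alongside the answer and the score. A minimal sketch, using a hand-built result instead of a real model call: the `is_traceable` helper is hypothetical and the 0.98 score is invented.

```python
def is_traceable(result, context):
    """Check that the answer is exactly the slice of context its offsets point at."""
    return context[result["start"]:result["end"]] == result["answer"]


context = (
    "The onboarding project kicked off on 3 February and is led by Priya Shah. "
    "The first milestone, the data audit, is due on 28 March."
)

# Hand-written stand-in for qa(question=..., context=context).
answer = "Priya Shah"
start = context.index(answer)
fake_result = {"answer": answer, "start": start, "end": start + len(answer), "score": 0.98}

print(is_traceable(fake_result, context))  # True
```

The same offsets are what you would use to highlight the answer in the original passage when showing it to a reader.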
10. Putting it together: ask questions of a PDF
The final example combines a PDF parser with the QA pipeline from Example 9: extract the text, then query it. This is the skeleton of every “chat with your document” tool.
    from pypdf import PdfReader
    from transformers import pipeline

    # 1. extract text from the PDF
    reader = PdfReader("report.pdf")  # any text-based PDF you have
    document_text = ""
    for page in reader.pages:
        # add a newline so words do not fuse across page boundaries
        document_text += page.extract_text() + "\n"

    # 2. feed it to a QA pipeline as the context
    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

    question = "What was the total revenue?"
    result = qa(question=question, context=document_text)

    print(f"Answer: {result['answer']}")
    print(f"Confidence: {result['score']:.2f}")
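Point it at any text-based PDF and adjust the question to match. Two honest caveats: scanned PDFs need OCR first, since extract_text only reads embedded text, and very long documents are a poor fit for a single call. The question-answering pipeline does split an over-long context into overlapping windows for you, but a small extractive model tends to get slower and can lose accuracy as the document grows, which is when you would chunk the text yourself and query each chunk. Both are natural next steps once this skeleton works.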
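The chunking step can be sketched in a few lines. The version below cuts the text into overlapping character windows, asks the question of each, and keeps the highest-scoring answer. Everything here is an assumption made for illustration: the helper names, the 1000-character window and 200-character overlap, and above all the `stub_qa` function, which stands in for the real pipeline so the example runs without a model. Chunking by characters is also a simplification, since the model's limit is counted in tokens.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


def best_answer(qa, question, text, size=1000, overlap=200):
    """Ask the question of every chunk and return the highest-scoring result."""
    results = [qa(question=question, context=chunk)
               for chunk in chunk_text(text, size, overlap)]
    return max(results, key=lambda r: r["score"])


def stub_qa(question, context):
    """Stand-in for the real QA pipeline: looks for a fixed phrase, invented scores."""
    target = "4.2 million"
    found = target in context
    return {"answer": target if found else context[:10],
            "score": 0.9 if found else 0.1}


# A long fake document with the answer buried in the middle.
document_text = "filler " * 300 + "Total revenue was 4.2 million. " + "filler " * 300

result = best_answer(stub_qa, "What was the total revenue?", document_text)
print(result["answer"])  # 4.2 million
```

To use it for real, pass the qa pipeline from the example above in place of stub_qa. The overlap is there so an answer that straddles a boundary still appears whole in at least one chunk. Comparing scores across chunks is a rough heuristic, so a confidently wrong chunk can still win.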
Work through these and you will have used the whole toolkit from the article: pipelines for five different tasks, the Hub API, datasets with filtering, tokenizers, and the Auto classes that let you build it all by hand. The selection habit worth keeping is the one the article closes on: start with a pipeline and a small default model, prove the task works, and only then trade up to bigger models or manual control when the results, or your requirements, demand it.
See you soon.