Every survey is a chain of decisions, and an error can creep in at any link. Before you reach for a single statistical technique, you have to understand how the data in front of you was gathered, because the soundest analysis built on a badly designed sample is still wrong. This matters as much for the policymaker acting on the results as for the academic producing them. Sampling design is the discipline of making those decisions deliberately, and this article walks through the whole landscape: why we sample at all, the families of sampling methods, the errors that threaten a survey, and the practical choices of contact method and quality control.
Why Sample at All
The alternative to sampling is a census, which measures every unit in a population of size N. A census carries no sampling error, since nothing is left out, but it is expensive and slow, and ironically it can suffer worse non-sampling errors when budget pressure forces the use of cheaper, less reliable interviewers spread thinly across the whole population. A sample instead selects a much smaller number of units.
Taking a sample does introduce sampling error, because you are now generalising from a part to the whole, but it lets you concentrate your resources on quality: better-trained interviewers, more careful follow-up, tighter supervision. The result is that non-sampling errors often fall, and the modest sampling error you accept is a price worth paying.
A related distinction concerns where data comes from. Primary data is collected fresh for a specific purpose, while secondary data already exists, gathered by someone else or for a different reason. If suitable secondary data exists, there is no need to collect anything new, which saves both time and money. Samples underpin work across many fields, from demography studying births and deaths, to government measuring crime and employment, to market research probing consumer preferences, to political science tracking voting intention.
The Two Families of Sampling
The goal in every case is a sample that is representative of the population, and all techniques fall into one of two families. Non-probability, or non-random, sampling does not give every unit a known chance of selection. Probability, or random, sampling does. That single difference has large consequences, because it determines whether you can quantify your uncertainty at all.
Non-Probability Sampling
Non-probability methods need no sampling frame, which is their main attraction, but they come with a serious limitation: sampling errors cannot be quantified. Some units have no chance of selection at all, and those that can be selected have an unknown probability, so standard errors are not measurable. Without measurable standard errors, confidence intervals and hypothesis tests simply do not apply. You get an estimate, but no honest statement of how uncertain it is.
Three approaches dominate this family. Convenience sampling takes whoever happens to be available, such as the people passing through a shopping centre on a weekday. It is quick and cheap, but unlikely to represent the population, since on a school day that shopping centre would be missing most children and working professionals. Judgemental sampling hands the selection to an expert, who decides whom to include, as when a research firm screens callers with expertly designed questions to assemble a focus group matching a target profile. Quota sampling sets target counts for specified characteristics like age, gender, and social class, which requires knowing the approximate distribution of those characteristics in the population.
Quota sampling deserves a closer look, because it is the workhorse of commercial research. The interviewer fills a target number for each control characteristic, aiming to replicate the population’s distribution in the sample, and crucially no sampling frame is required. You would reach for it when speed matters, since filling a numerical target is faster than tracking down specific named individuals, when no sampling frame is available, when cost must be kept down, or when a rough estimate is all the decision requires. A drinks company testing whether people broadly like a new ice cream flavour before committing to full production needs only a representative quota, not statistical precision.
Quota sampling has a notorious weakness, though. Non-respondents are usually not recorded, and that omission can mislead badly, because you never learn who declined or why. Best practice is to log non-response even in a quota sample. There is also a tension built into the method: adding more quota controls improves representativeness, but each new control erodes the speed and cost advantages that made quota sampling attractive in the first place. As a concrete case, surveying bus and coach drivers across many companies is awkward because some firms will not release employee lists under data protection rules, so a quota sample taken at depots and route endpoints at different times of day is the practical route.
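The quota-filling procedure, together with the advice to log non-response, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quota cells, the stream of passers-by, and the `willing` flag are all hypothetical, and a real interviewer would also record why people declined.

```python
from collections import Counter

def fill_quotas(passersby, quotas):
    """Accept people in arrival order until every quota cell is full.

    Refusals are logged per cell instead of being silently dropped.
    """
    remaining = dict(quotas)
    accepted = []
    refusals = Counter()
    for person in passersby:
        cell = (person["age_group"], person["gender"])
        if remaining.get(cell, 0) == 0:
            continue                      # cell already full, or not a target cell
        if not person["willing"]:
            refusals[cell] += 1           # log the non-response
            continue
        accepted.append(person)
        remaining[cell] -= 1
        if sum(remaining.values()) == 0:
            break                         # every quota has been met
    return accepted, refusals

# Hypothetical quotas and a hypothetical stream of passers-by.
quotas = {("18-34", "F"): 2, ("18-34", "M"): 1, ("35+", "F"): 1, ("35+", "M"): 2}
raw = [
    ("18-34", "F", True), ("35+", "M", False), ("18-34", "F", True),
    ("18-34", "F", True), ("18-34", "M", True), ("35+", "F", False),
    ("35+", "F", True), ("35+", "M", True), ("35+", "M", True),
    ("18-34", "M", True),
]
stream = [{"age_group": a, "gender": g, "willing": w} for a, g, w in raw]

accepted, refusals = fill_quotas(stream, quotas)
print(len(accepted), dict(refusals))
```

Note that selection here depends on arrival order, not on any known probability, which is exactly why no standard error can be attached to the result.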
Probability Sampling
Probability sampling gives every unit a known, non-zero probability of selection, and it always uses randomisation. The payoff is that standard errors can be computed, which unlocks confidence intervals and hypothesis tests, the entire apparatus of statistical inference. The cost is that it requires a sampling frame, a list of the units in the population.
The simplest version is simple random sampling, in which every unit has an equal, known, non-zero probability of selection. It produces unbiased estimates and is easy to understand, but it is not the most accurate method available, since it tends to have larger standard errors than stratified sampling.
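To make the link between a probability sample and a measurable standard error concrete, here is a minimal Python sketch. The population values are invented for illustration, the function name `srs_estimate` is hypothetical, and the 1.96 multiplier assumes the sample mean is approximately normally distributed.

```python
import math
import random
import statistics

def srs_estimate(population, n, seed=0):
    """Draw a simple random sample without replacement and estimate the mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)        # every unit equally likely to be chosen
    N = len(population)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (n - 1 divisor)
    fpc = math.sqrt(1 - n / N)                # finite population correction
    se = s / math.sqrt(n) * fpc               # estimated standard error of the mean
    ci = (mean - 1.96 * se, mean + 1.96 * se) # approximate 95% confidence interval
    return mean, se, ci

# Hypothetical population: the values 1..1000, whose true mean is 500.5.
population = list(range(1, 1001))
mean, se, ci = srs_estimate(population, n=100)
print(f"estimate={mean:.1f}, se={se:.1f}, 95% CI=({ci[0]:.1f}, {ci[1]:.1f})")
```

The standard error is computable only because the selection probabilities are known, which is the step a non-probability sample cannot supply.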
Systematic random sampling selects one unit at random from the first x units and then takes every xth unit after that, a so-called one-in-x sample, where the interval x is the population size N divided by the sample size n.
It is easy to implement, but it carries one specific danger: periodicity in the sampling frame. A one-in-seven sample of daily sales data would land on the same day of the week every time, perhaps every Monday, and so would badly misjudge the real variation across the week. Shifting to a one-in-eight interval cycles through different days and sidesteps the trap. The lesson generalises: always inspect the frame for hidden cycles before choosing an interval.
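The periodicity trap is easy to reproduce. The sketch below builds a hypothetical frame of 56 consecutive days labelled by weekday and compares a one-in-seven sample with a one-in-eight sample; the function name and the frame are assumptions made for illustration.

```python
import random

def systematic_sample(frame, interval, seed=0):
    """One-in-`interval` sample: random start among the first `interval`
    units, then every `interval`-th unit after that."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return frame[start::interval]

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Hypothetical frame: eight full weeks of daily records, labelled by weekday.
frame = [DAYS[i % 7] for i in range(56)]

one_in_seven = systematic_sample(frame, 7)
one_in_eight = systematic_sample(frame, 8)

print(sorted(set(one_in_seven)))   # a single weekday, whichever the start lands on
print(len(set(one_in_eight)))      # 7: every weekday is represented
```

The one-in-eight interval works here because 8 and 7 share no common factor, so successive picks step through the week one day at a time.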
Stratified random sampling divides the population into strata, groups that are internally similar but distinct from one another, and then takes a simple random sample from each stratum. Because every stratum is sampled, representation across the population is guaranteed, and the standard errors come out smaller than under simple random sampling. The catch is that the stratifying factors must be relevant to the survey’s purpose, such as age group, gender, or year of study. Surveying student satisfaction with a plain simple random sample could, by bad luck, draw mostly first-years, whereas stratifying by year of study ensures every cohort appears.
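The student-satisfaction example can be sketched as follows. The code assumes proportional allocation, which is one common choice and not the only one, and the cohort sizes and the name `stratified_sample` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, n, seed=0):
    """Simple random sample within each stratum, sized in proportion
    to the stratum's share of the population (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    total = len(units)
    sample = []
    for members in strata.values():
        # Rounding means the overall size can differ slightly from n;
        # every stratum gets at least one unit.
        k = max(1, round(n * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical cohorts: (year of study, number of students).
cohorts = [(1, 300), (2, 200), (3, 100)]
students = [(f"y{year}-s{i}", year) for year, size in cohorts for i in range(size)]

sample = stratified_sample(students, stratum_of=lambda s: s[1], n=60)
counts = Counter(year for _, year in sample)
print(counts)
```

With these sizes the allocation is 30, 20, and 10 students from years one, two, and three, so no cohort can be missed by bad luck.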
Cluster sampling works the other way around. It divides the population into clusters, where ideally each cluster mirrors the full variation of the population, and then selects some of those clusters by simple random sampling. In a one-stage design you survey every unit in the chosen clusters, while in a two-stage design you take a simple random sample within each chosen cluster, which makes it a multistage design. Its purpose is to cut cost, especially in face-to-face interviewing, where an interviewer can work a single local area rather than crisscrossing the whole country. It is less efficient than stratified sampling, with larger standard errors, but it shines when no national frame exists. Individual universities, for instance, hold lists of their own students while no national student database exists, so you cluster by university, sample some universities, and then sample students within them. For telephone or postal surveys, where travel is not a factor, clustering matters less unless the clusters themselves are of interest.
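A two-stage version of the university example might look like the sketch below. The universities, their sizes, and the function name are hypothetical, and the design shown (equal-probability selection of clusters, fixed take per cluster) is only one of several options.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random sample of units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: ten universities, each holding its own list of 200 students.
universities = {
    f"uni{u}": [f"uni{u}-student{i}" for i in range(200)] for u in range(10)
}

sample = two_stage_cluster_sample(universities, n_clusters=3, n_per_cluster=20)
for name, students in sample.items():
    print(name, len(students))
```

Setting `n_per_cluster` to at least the cluster size turns this into a one-stage design, since every unit in each chosen cluster is then surveyed. Only the lists of the selected universities are ever needed, which is the practical point when no national frame exists.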
Multistage sampling formalises this layered approach, with selection happening at two or more successive stages: large compound primary units are sampled first, then smaller secondary units within them, and so on as needed. A typical large UK government survey might first divide the country into strata by industrial region, then sample local-area clusters from each region, then obtain a list such as the electoral register within each local area and take a simple random sample from it.
It helps to see how stratified and cluster sampling relate. The difference is which groups you select and how much of each you take.
| Method | Which groups selected | Which units within groups |
|---|---|---|
| Stratified | All strata | Some units from each |
| One-stage cluster | Some clusters | All units in selected clusters |
| Two-stage cluster | Some clusters | Some units in selected clusters |
Seen this way, stratified sampling is the extreme case of two-stage cluster sampling in which every group is selected at the first stage.
Quota or Random? Three Judgement Calls
Choosing between the families is often a matter of weighing the frame, the cost, and the required accuracy. Airline pilots surveyed by their own employer about holiday entitlement are easy: the company holds a personnel list, so a frame exists, random sampling is accurate and appropriate, and pilots contacted through official channels are likelier to take it seriously. Potential tourists surveyed for a holiday brochure are harder, because the relevant lists are scattered across competing companies that will not share them, and sampling the general population by address would sweep in many people with no travel plans, so a quota sample that lets interviewers quickly find relevant respondents is the practical choice. Household expenditure for tax policy sits at the opposite pole: accuracy is essential in a policy-critical area, so a random sample from national address or voter lists is required.
The Errors That Threaten a Survey
Errors split into two broad kinds. Sampling error arises simply because you took a sample rather than a census, reflecting the random variation of the sampling scheme, and for probability samples it can be estimated, which is what makes inference possible. Non-sampling error arises from failures in the scheme itself and is hard to quantify. It has two main subtypes. Selection bias comes from the sampling frame not matching the target population, from the frame not being followed strictly, or from non-response. Response bias means the measurements themselves are wrong, whether through ambiguous question wording, misunderstanding by respondents, the sensitivity of the subject, interviewer bias from leading questions or poor training, or even the loss of whole batches of questionnaires.
Testing the Design Before and After
Two kinds of auxiliary survey protect a study. A pilot survey is a small trial run before the main event, used both to estimate standard errors for different question types, which feeds back into the sampling design, and to surface non-sampling problems, such as whether respondents understand the questionnaire and whether interviewers are performing well. A post-enumeration survey runs after the fact, re-interviewing a subsample of the original respondents to check that questions were understood and answers recorded correctly, which is especially valuable when the questions are technical.
Non-Response and Response Bias
Non-response and response bias can intrude at every stage, in both random and quota samples, whatever the method of contact. Part of the problem starts in the frame itself through coverage bias. A frame listing householders will miss people who have just moved in, and a frame of those aged eighteen and over will underrepresent younger adults if they are careless about registering.
Non-response itself comes in recognisable forms: people not at home because of work or holidays, refusals from those who object to the subject or the sponsor, incapacity through illness or language difficulty, people not found at vacant or abandoned addresses, and schedules lost after collection. The instinctive fix of simply increasing the sample size does not help, because it only gathers more data from the willing while leaving the reluctant just as absent. What actually helps is improving collection procedures and interviewer training, following up non-respondents with callbacks or a different contact method, using a proxy interview, in which an available substitute answers in place of the intended respondent, or offering an incentive such as a cash payment or prize draw entry.
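A small simulation shows why a bigger sample does not cure non-response bias. All the numbers are invented for illustration: half the population is assumed reluctant, reluctant people answer "yes" 30% of the time and respond with probability 0.2, while the rest answer "yes" 70% of the time and respond with probability 0.8. The true proportion of "yes" is therefore 0.5, but the expected proportion among respondents is (0.5 × 0.8 × 0.7 + 0.5 × 0.2 × 0.3) / (0.5 × 0.8 + 0.5 × 0.2) = 0.62, whatever the sample size.

```python
import random

def respondent_proportion(n, seed=0):
    """Approach n people and return the share of 'yes' among those who respond."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        reluctant = rng.random() < 0.5                      # hypothetical 50/50 split
        p_yes, p_respond = (0.3, 0.2) if reluctant else (0.7, 0.8)
        says_yes = rng.random() < p_yes
        if rng.random() < p_respond:                        # only respondents are observed
            answers.append(says_yes)
    return sum(answers) / len(answers)

small = respondent_proportion(200)
large = respondent_proportion(20_000)
print(f"n=200: {small:.3f}   n=20000: {large:.3f}   true value: 0.500")
```

The larger sample gives a more stable figure, but it stabilises around 0.62 and not around the true 0.5: extra sample size reduces sampling error and leaves the bias untouched.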
Response error is subtler and harder to detect, because a confident, clear-sounding answer may rest on a misunderstanding or a deliberate untruth. The UK Family Expenditure Survey is the classic illustration: alcohol consumption is understated by up to half compared with known sales figures from the tax authority. The sources are split between the interviewer’s role, through their characteristics, opinions, leading questions, or misrecording, and the respondent’s role, through lack of knowledge, forgetting, or reluctance on a sensitive topic. The controls are better recruitment, training, and supervision of interviewers, along with re-interviewing, consistency checks, and using more interviewers so no single one’s bias dominates.
Choosing a Method of Contact
How you reach respondents shapes both cost and bias, and each method trades off against the others.
| Method | Strengths | Weaknesses |
|---|---|---|
| Face-to-face | Good for personal questions, allows probing, can explain complex ideas and show samples | Very expensive, hard to gather detailed records on the spot |
| Telephone | Reaches large numbers quickly, easy to supervise centrally | Excludes those without a phone, cannot show samples, misses mobile-only respondents not in directories |
| Self-completion (postal, email, online) | Reaches most people, lets respondents look up records like income or tax | High non-response, later questions can influence earlier answers, no control over who actually completes it |
These methods can also be combined. The UK Family Expenditure Survey pairs three interviewer visits over a fortnight, which lift the response rate and let the interviewer explain the details, with a self-completion expenditure diary that captures the day-to-day spending no one could recall in a single interview.
The right choice depends on the topic. A survey of shopping patterns suits face-to-face contact, because a postal survey on such a low-salience subject would draw few replies and a telephone survey would underrepresent the elderly and the poor. A survey of businesspeople’s attitudes to new office equipment suits the telephone, since they all have phones, the questions are simple, and a scheduled call disrupts a working day far less than a visit. A government survey of teachers’ pay and conditions suits self-completion, because respondents cannot recall exact pay and tax figures unprompted, motivation is high given a possible pay rise, and teachers are comfortable filling in forms without an interviewer present.
Ten exercises for you to practise on are available here: https://datalad.co.uk/sampling-design-workbook-10-exercises-with-full-solutions/
Conclusion
Sampling design is the set of choices that decide whether your data can bear the weight you put on it. Sample rather than take a census to focus resources on quality, and pick the family that fits your situation: non-probability methods like quota sampling when no frame exists and speed or cost dominate, accepting that you forfeit measurable uncertainty, or probability methods when you have a frame and need real inference. Within probability sampling, simple random sampling is the baseline, systematic sampling is convenient but vulnerable to periodicity in the frame, stratified sampling buys accuracy when relevant subgroups exist, and cluster and multistage designs cut cost when fieldwork is geographically spread. Then guard the design against non-sampling error with pilots, follow-ups, trained interviewers, and a contact method matched to the topic. The statistics come later, and they are only ever as trustworthy as the sample beneath them.