The aggregates and joins article covers the two workhorses of analytical SQL: collapsing many rows into summary numbers with COUNT, SUM and AVG, and stitching tables together with the JOIN family. This workbook walks both, then puts them together and shows the one trap that quietly corrupts totals. Run Example 1 first; the rest depend on it.
I recommend you write the code yourself, muscle memory is real.
1. Create the sample tables
Run this once. Three small tables, with deliberately tiny numbers so you can verify every result by eye. Note that Dan never buys anything and the Webcam is never sold, which makes the join examples meaningful.
-- Run once. Change `scratch` to any dataset you can write to.CREATE OR REPLACE TABLE scratch.customers ASSELECT * FROM UNNEST([ STRUCT(1 AS customer_id, 'Ava' AS name, 'UK' AS country), STRUCT(2, 'Ben', 'UK'), STRUCT(3, 'Chloe', 'US'), STRUCT(4, 'Dan', 'US') -- never makes a purchase]);CREATE OR REPLACE TABLE scratch.products ASSELECT * FROM UNNEST([ STRUCT(101 AS product_id, 'Keyboard' AS name, 'Accessories' AS category, 80 AS price), STRUCT(102, 'Monitor', 'Displays', 300), STRUCT(103, 'Mouse', 'Accessories', 25), STRUCT(104, 'Webcam', 'Accessories', 60) -- never purchased]);CREATE OR REPLACE TABLE scratch.purchases ASSELECT * FROM UNNEST([ STRUCT(1 AS purchase_id, 1 AS customer_id, 102 AS product_id, 1 AS quantity, 300 AS amount), STRUCT(2, 1, 101, 1, 80), STRUCT(3, 1, 103, 2, 50), STRUCT(4, 2, 103, 1, 25), STRUCT(5, 3, 102, 4, 1200), STRUCT(6, 3, 101, 1, 80)]);SELECT * FROM scratch.purchases ORDER BY purchase_id;
For reference: there are six purchases across three buyers, total spend is Ava 430, Ben 25, Chloe 1280, Dan nothing.
2. COUNT, in its three forms
COUNT answers “how many,” but the form matters. COUNT(*) counts every row, COUNT(column) counts only rows where that column is not null, and COUNT(DISTINCT column) counts unique values.
SELECT COUNT(*) AS total_purchases, -- every row COUNT(DISTINCT customer_id) AS buyers, -- unique customers who bought COUNT(quantity) AS rows_with_quantity -- non-null quantities onlyFROM scratch.purchases;
You get 6 purchases from 3 distinct buyers. The distinction between COUNT(*) and COUNT(column) becomes important the moment a column contains nulls, which is exactly what the LEFT JOIN in Example 7 will produce.
3. GROUP BY with COUNT
On its own, an aggregate collapses the whole table to one number. GROUP BY splits the table into groups first, so you get one result per group. Here it counts purchases per customer.
SELECT customer_id, COUNT(*) AS purchase_countFROM scratch.purchasesGROUP BY customer_idORDER BY customer_id;
The result is customer 1 with 3, customer 2 with 1, and customer 3 with 2. Read GROUP BY customer_id as “make one bucket per customer,” and the COUNT then applies inside each bucket rather than across the whole table.
4. SUM, and the GROUP BY rule
SUM totals a numeric column, skipping nulls. Grouped, it gives a total per group, the classic “spend per customer” query.
SELECT customer_id, SUM(amount) AS total_spendFROM scratch.purchasesGROUP BY customer_idORDER BY total_spend DESC;
Chloe leads at 1280. There is one rule to internalise here: every column in the SELECT that is not wrapped in an aggregate must appear in the GROUP BY. customer_id is not aggregated, so it has to be grouped. Break that rule and BigQuery rejects the query, which is its way of stopping an ambiguous result.
5. AVG, MIN and MAX together
The remaining aggregates follow the same pattern and can be combined in one query. Here we summarise prices per product category.
SELECT category, AVG(price) AS avg_price, MIN(price) AS cheapest, MAX(price) AS dearest, COUNT(*) AS productsFROM scratch.productsGROUP BY categoryORDER BY avg_price DESC;
Displays averages 300 from a single product, while Accessories averages 55 across three. Like SUM, all of these skip nulls automatically, so a missing price would be ignored rather than treated as zero.
6. INNER JOIN
Joins combine rows from two tables that share a key. An INNER JOIN keeps only rows that match on both sides. Here it attaches each purchase to its product so you can see names instead of ids.
SELECT pu.purchase_id, pr.name AS product, pr.category, pu.amountFROM scratch.purchases AS puINNER JOIN scratch.products AS pr ON pu.product_id = pr.product_id -- the join conditionORDER BY pu.purchase_id;
All six purchases match a product, so you get six rows. The ON clause states which columns must line up, and the short aliases pu and pr keep the query readable. The Webcam never appears, because no purchase references it and an inner join drops non-matching rows.
7. LEFT JOIN, with IFNULL
A LEFT JOIN keeps every row from the left table even when there is no match on the right, filling the gaps with nulls. This is how you include customers who have never bought anything.
SELECT c.name, COUNT(p.purchase_id) AS purchase_count, -- COUNT ignores the nulls IFNULL(SUM(p.amount), 0) AS total_spend -- turn a null total into 0FROM scratch.customers AS cLEFT JOIN scratch.purchases AS p ON p.customer_id = c.customer_idGROUP BY c.nameORDER BY total_spend DESC;
Dan appears with a count of 0 and a spend of 0. Two things are working together: COUNT(p.purchase_id) returns 0 for Dan because COUNT ignores the null from the unmatched join, and IFNULL converts his null SUM into a clean 0. An INNER JOIN here would have dropped Dan entirely.
8. RIGHT JOIN and FULL JOIN
A RIGHT JOIN keeps every row from the right table, and a FULL JOIN keeps every row from both. Here is a FULL JOIN that surfaces products that have never sold.
SELECT pr.name AS product, COUNT(pu.purchase_id) AS times_soldFROM scratch.products AS prFULL JOIN scratch.purchases AS pu ON pu.product_id = pr.product_idGROUP BY pr.nameORDER BY times_sold;
The Webcam shows up with 0, since FULL JOIN retains it despite having no purchases. A RIGHT JOIN is just a LEFT JOIN with the tables swapped, so most teams standardise on LEFT for readability and rarely write RIGHT at all. Reach for FULL only when you genuinely need unmatched rows from both sides.
9. Join, then aggregate
The real power comes from joining related tables and then summarising the combined result. Here we join purchases to products and total revenue per category.
SELECT pr.category, SUM(pu.amount) AS revenue, COUNT(*) AS unitsFROM scratch.purchases AS puINNER JOIN scratch.products AS pr ON pu.product_id = pr.product_idGROUP BY pr.categoryORDER BY revenue DESC;
Displays brings in 1500 and Accessories 235. The pattern is always the same: join to bring the columns you need into one row set, then group and aggregate. The amount lives on the purchases table and the category on the products table, and the join is what lets a single query use both.
10. The join fan-out trap
This is the bug that silently inflates totals. If you join to a table that has more than one row per key, every matched row is duplicated, and any SUM or COUNT over it is multiplied. Watch what happens when customers carry several tags.
-- WRONG: Ava has two tags, so her spend is counted twiceWITH tags AS ( SELECT * FROM UNNEST([ STRUCT(1 AS customer_id, 'vip' AS tag), STRUCT(1, 'newsletter'), -- Ava has TWO tags STRUCT(3, 'vip') ]))SELECT p.customer_id, SUM(p.amount) AS inflated_totalFROM scratch.purchases AS pJOIN tags AS t ON t.customer_id = p.customer_idGROUP BY p.customer_id;
Ava’s total comes out as 860 rather than 430, because the join doubled her purchase rows to pair with each tag. The fix is to aggregate before you join, so the totals are already final when the one-to-many table is attached.
-- RIGHT: total spend first, then attach tagsWITH tags AS ( SELECT * FROM UNNEST([ STRUCT(1 AS customer_id, 'vip' AS tag), STRUCT(1, 'newsletter'), STRUCT(3, 'vip') ]),spend AS ( SELECT customer_id, SUM(amount) AS total FROM scratch.purchases GROUP BY customer_id)SELECT s.customer_id, s.total, t.tagFROM spend AS sJOIN tags AS t ON t.customer_id = s.customer_idORDER BY s.customer_id;
Now Ava’s total stays 430 on every tag row, because it was computed before the fan-out happened. When you cannot restructure the query, COUNT(DISTINCT ...) or SUM over a pre-aggregated subquery is the defence. Fan-out is silent, so whenever a total looks too high after a join, suspect a non-unique key first.
Work through these and you will have covered the whole article: COUNT in all three forms, SUM, AVG, MIN and MAX with GROUP BY and its rule, the INNER, LEFT, RIGHT and FULL joins, joining then aggregating, and the fan-out trap with its fix. The mental model to keep is simple: joins decide which rows you have, aggregates decide how they collapse, and the order you do them in is what keeps your numbers honest.
See you soon.
[…] Aggregates and Joins in BigQuery: 10 Code-Along Examples […]
[…] Aggregates and Joins in BigQuery: 10 Code-Along Examples […]