Aggregates and Joins in BigQuery: 10 Code-Along Examples

Learn BigQuery aggregates and joins by running them. Ten copy-and-run examples with sample tables covering COUNT, SUM, AVG, GROUP BY, every join type, joining then aggregating, and the join fan-out trap.

The aggregates and joins article covers the two workhorses of analytical SQL: collapsing many rows into summary numbers with COUNT, SUM and AVG, and stitching tables together with the JOIN family. This workbook walks both, then puts them together and shows the one trap that quietly corrupts totals. Run Example 1 first; the rest depend on it.

I recommend you write the code yourself, muscle memory is real.

1. Create the sample tables

Run this once. Three small tables, with deliberately tiny numbers so you can verify every result by eye. Note that Dan never buys anything and the Webcam is never sold, which makes the join examples meaningful.

-- Run once. Change `scratch` to any dataset you can write to.
CREATE OR REPLACE TABLE scratch.customers AS
SELECT * FROM UNNEST([
STRUCT(1 AS customer_id, 'Ava' AS name, 'UK' AS country),
STRUCT(2, 'Ben', 'UK'),
STRUCT(3, 'Chloe', 'US'),
STRUCT(4, 'Dan', 'US') -- never makes a purchase
]);
CREATE OR REPLACE TABLE scratch.products AS
SELECT * FROM UNNEST([
STRUCT(101 AS product_id, 'Keyboard' AS name, 'Accessories' AS category, 80 AS price),
STRUCT(102, 'Monitor', 'Displays', 300),
STRUCT(103, 'Mouse', 'Accessories', 25),
STRUCT(104, 'Webcam', 'Accessories', 60) -- never purchased
]);
CREATE OR REPLACE TABLE scratch.purchases AS
SELECT * FROM UNNEST([
STRUCT(1 AS purchase_id, 1 AS customer_id, 102 AS product_id, 1 AS quantity, 300 AS amount),
STRUCT(2, 1, 101, 1, 80),
STRUCT(3, 1, 103, 2, 50),
STRUCT(4, 2, 103, 1, 25),
STRUCT(5, 3, 102, 4, 1200),
STRUCT(6, 3, 101, 1, 80)
]);
SELECT * FROM scratch.purchases ORDER BY purchase_id;

For reference: there are six purchases across three buyers, total spend is Ava 430, Ben 25, Chloe 1280, Dan nothing.

2. COUNT, in its three forms

COUNT answers “how many,” but the form matters. COUNT(*) counts every row, COUNT(column) counts only rows where that column is not null, and COUNT(DISTINCT column) counts unique values.

SELECT
COUNT(*) AS total_purchases, -- every row
COUNT(DISTINCT customer_id) AS buyers, -- unique customers who bought
COUNT(quantity) AS rows_with_quantity -- non-null quantities only
FROM scratch.purchases;

You get 6 purchases from 3 distinct buyers. The distinction between COUNT(*) and COUNT(column) becomes important the moment a column contains nulls, which is exactly what the LEFT JOIN in Example 7 will produce.

3. GROUP BY with COUNT

On its own, an aggregate collapses the whole table to one number. GROUP BY splits the table into groups first, so you get one result per group. Here it counts purchases per customer.

SELECT
customer_id,
COUNT(*) AS purchase_count
FROM scratch.purchases
GROUP BY customer_id
ORDER BY customer_id;

The result is customer 1 with 3, customer 2 with 1, and customer 3 with 2. Read GROUP BY customer_id as “make one bucket per customer,” and the COUNT then applies inside each bucket rather than across the whole table.

4. SUM, and the GROUP BY rule

SUM totals a numeric column, skipping nulls. Grouped, it gives a total per group, the classic “spend per customer” query.

SELECT
customer_id,
SUM(amount) AS total_spend
FROM scratch.purchases
GROUP BY customer_id
ORDER BY total_spend DESC;

Chloe leads at 1280. There is one rule to internalise here: every column in the SELECT that is not wrapped in an aggregate must appear in the GROUP BY. customer_id is not aggregated, so it has to be grouped. Break that rule and BigQuery rejects the query, which is its way of stopping an ambiguous result.

5. AVG, MIN and MAX together

The remaining aggregates follow the same pattern and can be combined in one query. Here we summarise prices per product category.

SELECT
category,
AVG(price) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS dearest,
COUNT(*) AS products
FROM scratch.products
GROUP BY category
ORDER BY avg_price DESC;

Displays averages 300 from a single product, while Accessories averages 55 across three. Like SUM, all of these skip nulls automatically, so a missing price would be ignored rather than treated as zero.

6. INNER JOIN

Joins combine rows from two tables that share a key. An INNER JOIN keeps only rows that match on both sides. Here it attaches each purchase to its product so you can see names instead of ids.

SELECT
pu.purchase_id,
pr.name AS product,
pr.category,
pu.amount
FROM scratch.purchases AS pu
INNER JOIN scratch.products AS pr
ON pu.product_id = pr.product_id -- the join condition
ORDER BY pu.purchase_id;

All six purchases match a product, so you get six rows. The ON clause states which columns must line up, and the short aliases pu and pr keep the query readable. The Webcam never appears, because no purchase references it and an inner join drops non-matching rows.

7. LEFT JOIN, with IFNULL

LEFT JOIN keeps every row from the left table even when there is no match on the right, filling the gaps with nulls. This is how you include customers who have never bought anything.

SELECT
c.name,
COUNT(p.purchase_id) AS purchase_count, -- COUNT ignores the nulls
IFNULL(SUM(p.amount), 0) AS total_spend -- turn a null total into 0
FROM scratch.customers AS c
LEFT JOIN scratch.purchases AS p
ON p.customer_id = c.customer_id
GROUP BY c.name
ORDER BY total_spend DESC;

Dan appears with a count of 0 and a spend of 0. Two things are working together: COUNT(p.purchase_id) returns 0 for Dan because COUNT ignores the null from the unmatched join, and IFNULL converts his null SUM into a clean 0. An INNER JOIN here would have dropped Dan entirely.

8. RIGHT JOIN and FULL JOIN

RIGHT JOIN keeps every row from the right table, and a FULL JOIN keeps every row from both. Here is a FULL JOIN that surfaces products that have never sold.

SELECT
pr.name AS product,
COUNT(pu.purchase_id) AS times_sold
FROM scratch.products AS pr
FULL JOIN scratch.purchases AS pu
ON pu.product_id = pr.product_id
GROUP BY pr.name
ORDER BY times_sold;

The Webcam shows up with 0, since FULL JOIN retains it despite having no purchases. A RIGHT JOIN is just a LEFT JOIN with the tables swapped, so most teams standardise on LEFT for readability and rarely write RIGHT at all. Reach for FULL only when you genuinely need unmatched rows from both sides.

9. Join, then aggregate

The real power comes from joining related tables and then summarising the combined result. Here we join purchases to products and total revenue per category.

SELECT
pr.category,
SUM(pu.amount) AS revenue,
COUNT(*) AS units
FROM scratch.purchases AS pu
INNER JOIN scratch.products AS pr
ON pu.product_id = pr.product_id
GROUP BY pr.category
ORDER BY revenue DESC;

Displays brings in 1500 and Accessories 235. The pattern is always the same: join to bring the columns you need into one row set, then group and aggregate. The amount lives on the purchases table and the category on the products table, and the join is what lets a single query use both.

10. The join fan-out trap

This is the bug that silently inflates totals. If you join to a table that has more than one row per key, every matched row is duplicated, and any SUM or COUNT over it is multiplied. Watch what happens when customers carry several tags.

-- WRONG: Ava has two tags, so her spend is counted twice
WITH tags AS (
SELECT * FROM UNNEST([
STRUCT(1 AS customer_id, 'vip' AS tag),
STRUCT(1, 'newsletter'), -- Ava has TWO tags
STRUCT(3, 'vip')
])
)
SELECT
p.customer_id,
SUM(p.amount) AS inflated_total
FROM scratch.purchases AS p
JOIN tags AS t ON t.customer_id = p.customer_id
GROUP BY p.customer_id;

Ava’s total comes out as 860 rather than 430, because the join doubled her purchase rows to pair with each tag. The fix is to aggregate before you join, so the totals are already final when the one-to-many table is attached.

-- RIGHT: total spend first, then attach tags
WITH tags AS (
SELECT * FROM UNNEST([
STRUCT(1 AS customer_id, 'vip' AS tag),
STRUCT(1, 'newsletter'),
STRUCT(3, 'vip')
]),
spend AS (
SELECT customer_id, SUM(amount) AS total
FROM scratch.purchases
GROUP BY customer_id
)
SELECT s.customer_id, s.total, t.tag
FROM spend AS s
JOIN tags AS t ON t.customer_id = s.customer_id
ORDER BY s.customer_id;

Now Ava’s total stays 430 on every tag row, because it was computed before the fan-out happened. When you cannot restructure the query, COUNT(DISTINCT ...) or SUM over a pre-aggregated subquery is the defence. Fan-out is silent, so whenever a total looks too high after a join, suspect a non-unique key first.

Work through these and you will have covered the whole article: COUNT in all three forms, SUM, AVG, MIN and MAX with GROUP BY and its rule, the INNER, LEFT, RIGHT and FULL joins, joining then aggregating, and the fan-out trap with its fix. The mental model to keep is simple: joins decide which rows you have, aggregates decide how they collapse, and the order you do them in is what keeps your numbers honest.

See you soon.

View Comments (2)

Leave a Reply

Prev Next

Subscribe to My Newsletter

Subscribe to my email newsletter to get the latest posts delivered right to your email. Pure inspiration, zero spam.

Discover more from Discuss Data Science, Machine Learning and Analytics

Subscribe now to keep reading and get access to the full archive.

Continue reading