The scalar subqueries article explains the one rule that defines them: a scalar subquery returns exactly one value, one row and one column, so you can drop it anywhere SQL expects a single value. This workbook puts that to work. You will use scalar subqueries in SELECT, WHERE, HAVING and arithmetic, trigger and fix the classic “more than one element” error, write a correlated version that runs per row, and finish by seeing when a join is the better choice. Run Example 1 first; the rest depend on it.
1. Create the sample tables
Run this once. It builds three small tables so every later query has something to read. The numbers are tiny so you can check each result by eye.
-- Run once. Change `scratch` to any dataset you can write to.CREATE OR REPLACE TABLE scratch.customers ASSELECT * FROM UNNEST([ STRUCT(1 AS customer_id, 'Ava' AS name, 'UK' AS country), STRUCT(2, 'Ben', 'UK'), STRUCT(3, 'Chloe', 'US'), STRUCT(4, 'Dan', 'US') -- never makes a purchase]);CREATE OR REPLACE TABLE scratch.products ASSELECT * FROM UNNEST([ STRUCT(101 AS product_id, 'Keyboard' AS name, 'Accessories' AS category, 80 AS price), STRUCT(102, 'Monitor', 'Displays', 300), STRUCT(103, 'Mouse', 'Accessories', 25), STRUCT(104, 'Webcam', 'Accessories', 60)]);CREATE OR REPLACE TABLE scratch.purchases ASSELECT * FROM UNNEST([ STRUCT(1 AS purchase_id, 1 AS customer_id, 102 AS product_id, 1 AS quantity, 300 AS amount), STRUCT(2, 1, 101, 1, 80), STRUCT(3, 1, 103, 2, 50), STRUCT(4, 2, 103, 1, 25), STRUCT(5, 3, 102, 4, 1200), STRUCT(6, 3, 101, 1, 80)]);SELECT * FROM scratch.products ORDER BY product_id;
For reference as you work through the rest: the average product price is 116.25, the Displays category averages 300, and the Accessories category averages about 55. Keep those numbers in mind to sanity-check each query.
2. A scalar subquery in SELECT
The simplest use is to add a single global figure as a column on every row. This subquery returns one value, the overall average price, which appears beside each product so you can compare it directly.
SELECT name, price, (SELECT AVG(price) FROM scratch.products) AS avg_price, -- one value, repeated per row price - (SELECT AVG(price) FROM scratch.products) AS diff_from_avgFROM scratch.productsORDER BY price DESC;
Every row shows the same avg_price of 116.25 and its own distance from it. The subquery runs once for the whole query because it does not depend on the current row, which is what makes it efficient as a global constant.
3. A scalar subquery in WHERE
Because a scalar subquery is just a single value, you can compare against it in a filter. Here it supplies the threshold: keep only products priced above the average.
SELECT name, priceFROM scratch.productsWHERE price > (SELECT AVG(price) FROM scratch.products);
Only the Monitor comes back, since it is the one product above 116.25. This pattern, comparing each row to a computed value, is one of the most common reasons to reach for a scalar subquery.
4. A scalar subquery inside an expression
A scalar subquery can sit anywhere a value can, including in the middle of arithmetic. This one builds a price index, expressing each price as a percentage of the average.
SELECT name, ROUND(price / (SELECT AVG(price) FROM scratch.products) * 100, 1) AS price_indexFROM scratch.productsORDER BY price_index DESC;
The Monitor lands at about 258, meaning it costs roughly two and a half times the average, while the Mouse sits well below 100. Normalising values against a dataset-wide figure like this is a tidy job for a scalar subquery in an expression.
5. Returning one value with an aggregate
Aggregate functions are the natural partner for scalar subqueries because they collapse many rows into one value. Here MAX finds the single highest price, and the outer query returns whichever product matches it.
SELECT name, priceFROM scratch.productsWHERE price = (SELECT MAX(price) FROM scratch.products);
This returns the Monitor at 300. MAX, MIN, AVG, SUM and COUNT all guarantee a single value, which is exactly what the scalar rule demands, so they slot in without any extra effort.
6. The “more than one element” error, and the fix
The defining rule bites when a subquery returns more than one row. This query asks to compare against every Accessories price at once, which is not a single value, so BigQuery raises “Scalar subquery produced more than one element.”
-- ERROR: Scalar subquery produced more than one elementSELECT nameFROM scratch.productsWHERE price = (SELECT price FROM scratch.products WHERE category = 'Accessories');
The fix is to collapse the inner query to one value, usually with an aggregate.
-- Fix: an aggregate guarantees a single valueSELECT name, priceFROM scratch.productsWHERE price = (SELECT MAX(price) FROM scratch.products WHERE category = 'Accessories');
The corrected version returns the Keyboard, the priciest Accessory at 80. Whenever you hit this error, the question to ask is “how do I reduce this to one row and one column,” and an aggregate is almost always the answer.
7. A scalar subquery in HAVING
HAVING filters groups after aggregation, and a scalar subquery gives it an external figure to compare against. This finds categories whose average price beats the overall average.
SELECT category, AVG(price) AS avg_cat_priceFROM scratch.productsGROUP BY categoryHAVING AVG(price) > (SELECT AVG(price) FROM scratch.products);
Only Displays qualifies, at 300 against the overall 116.25. The grouped aggregate on the left is compared to the single global value the subquery returns on the right, which is how you contrast each group with the dataset as a whole.
8. A correlated scalar subquery
So far the subqueries have run once. A correlated scalar subquery references the outer row, so it runs once per row and produces a per-row value. Here it counts each customer’s purchases as an inline figure.
SELECT c.name, (SELECT COUNT(*) FROM scratch.purchases AS p WHERE p.customer_id = c.customer_id) AS purchase_count -- depends on the outer customerFROM scratch.customers AS cORDER BY c.customer_id;
You get Ava 3, Ben 1, Chloe 2, Dan 0. The link p.customer_id = c.customer_id is what makes it correlated: the inner query cannot run on its own because it needs a value from the outer row.
9. A correlated scalar subquery in an expression
Combine the last two ideas to compute a per-row percentage. This shows each purchase as a share of that customer’s own total spend, with the total supplied by a correlated scalar subquery.
SELECT p.purchase_id, p.customer_id, p.amount, ROUND(p.amount / ( SELECT SUM(p2.amount) FROM scratch.purchases AS p2 WHERE p2.customer_id = p.customer_id ) * 100, 1) AS pct_of_customer_totalFROM scratch.purchases AS pORDER BY p.customer_id, p.amount DESC;
Chloe’s Monitor shows about 93.8 percent of her spend and her Keyboard about 6.3. The same table appears twice under pand p2 so the inner sum is scoped to the outer row’s customer.
10. Scalar subquery versus JOIN
Correlated scalar subqueries are clear, but they run once per outer row, which scales poorly on large tables. When you need the same per-row aggregate across many rows, a join with GROUP BY usually computes it in one pass. Here is the per-customer count from Example 8, rewritten.
-- Same answer as Example 8, computed in a single passSELECT c.customer_id, c.name, COUNT(pu.purchase_id) AS purchase_countFROM scratch.customers AS cLEFT JOIN scratch.purchases AS pu ON pu.customer_id = c.customer_idGROUP BY c.customer_id, c.name;
The LEFT JOIN keeps Dan with a count of zero, matching the correlated version exactly. The rule of thumb from the guide: a scalar subquery is ideal for a single global constant, as in Examples 2 to 4, while per-row aggregates over large tables usually belong in a join with GROUP BY.
Work through these and you will have covered the whole article: the one-row-one-column rule, scalar subqueries in SELECT, WHERE, HAVING and expressions, the “more than one element” error and its fix, the correlated form that runs per row, and the join rewrite for scale. One last practical note the guide stresses: BigQuery charges by bytes scanned, so filter your inner subqueries tightly to keep both cost and runtime down.
See you soon.
[…] BigQuery Scalar Subqueries: 10 Code-Along Examples […]