BigQuery Scalar Subqueries: 10 Code-Along Examples

Learn BigQuery scalar subqueries by running them. Ten copy-and-run examples with sample tables covering SELECT, WHERE, HAVING and expressions, the one-row rule and its error, correlated subqueries, and JOIN rewrites.

The scalar subqueries article explains the one rule that defines them: a scalar subquery returns exactly one value, one row and one column, so you can drop it anywhere SQL expects a single value. This workbook puts that to work. You will use scalar subqueries in SELECTWHEREHAVING and arithmetic, trigger and fix the classic “more than one element” error, write a correlated version that runs per row, and finish by seeing when a join is the better choice. Run Example 1 first; the rest depend on it.

1. Create the sample tables

Run this once. It builds three small tables so every later query has something to read. The numbers are tiny so you can check each result by eye.

-- Run once. Change `scratch` to any dataset you can write to.
CREATE OR REPLACE TABLE scratch.customers AS
SELECT * FROM UNNEST([
STRUCT(1 AS customer_id, 'Ava' AS name, 'UK' AS country),
STRUCT(2, 'Ben', 'UK'),
STRUCT(3, 'Chloe', 'US'),
STRUCT(4, 'Dan', 'US') -- never makes a purchase
]);
CREATE OR REPLACE TABLE scratch.products AS
SELECT * FROM UNNEST([
STRUCT(101 AS product_id, 'Keyboard' AS name, 'Accessories' AS category, 80 AS price),
STRUCT(102, 'Monitor', 'Displays', 300),
STRUCT(103, 'Mouse', 'Accessories', 25),
STRUCT(104, 'Webcam', 'Accessories', 60)
]);
CREATE OR REPLACE TABLE scratch.purchases AS
SELECT * FROM UNNEST([
STRUCT(1 AS purchase_id, 1 AS customer_id, 102 AS product_id, 1 AS quantity, 300 AS amount),
STRUCT(2, 1, 101, 1, 80),
STRUCT(3, 1, 103, 2, 50),
STRUCT(4, 2, 103, 1, 25),
STRUCT(5, 3, 102, 4, 1200),
STRUCT(6, 3, 101, 1, 80)
]);
SELECT * FROM scratch.products ORDER BY product_id;

For reference as you work through the rest: the average product price is 116.25, the Displays category averages 300, and the Accessories category averages about 55. Keep those numbers in mind to sanity-check each query.

2. A scalar subquery in SELECT

The simplest use is to add a single global figure as a column on every row. This subquery returns one value, the overall average price, which appears beside each product so you can compare it directly.

SELECT
name,
price,
(SELECT AVG(price) FROM scratch.products) AS avg_price, -- one value, repeated per row
price - (SELECT AVG(price) FROM scratch.products) AS diff_from_avg
FROM scratch.products
ORDER BY price DESC;

Every row shows the same avg_price of 116.25 and its own distance from it. The subquery runs once for the whole query because it does not depend on the current row, which is what makes it efficient as a global constant.

3. A scalar subquery in WHERE

Because a scalar subquery is just a single value, you can compare against it in a filter. Here it supplies the threshold: keep only products priced above the average.

SELECT name, price
FROM scratch.products
WHERE price > (SELECT AVG(price) FROM scratch.products);

Only the Monitor comes back, since it is the one product above 116.25. This pattern, comparing each row to a computed value, is one of the most common reasons to reach for a scalar subquery.

4. A scalar subquery inside an expression

A scalar subquery can sit anywhere a value can, including in the middle of arithmetic. This one builds a price index, expressing each price as a percentage of the average.

SELECT
name,
ROUND(price / (SELECT AVG(price) FROM scratch.products) * 100, 1) AS price_index
FROM scratch.products
ORDER BY price_index DESC;

The Monitor lands at about 258, meaning it costs roughly two and a half times the average, while the Mouse sits well below 100. Normalising values against a dataset-wide figure like this is a tidy job for a scalar subquery in an expression.

5. Returning one value with an aggregate

Aggregate functions are the natural partner for scalar subqueries because they collapse many rows into one value. Here MAX finds the single highest price, and the outer query returns whichever product matches it.

SELECT name, price
FROM scratch.products
WHERE price = (SELECT MAX(price) FROM scratch.products);

This returns the Monitor at 300. MAXMINAVGSUM and COUNT all guarantee a single value, which is exactly what the scalar rule demands, so they slot in without any extra effort.

6. The “more than one element” error, and the fix

The defining rule bites when a subquery returns more than one row. This query asks to compare against every Accessories price at once, which is not a single value, so BigQuery raises “Scalar subquery produced more than one element.”

-- ERROR: Scalar subquery produced more than one element
SELECT name
FROM scratch.products
WHERE price = (SELECT price FROM scratch.products WHERE category = 'Accessories');

The fix is to collapse the inner query to one value, usually with an aggregate.

-- Fix: an aggregate guarantees a single value
SELECT name, price
FROM scratch.products
WHERE price = (SELECT MAX(price) FROM scratch.products WHERE category = 'Accessories');

The corrected version returns the Keyboard, the priciest Accessory at 80. Whenever you hit this error, the question to ask is “how do I reduce this to one row and one column,” and an aggregate is almost always the answer.

7. A scalar subquery in HAVING

HAVING filters groups after aggregation, and a scalar subquery gives it an external figure to compare against. This finds categories whose average price beats the overall average.

SELECT category, AVG(price) AS avg_cat_price
FROM scratch.products
GROUP BY category
HAVING AVG(price) > (SELECT AVG(price) FROM scratch.products);

Only Displays qualifies, at 300 against the overall 116.25. The grouped aggregate on the left is compared to the single global value the subquery returns on the right, which is how you contrast each group with the dataset as a whole.

8. A correlated scalar subquery

So far the subqueries have run once. A correlated scalar subquery references the outer row, so it runs once per row and produces a per-row value. Here it counts each customer’s purchases as an inline figure.

SELECT
c.name,
(SELECT COUNT(*)
FROM scratch.purchases AS p
WHERE p.customer_id = c.customer_id) AS purchase_count -- depends on the outer customer
FROM scratch.customers AS c
ORDER BY c.customer_id;

You get Ava 3, Ben 1, Chloe 2, Dan 0. The link p.customer_id = c.customer_id is what makes it correlated: the inner query cannot run on its own because it needs a value from the outer row.

9. A correlated scalar subquery in an expression

Combine the last two ideas to compute a per-row percentage. This shows each purchase as a share of that customer’s own total spend, with the total supplied by a correlated scalar subquery.

SELECT
p.purchase_id,
p.customer_id,
p.amount,
ROUND(p.amount / (
SELECT SUM(p2.amount)
FROM scratch.purchases AS p2
WHERE p2.customer_id = p.customer_id
) * 100, 1) AS pct_of_customer_total
FROM scratch.purchases AS p
ORDER BY p.customer_id, p.amount DESC;

Chloe’s Monitor shows about 93.8 percent of her spend and her Keyboard about 6.3. The same table appears twice under pand p2 so the inner sum is scoped to the outer row’s customer.

10. Scalar subquery versus JOIN

Correlated scalar subqueries are clear, but they run once per outer row, which scales poorly on large tables. When you need the same per-row aggregate across many rows, a join with GROUP BY usually computes it in one pass. Here is the per-customer count from Example 8, rewritten.

-- Same answer as Example 8, computed in a single pass
SELECT
c.customer_id,
c.name,
COUNT(pu.purchase_id) AS purchase_count
FROM scratch.customers AS c
LEFT JOIN scratch.purchases AS pu
ON pu.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;

The LEFT JOIN keeps Dan with a count of zero, matching the correlated version exactly. The rule of thumb from the guide: a scalar subquery is ideal for a single global constant, as in Examples 2 to 4, while per-row aggregates over large tables usually belong in a join with GROUP BY.

Work through these and you will have covered the whole article: the one-row-one-column rule, scalar subqueries in SELECTWHEREHAVING and expressions, the “more than one element” error and its fix, the correlated form that runs per row, and the join rewrite for scale. One last practical note the guide stresses: BigQuery charges by bytes scanned, so filter your inner subqueries tightly to keep both cost and runtime down.

See you soon.

View Comments (1)

Leave a Reply

Prev Next

Subscribe to My Newsletter

Subscribe to my email newsletter to get the latest posts delivered right to your email. Pure inspiration, zero spam.

Discover more from Discuss Data Science, Machine Learning and Analytics

Subscribe now to keep reading and get access to the full archive.

Continue reading