Writing Cleaner BigQuery SQL with Common Table Expressions

As SQL queries grow, they tend to get harder to read. Subqueries nest inside subqueries, the same logic gets repeated in different places, and a query that made perfect sense when you wrote it becomes a puzzle a month later. Common Table Expressions, or CTEs, are BigQuery’s answer to this. They let you break a complex query into named, logical steps that read from top to bottom like a story, rather than from the inside out like a riddle.

What a CTE Actually Is

A CTE is a temporary, named result set that exists only for the duration of a single query. You define it using the WITHclause, give it a name, and then reference that name elsewhere in the query as if it were a real table. Once the query finishes running, the CTE disappears. It is not saved anywhere, and it can be referenced as many times as you need within that one query.

The basic shape looks like this:

WITH cte_name AS (
SELECT column1, column2
FROM `project.dataset.table`
WHERE condition
)
SELECT *
FROM cte_name

The WITH block defines a named result set, and the final SELECT reads from it. You can think of a CTE as giving a name to a piece of a query so you can refer back to it cleanly.

Why Bother

The value of CTEs comes down to readability and structure. They let you break a complex query into logical parts, each with a clear purpose, so anyone reading it can follow the flow. They are reusable within the query, meaning a piece of logic you would otherwise repeat can be defined once and referenced multiple times. They make debugging easier, because you can run each CTE on its own to check it produces what you expect before building on top of it. And they are a far cleaner alternative to deeply nested subqueries, which become almost unreadable past a couple of levels.

CTE Versus Subquery

The clearest way to see the benefit is to compare the same logic written both ways. Here it is as a nested subquery:

SELECT user_id, AVG(order_total)
FROM (
SELECT user_id, order_total
FROM `project.dataset.orders`
WHERE status = 'completed'
)
GROUP BY user_id

To understand this, you have to read from the inside out: first work out what the inner query does, then apply the outer one. Now the same thing as a CTE:

WITH completed_orders AS (
SELECT user_id, order_total
FROM `project.dataset.orders`
WHERE status = 'completed'
)
SELECT user_id, AVG(order_total)
FROM completed_orders
GROUP BY user_id

The result is identical, but the logic now reads in the order it happens: first gather the completed orders, then average them per user. The named step, completed_orders, also tells you what that piece of data represents, which the anonymous subquery never did.

Chaining Multiple CTEs

The real power appears when you string several CTEs together. You define them one after another, separated by commas, and each one can reference the CTEs defined before it:

WITH first_cte AS (
SELECT * FROM `table1`
),
second_cte AS (
SELECT * FROM first_cte WHERE condition = TRUE
)
SELECT * FROM second_cte

Notice that second_cte reads from first_cte. This is what lets you build a query as a sequence of steps, each transforming the result of the last.

A Real-World Example

Suppose you want the names of your top customers, defined as anyone who has spent more than 1,000. That involves three distinct operations: calculate how much each customer has spent, filter down to the big spenders, then look up their names. CTEs let you write each step in order:

WITH customer_spend AS (
SELECT user_id, SUM(order_total) AS total_spent
FROM `project.dataset.orders`
GROUP BY user_id
),
top_customers AS (
SELECT user_id
FROM customer_spend
WHERE total_spent > 1000
)
SELECT u.user_id, u.name
FROM `project.dataset.users` u
JOIN top_customers t
ON u.user_id = t.user_id

Read it top to bottom and the logic is obvious: calculate spend per customer, filter to those over 1,000, then join to the users table to get their names. The same query written as nested subqueries would be considerably harder to follow. Breaking it into calculate, then filter, then join is exactly the kind of step-by-step structure CTEs are built for.

How CTEs Compare to Temporary Tables and Views

CTEs are not the only way to store an intermediate result, and it helps to know when each option fits.

A CTE exists only inside the single query that defines it. It is lightweight, optimised in place by BigQuery, and ideal when you need a result for just one query. A temporary table, by contrast, persists for the length of a session or script, so it can be referenced across multiple separate queries, and it is materialised in memory rather than recomputed. A view is saved permanently in your dataset and can be reused across many queries by anyone with access, though it is virtual, meaning it stores the query definition rather than the data itself.

The rule of thumb: use a CTE when the intermediate result is only needed within one query, a temporary table when you need to reuse it across several steps in a script, and a view when the logic is valuable enough to save and share across many queries over time.

Getting the Most From CTEs

A few habits make CTEs more effective. Name them descriptively, so completed_orders or customer_spend rather than cte1 or temp. A good name documents what the result represents and saves the reader from working it out. Keep each CTE focused on one responsibility, the same way a well-written function does one thing, because a focused CTE is far easier to test and debug. Chain multiple CTEs to express a flow of logic, treating each as one step in a transformation. And avoid SELECT * where you can, listing the specific columns you need instead, since in BigQuery scanning unnecessary columns costs you both query time and money.

Where CTEs Shine

In practice, CTEs come into their own in a handful of recurring situations. They are ideal for top-N problems, where you calculate a total per item, then filter or rank to return the top few. They are excellent for pre-aggregating data before a dashboard query, preparing a clean, summarised set that the final visualisation reads from. They suit chained transformations, where you clean the data, enrich it, then filter it as a sequence of clear steps. And above all, they are the cure for subquery spaghetti, replacing long, deeply nested SELECTs with a readable series of named stages.

The Takeaway

A CTE is a named, temporary result set defined with WITH that lives only for the length of your query. It turns nested, read-from-the-inside-out SQL into a clean sequence of logical steps that reads top to bottom. Chain several together to build complex logic in stages, name them clearly, keep each one focused, and reach for a temporary table or view instead when you need the result to outlive a single query. For making complicated BigQuery SQL readable and maintainable, CTEs are one of the simplest and most effective tools you have.

See you soon.

View Comments (2)

Leave a Reply

Prev Next

Subscribe to My Newsletter

Subscribe to my email newsletter to get the latest posts delivered right to your email. Pure inspiration, zero spam.

Discover more from Discuss Data Science, Machine Learning and Analytics

Subscribe now to keep reading and get access to the full archive.

Continue reading