Most people learn GROUP BY early, and it works well right up until it doesn’t. The moment you want to keep your individual rows while also adding a calculation across a group, like a running total, a rank, or a comparison to the previous row, GROUP BY runs out of road. It collapses your rows into summaries. Window functions solve exactly this problem. They let you perform calculations across a set of rows while keeping every row intact.
What a Window Function Does
A window function performs a calculation across a set of table rows and returns a single value for each row in that set. The crucial difference from GROUP BY is that it does not collapse your rows. You keep the full detail of your data and gain an extra column with the calculated value attached to each row. This makes window functions ideal for aggregations, rankings, and comparisons that would otherwise force you to lose the row-level detail.
The Basic Syntax
Every window function follows the same pattern: a function, followed by an OVER clause that defines the “window” of rows to operate on.
function_name(expression) OVER ( PARTITION BY column_name ORDER BY column_name ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
There are three pieces inside the OVER clause. PARTITION BY divides the data into subsets, or windows, so the calculation restarts for each group. ORDER BY defines the order of rows inside each window, which matters for things like running totals and rankings. And the ROWS or RANGE clause specifies precisely which rows to include in the window frame, though the default behaviour is often all you need. All three are optional, and you will frequently use only the ones a given calculation requires.
The Three Families of Window Functions
Window functions fall into three broad categories. Aggregations like SUM(), AVG(), COUNT(), MAX(), and MIN() perform the familiar calculations, but per row rather than collapsing the data. Ranking functions like ROW_NUMBER(), RANK(), and DENSE_RANK() assign positions to rows. And value accessors like LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE() let one row reach out and read values from other rows in the window.
Running Totals with Aggregations
The classic example is a running total. Here we calculate cumulative sales over time:
SELECT date, sales, SUM(sales) OVER (ORDER BY date) AS running_totalFROM `dataset.daily_sales`
Because there is no PARTITION BY, the whole table is one window, and the ORDER BY date tells the function to accumulate sales in date order. Each row shows the cumulative total of all sales up to and including that date, while still showing that day’s individual sales figure. A plain GROUP BY could give you the grand total, but never this row-by-row accumulation.
Restarting the Calculation per Group
Add PARTITION BY and the running total restarts for each group. Here we compute a separate running total for every category:
SELECT category, date, sales, SUM(sales) OVER ( PARTITION BY category ORDER BY date ) AS running_total_by_categoryFROM `dataset.daily_sales`
Now the cumulative total resets each time the category changes, so you get an independent running total per category, all in the same query and without losing any rows. PARTITION BY is the window function equivalent of grouping, except it slices the data into windows rather than collapsing it.
Ranking Rows
Ranking functions assign a position to each row based on an ordering you define. Here we rank customers by how much they have spent:
SELECT customer_id, total_spent, RANK() OVER (ORDER BY total_spent DESC) AS revenue_rankFROM ( SELECT customer_id, SUM(amount) AS total_spent FROM `dataset.orders` GROUP BY customer_id)
The inner query aggregates total spend per customer, then RANK() assigns each customer a rank from highest to lowest. With RANK(), customers with identical totals receive the same rank, and the next rank skips accordingly. This is the kind of leaderboard logic that GROUP BY simply cannot produce on its own.
Top N per Group with ROW_NUMBER()
ROW_NUMBER() is the go-to function for finding the top few records within each group. Unlike RANK(), it always assigns a unique sequential number, even when values tie, which makes it perfect for “top N per group” problems. Here we get each customer’s three largest orders:
SELECT *FROM ( SELECT customer_id, order_id, amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY amount DESC ) AS rank_by_amount FROM `dataset.orders`)WHERE rank_by_amount <= 3
The window numbers each customer’s orders from largest to smallest, restarting at one for each customer. The outer query then keeps only the rows numbered three or below. This pattern, numbering inside a subquery and filtering outside it, is one of the most useful techniques in all of SQL.
Comparing Rows with LAG and LEAD
Value accessors let a row read data from other rows. LAG() reaches backwards to a previous row, and LEAD() reaches forwards. This makes period-over-period comparisons easy:
SELECT date, sales, LAG(sales) OVER (ORDER BY date) AS yesterday_sales, sales - LAG(sales) OVER (ORDER BY date) AS daily_changeFROM `dataset.daily_sales`
LAG(sales) pulls in the previous day’s sales figure onto the current row, and subtracting it from today’s sales gives the day-on-day change. Achieving this without window functions would require an awkward self-join. Here it is a single, readable line.
First and Last Values in a Window
FIRST_VALUE() and LAST_VALUE() return the first or last value within a window, which is useful for things like a customer’s first and most recent purchase:
SELECT customer_id, date, sales, FIRST_VALUE(sales) OVER ( PARTITION BY customer_id ORDER BY date ) AS first_sale, LAST_VALUE(sales) OVER ( PARTITION BY customer_id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_saleFROM `dataset.customer_sales`
There is an important subtlety here. FIRST_VALUE() works as expected with the default frame, but LAST_VALUE() does not. By default, the window frame only extends to the current row, so LAST_VALUE() would just return the current row’s value rather than the true last value in the partition. That is why the explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is added: it widens the frame to cover every row in the window, so LAST_VALUE() genuinely returns the final value. This is one of the most common window function gotchas, so it is worth remembering.
Window Functions Versus GROUP BY
It helps to be clear about when each tool applies. GROUP BY aggregates by collapsing rows into one summary row per group, which is exactly what you want when you only need the totals. Window functions aggregate per row while keeping the full row count, which is what you want when you need the detail and the calculation side by side.
Beyond aggregation, window functions do things GROUP BY cannot do directly at all: ranking and ordering within groups, and comparative analytics like comparing a row to the one before it. And critically, window functions retain the underlying detail of your data, where GROUP BY discards it in favour of the summary. The short version: reach for GROUP BY when you want summaries, and window functions when you want per-row calculations that preserve the original rows.
Keeping Window Functions Efficient
Because window functions operate over sets of rows, a few habits keep them performant in BigQuery. Partition thoughtfully, since well-chosen partitions limit how much data each window has to process. Filter early to reduce the volume of rows the functions work across. Avoid unnecessarily large windows by using partitions to keep each window focused. And select only the columns you actually need, since scanning fewer columns directly lowers your query cost.
Common Pitfalls
A few mistakes come up repeatedly. Forgetting PARTITION BY when you meant to calculate per group means the function runs across the entire table instead, so add the partition to scope it correctly. An incorrect ORDER BY produces the wrong running totals or rankings, so check that the ordering logic matches your intent. And unintended ROWS or RANGEbehaviour, like the LAST_VALUE() frame issue above, can give surprising results, so either rely on the default deliberately or define the frame explicitly when it matters.
The Takeaway
Window functions let you calculate across a set of rows without collapsing them, filling the gap that GROUP BY leaves behind. Use the OVER clause to define the window, PARTITION BY to split it into groups, and ORDER BY to sequence rows for running totals and rankings. Aggregations give you running calculations, ranking functions power leaderboards and top-N queries, and value accessors like LAG() and LEAD() make period-over-period comparisons trivial. Keep an eye on the LAST_VALUE() frame quirk, partition thoughtfully for performance, and you have one of the most powerful tools in BigQuery for analytical SQL.
See you soon.
[…] manual refresh, since the whole process is managed for you. Importantly, refreshes are incremental: BigQuery processes only the rows that have changed since the last refresh rather than recomputing […]
[…] CTE exists only inside the single query that defines it. It is lightweight, optimised in place by BigQuery, and ideal when you need a result for just one query. A temporary table, by contrast, persists for […]
[…] how you turn BigQuery’s nested, repeated data into the flat rows that standard SQL expects. Flatten a single array and each […]