Build a Session-Based GA4 Traffic Acquisition Report in BigQuery

Build a custom session-based GA4 traffic acquisition report from your BigQuery export, attributing each session to its channel, source, medium, and campaign, with transparent engagement and conversion metrics you control.

GA4’s traffic acquisition report tells you where your sessions come from, but the moment you want it defined your way, on the grain you choose, feeding your own dashboard, the interface runs out of room. The raw export in BigQuery does not. This code-along builds a custom traffic acquisition report at the session level, attributing each session to its channel, source, medium, and campaign, and then computing the engagement and conversion metrics that matter. As always with the export, this will not match any GA4 screen exactly, because the documentation, the interface, and the raw data each define things a little differently. That is the point. Here every metric is calculated transparently, so you know precisely what each number means and can feed it into a spreadsheet, a saved table, or Looker Studio.

What the report contains

The grain is the session, attributed to its traffic. The dimensions are the date, the session default channel group, the session source, the session medium, the combined session source and medium, and the session campaign. The metrics cover the audience and its behaviour: total users, active users, sessions, engaged sessions, average engagement time in seconds, engaged sessions per user, engagement rate, events per session, total event count, a specific event count of your choosing, conversions, the session conversion rate, and purchase revenue. One convention to note up front: unless stated otherwise, the calculated user metrics use total users as the denominator.

The build has four stages. Flatten the events into a clean base, collapse them into sessions while attributing each session’s traffic, classify each session into a channel group, then aggregate up to the dimensions and compute the metrics.

Step one: the base layer

First we read the export once and pull out what we need: the session identifier, the date and timestamp, the engagement flag and time, the event name, the traffic source fields, and any purchase revenue.

WITH base AS (
SELECT
user_pseudo_id,
(SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'ga_session_id') AS session_id,
PARSE_DATE('%Y%m%d', event_date) AS event_dt,
event_timestamp,
event_name,
(SELECT ep.value.string_value FROM UNNEST(event_params) ep WHERE ep.key = 'session_engaged') AS session_engaged,
(SELECT ep.value.int_value FROM UNNEST(event_params) ep WHERE ep.key = 'engagement_time_msec') AS engagement_msec,
collected_traffic_source.manual_source AS source,
collected_traffic_source.manual_medium AS medium,
collected_traffic_source.manual_campaign_name AS campaign,
ecommerce.purchase_revenue AS purchase_revenue
FROM `your-project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
)

The session id and engagement fields come out of the parameter array. The traffic source comes from the collected_traffic_source record, which carries the source, medium, and campaign collected with each event, and purchase revenue comes ready-made from the ecommerce record. The table suffix filter sets the reporting window and keeps the scan contained.

Step two: build sessions and attribute the traffic

Now we collapse the events into one row per session. The interesting decision is attribution: a session should be credited to the traffic that started it, so for each session we take the first non-null source, medium, and campaign in event order, and fall back to the standard placeholders when nothing was collected.

, sessions AS (
SELECT
user_pseudo_id,
session_id,
MIN(event_dt) AS session_date,
COALESCE((ARRAY_AGG(source IGNORE NULLS ORDER BY event_timestamp LIMIT 1))[SAFE_OFFSET(0)], '(direct)') AS source,
COALESCE((ARRAY_AGG(medium IGNORE NULLS ORDER BY event_timestamp LIMIT 1))[SAFE_OFFSET(0)], '(none)') AS medium,
COALESCE((ARRAY_AGG(campaign IGNORE NULLS ORDER BY event_timestamp LIMIT 1))[SAFE_OFFSET(0)], '(not set)') AS campaign,
MAX(IF(session_engaged = '1', 1, 0)) AS is_engaged,
SUM(engagement_msec) AS engagement_msec,
COUNT(*) AS event_count,
COUNTIF(event_name IN ('purchase', 'generate_lead')) AS conversion_events,
COUNTIF(event_name = 'page_view') AS specific_events,
SUM(IFNULL(purchase_revenue, 0)) AS purchase_revenue
FROM base
GROUP BY user_pseudo_id, session_id
)

Grouping by the user and session id gives one row per visit. The ARRAY_AGG with IGNORE NULLS, ordered by timestamp and limited to one, is the idiom for “first populated value in the session”, and the COALESCE supplies the familiar direct, none, and not-set defaults when a session has no recorded source. The engagement flag becomes a one if any event in the session was engaged, engagement time and event counts are summed, conversions are counted against your list of key events, and the specific event count tracks whichever single event you care about. Edit the key-event list and the specific event to match your setup.

Step three: classify the channel group

GA4’s default channel grouping is a long set of rules. A practical version classifies each session from its source and medium with a CASE expression. This is a deliberate simplification of the official logic, but it covers the common channels and you can extend it as needed.

, sessions_cg AS (
SELECT
*,
CASE
WHEN source = '(direct)' AND medium IN ('(none)', '(not set)') THEN 'Direct'
WHEN REGEXP_CONTAINS(medium, r'^(cpc|ppc|paid)') AND REGEXP_CONTAINS(source, r'google|bing') THEN 'Paid Search'
WHEN medium = 'organic' THEN 'Organic Search'
WHEN REGEXP_CONTAINS(medium, r'^(cpc|ppc|paid)') AND REGEXP_CONTAINS(source, r'facebook|instagram|tiktok|linkedin|twitter') THEN 'Paid Social'
WHEN medium IN ('social', 'social-network', 'sm') THEN 'Organic Social'
WHEN medium = 'email' THEN 'Email'
WHEN medium = 'affiliate' THEN 'Affiliates'
WHEN medium = 'referral' THEN 'Referral'
ELSE 'Unassigned'
END AS channel_group
FROM sessions
)

The order of the WHEN clauses matters, because the first match wins, so the more specific paid rules come before the broader organic and social ones. Treat this as a starting point and tune the patterns to the sources and mediums your own data actually uses.

Step four: the report

Finally we aggregate the sessions up to the dimensions and compute every metric.

SELECT
session_date AS date,
channel_group AS session_default_channel_group,
source AS session_source,
medium AS session_medium,
CONCAT(source, ' / ', medium) AS session_source_medium,
campaign AS session_campaign,
COUNT(DISTINCT user_pseudo_id) AS total_users,
COUNT(DISTINCT IF(is_engaged = 1, user_pseudo_id, NULL)) AS active_users,
COUNT(*) AS sessions,
COUNTIF(is_engaged = 1) AS engaged_sessions,
ROUND(SUM(engagement_msec) / 1000 / COUNT(DISTINCT user_pseudo_id), 1) AS avg_engagement_seconds,
ROUND(COUNTIF(is_engaged = 1) / COUNT(DISTINCT user_pseudo_id), 2) AS engaged_sessions_per_user,
ROUND(COUNTIF(is_engaged = 1) / COUNT(*), 4) AS engagement_rate,
ROUND(SUM(event_count) / COUNT(*), 2) AS events_per_session,
SUM(event_count) AS total_event_count,
SUM(specific_events) AS specific_event_count,
SUM(conversion_events) AS conversions,
ROUND(COUNTIF(conversion_events > 0) / COUNT(*), 4) AS session_conversion_rate,
SUM(purchase_revenue) AS purchase_revenue
FROM sessions_cg
GROUP BY date, session_default_channel_group, session_source, session_medium, session_source_medium, session_campaign
ORDER BY sessions DESC

Because the sessions table holds one row per visit, counting rows gives sessions directly, which is the denominator for the session-based rates. Total users are the distinct visitors in each group, and active users are the distinct visitors who had an engaged session, which is the practical definition of an active user. Average engagement time divides the summed seconds by total users, following the stated convention, and engaged sessions per user uses the same denominator, while engagement rate is per session. Events per session and total event count come from the summed per-session event counts, the specific event count surfaces your chosen event, conversions sum your key events, and the session conversion rate is the share of sessions in which a key event occurred. Purchase revenue sums what the ecommerce field reported.

Notes worth knowing

A few details will keep this honest. The collected_traffic_source record is present on recent exports, but older datasets may not have it, in which case you fall back to the user-scoped traffic_source or to source and medium pulled from the event parameters, so check what your dataset exposes. The channel grouping CASE is a simplification of GA4’s official rules and should be tuned to your own sources and mediums rather than trusted as a perfect replica. The average engagement time here is per user because of the stated convention, so if you want the per-session figure that the interface often shows, divide by sessions instead. Your key-event list and the specific event are placeholders, so set them to your real conversions. The export timestamps are UTC, which will produce small day-boundary differences from a locally timed interface. And as always, expect the totals not to match the GA4 screen, which is the sign that you now own the definitions.

The reward is a traffic acquisition report you can defend and extend. Every session is attributed and classified in SQL you can read, every metric shares a clear denominator, and the same query serves a one-off export, a scheduled table, or a live dashboard. From here you can add a device or landing-page breakdown, switch the attribution from first-touch to last-non-direct, or join in cost data to turn acquisition into true return on ad spend.

See you soon.

View Comments (2)

Leave a Reply

Prev Next

Subscribe to My Newsletter

Subscribe to my email newsletter to get the latest posts delivered right to your email. Pure inspiration, zero spam.

Discover more from Discuss Data Science, Machine Learning and Analytics

Subscribe now to keep reading and get access to the full archive.

Continue reading