Google Analytics 4 can be directly connected to Google BigQuery, allowing us to work with the underlying data in our ways.
Once a connection is configured you can query a database of Google Analytics data directly to find specific, highly accurate, conclusions to business questions. The new version 4 of Google Analytics isn’t so great, just about every person I work with that uses it has complained about the new UI, data thresholding, among other inaccuracies and frustrations. I find myself running queries and building custom reports that feed Google Sheets or Data Studio to fill in these gaps. Here are some of the common queries I routinely use during the process.
In all the examples below, you’ll need to replace project.analytics_*.events_*
with your own GA4 BigQuery dataset. You’ll also want to adjust the WHERE
clause to return a specific date range of results.
Pageviews by Traffic Source
SELECT
event_date,
traffic_source.source as traffic_source,
COUNT(*) as pageviews
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
ORDER BY 1 ASC, 3 DESC
Pageviews by URL
This query will provide a pageview count for all URLs tracked with your Google Analytics 4 profile. The URL is normalized to exclude all query parameters, and it’s also made lowercase to avoid redundancies. This granular level detail is one of the largest benefits of working with BigQuery instead of Google Analytics, you have precise control and accuracy.
SELECT
event_date,
LOWER( (
SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
OFFSET (0)]
FROM UNNEST(event_params)
WHERE key = 'page_location'
) ) AS url,
COUNT(*) AS pageviews
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC
Avoid Expensive Date Ranges
To run this query for a specific date range you should always use _TABLE_SUFFIX
which results in the most efficient query possible because of the way Google Analytics partitions it’s data. Using the table partitions is much better than doing something like:
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE event_date BETWEEN '20230801' AND '20230818'
Unique Pageviews
Unfortunately, Google Analytics 4 did away with the Unique Pageviews metric, but you can get the value using the following BigQuery SQL query. The date range is set to include the last 30 days here, but additional date ranges can be set (common examples below).
WITH pageviews AS (
SELECT
event_date,
(
SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
OFFSET (0)]
FROM UNNEST(event_params)
WHERE key = 'page_location'
) as url,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2, 3
)
SELECT
event_date,
url,
COUNT(ga_session_id) as unique_pageviews,
FROM pageviews
GROUP BY 1, 2
ORDER BY 1 ASC, 3 DESC
Users (All & New)
The following query will provide a single row with two columns:
all_users
is a count of all unique users that visited your site during the given date rangenew_users
is a count of all new users (first visits) that visited your site during the given date range
WITH users AS (
SELECT
event_date,
user_pseudo_id,
MAX(IF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
)
SELECT
event_date,
COUNT(users) AS all_users,
SUM(is_new_user) AS new_users,
FROM users
GROUP BY 1
ORDER BY 1 ASC
Sessions
This query will give you a single and column containing the total number of unique sessions that occurred during the given time range.
WITH sessions AS (
SELECT
event_date,
(
SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS ga_session_id,
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
)
SELECT
event_date,
COUNT(sessions) AS sessions
FROM sessions
GROUP BY 1
ORDER BY 1 ASC
Conversion Events
Working with and measuring conversion events requires extracting the event_params
array that Google includes in the GA4 BigQuery schema.
Here’s an example of how to do that with the built-in GA4 form_submit
conversion event that returns the following columns:
event_timestamp
form_name
form_destination
form_id
url
SELECT
event_timestamp,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'form_name'
) AS form_name,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'form_destination'
) AS form_destination,
(
SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'form_id'
) AS form_id,
LOWER( (
SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
OFFSET (0)]
FROM UNNEST(event_params)
WHERE key = 'page_location'
) ) AS url,
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
event_name = 'form_submit'
AND _TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
Normalized URL’s
Google Analytics 4 stores the entire URL in BigQuery, which is useful for handling attribution but can cause issues with reporting pageviews. Normalizing the URL’s avoids this by stripping out all queries parameters from a URL when they exist.
The following SELECT
statement is what I use to extract a normalized URL without query parameters from the page_location
value from Google’s event_params
array.
(
SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
OFFSET (0)]
FROM UNNEST(event_params)
WHERE key = 'page_location'
) as url,
Date Ranges
To restrict a query to a specific date range in BigQuery’s Google Analytics 4 dataset, you’ll use a WHERE
clause. Below are some common clauses that you can use to filter results for a specific date ranges.
Between Two Dates
To restrict a query to a specific date range, you can use the following WHERE
clause:
WHERE _TABLE_SUFFIX BETWEEN '20230801' AND '20230818'
Last 7 Days
The following WHERE
clause will limit results to the past 7 days:
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
Last 14 Days
Similarly, this WHERE
clause will do the same but for the last 14 days:
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
Last 30 Days
You should start to see a pattern here… This WHERE
clause will cover the last 30 days of data:
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
Any Number of Days
The same pattern can easily be adjusted to support any number of days in the past, just adjust the following portion:
INTERVAL 90 DAY
Suggestions?
If there’s a query you’re looking for that I should include on this, please let me know, I’d like to expand on this information as time goes on to keep it as useful as possible for the long-term.