Google Analytics Queries for  BigQuery

Useful BigQuery examples for querying Google Analytics 4 data

Google Analytics 4 can be directly connected to Google BigQuery, allowing us to work with the underlying data in our ways.

Once a connection is configured you can query a database of Google Analytics data directly to find specific, highly accurate, conclusions to business questions. The new version 4 of Google Analytics isn’t so great, just about every person I work with that uses it has complained about the new UI, data thresholding, among other inaccuracies and frustrations. I find myself running queries and building custom reports that feed Google Sheets or Data Studio to fill in these gaps. Here are some of the common queries I routinely use during the process.

In all the examples below, you’ll need to replace project.analytics_*.events_* with your own GA4 BigQuery dataset. You’ll also want to adjust the WHERE clause to return a specific date range of results.

Pageviews by Traffic Source

SELECT
  event_date,
  traffic_source.source as traffic_source,
  COUNT(*) as pageviews
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE event_name = 'page_view'
AND _TABLE_SUFFIX BETWEEN 
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
ORDER BY 1 ASC, 3 DESC

Pageviews by URL

This query will provide a pageview count for all URLs tracked with your Google Analytics 4 profile. The URL is normalized to exclude all query parameters, and it’s also made lowercase to avoid redundancies. This granular level detail is one of the largest benefits of working with BigQuery instead of Google Analytics, you have precise control and accuracy.

SELECT
  event_date,
  LOWER( (
    SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
    OFFSET (0)]
    FROM UNNEST(event_params)
    WHERE key = 'page_location'
  ) ) AS url,
  COUNT(*) AS pageviews
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN 
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC

Avoid Expensive Date Ranges

To run this query for a specific date range you should always use _TABLE_SUFFIX which results in the most efficient query possible because of the way Google Analytics partitions it’s data. Using the table partitions is much better than doing something like:

FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE event_date BETWEEN '20230801' AND '20230818'

Unique Pageviews

Unfortunately, Google Analytics 4 did away with the Unique Pageviews metric, but you can get the value using the following BigQuery SQL query. The date range is set to include the last 30 days here, but additional date ranges can be set (common examples below).

WITH pageviews AS (
  SELECT
    event_date,
    (
      SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
      OFFSET (0)]
      FROM UNNEST(event_params)
      WHERE key = 'page_location'
    ) as url,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id
  FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN 
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY 1, 2, 3
)

SELECT
  event_date,
  url,
  COUNT(ga_session_id) as unique_pageviews,
FROM pageviews
GROUP BY 1, 2
ORDER BY 1 ASC, 3 DESC

Users (All & New)

The following query will provide a single row with two columns:

  • all_users is a count of all unique users that visited your site during the given date range
  • new_users is a count of all new users (first visits) that visited your site during the given date range
WITH users AS (
  SELECT
    event_date,
    user_pseudo_id,
    MAX(IF(event_name IN ('first_visit', 'first_open'), 1, 0)) AS is_new_user
  FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN 
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY 1, 2
)

SELECT
event_date,
  COUNT(users) AS all_users,
  SUM(is_new_user) AS new_users,
FROM users
GROUP BY 1
ORDER BY 1 ASC

Sessions

This query will give you a single and column containing the total number of unique sessions that occurred during the given time range.

WITH sessions AS (
  SELECT
    event_date,
    (
      SELECT value.int_value 
      FROM UNNEST(event_params) 
      WHERE key = 'ga_session_id'
    ) AS ga_session_id,
  FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN 
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY 1, 2
)

SELECT 
  event_date,
  COUNT(sessions) AS sessions
FROM sessions
GROUP BY 1
ORDER BY 1 ASC

Conversion Events

Working with and measuring conversion events requires extracting the event_params array that Google includes in the GA4 BigQuery schema.

Here’s an example of how to do that with the built-in GA4 form_submit conversion event that returns the following columns:

  • event_timestamp
  • form_name
  • form_destination
  • form_id
  • url
SELECT
  event_timestamp,
  (
    SELECT value.string_value
    FROM UNNEST(event_params)
    WHERE key = 'form_name'
  ) AS form_name,
  (
    SELECT value.string_value
    FROM UNNEST(event_params)
    WHERE key = 'form_destination'
  ) AS form_destination,
  (
    SELECT value.string_value
    FROM UNNEST(event_params)
    WHERE key = 'form_id'
  ) AS form_id,
  LOWER( (
    SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
    OFFSET (0)]
    FROM UNNEST(event_params)
    WHERE key = 'page_location'
  ) ) AS url,
FROM `{BIGQUERY-PROJECT-ID}.analytics_{GA4-ID}.events_*`
WHERE
  event_name = 'form_submit'
  AND _TABLE_SUFFIX BETWEEN 
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
    FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Normalized URL’s

Google Analytics 4 stores the entire URL in BigQuery, which is useful for handling attribution but can cause issues with reporting pageviews. Normalizing the URL’s avoids this by stripping out all queries parameters from a URL when they exist.

The following SELECT statement is what I use to extract a normalized URL without query parameters from the page_location value from Google’s event_params array.

(
  SELECT REGEXP_EXTRACT_ALL(value.string_value, r"([^?]+)")[
  OFFSET (0)]
  FROM UNNEST(event_params)
  WHERE key = 'page_location'
) as url,

Date Ranges

To restrict a query to a specific date range in BigQuery’s Google Analytics 4 dataset, you’ll use a WHERE clause. Below are some common clauses that you can use to filter results for a specific date ranges.

Between Two Dates

To restrict a query to a specific date range, you can use the following WHERE clause:

WHERE _TABLE_SUFFIX BETWEEN '20230801' AND '20230818'

Last 7 Days

The following WHERE clause will limit results to the past 7 days:

WHERE
  _TABLE_SUFFIX BETWEEN 
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Last 14 Days

Similarly, this WHERE clause will do the same but for the last 14 days:

WHERE
  _TABLE_SUFFIX BETWEEN 
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)) AND
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Last 30 Days

You should start to see a pattern here… This WHERE clause will cover the last 30 days of data:

WHERE
  _TABLE_SUFFIX BETWEEN 
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)) AND
  FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Any Number of Days

The same pattern can easily be adjusted to support any number of days in the past, just adjust the following portion:

INTERVAL 90 DAY

Suggestions?

If there’s a query you’re looking for that I should include on this, please let me know, I’d like to expand on this information as time goes on to keep it as useful as possible for the long-term.

Related Articles

Meet the Author

Kevin Leary, WordPress Consultant

I'm a custom WordPress web developer and analytics consultant in Boston, MA with 16 years of experience building websites and applications. View a portfolio of my work or request an estimate for your next project.