Google Search Console helps you understand how your website is doing in Google searches. It organizes data about things like how many people saw your site, clicked on it, and what search terms they used. By using this data in BigQuery, you can easily look at patterns, create special reports, and mix this data with other information to get a full picture of your website’s performance metrics.
Connecting Google Search Console to BigQuery
Search Console has a built-in way, known as Bulk Data Export in Google's product lingo, to connect your Search Console property to BigQuery so that it automatically syncs daily data that you can query directly.
To set this up you’ll need to enable the BigQuery API in your Google Cloud account, then configure a Bulk data export inside Search Console’s settings.
In Google Cloud, go to the “BigQuery Data Transfer API” page and enable the API.
After that, go to IAM in Google Cloud and grant Search Console permission to connect to your BigQuery project by adding a new principal for search-console-data-export@system.gserviceaccount.com
with the following permissions:
- BigQuery Job User
- BigQuery Data Editor
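If you prefer the command line, the same grants can be applied with gcloud. This is a sketch, assuming your project ID is bigquery-project-id as in the destination settings below; the BigQuery Job User and BigQuery Data Editor roles correspond to roles/bigquery.jobUser and roles/bigquery.dataEditor:
gcloud projects add-iam-policy-binding bigquery-project-id \
  --member="serviceAccount:search-console-data-export@system.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding bigquery-project-id \
  --member="serviceAccount:search-console-data-export@system.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"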
In your search console settings enter your BigQuery export destination details:
- Cloud project ID:
bigquery-project-id
- Dataset name:
searchconsole
- Dataset location:
United States (US)
Once you’ve done this, click “Continue,” then “Create data export” to complete the process.
You won’t see your data appear for about 48 hours though, and if you look immediately you’ll see some odd filler information in a temp_*
table. Don’t worry; once data syncs within the initial time frame, that will all be replaced.
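Once the initial sync completes, a quick sanity check is to list the tables that landed in the dataset. This sketch assumes the project ID and dataset name from the settings above:
SELECT table_name
FROM `bigquery-project-id.searchconsole.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name
Once the export is live, you should see searchdata_site_impression, searchdata_url_impression, and ExportLog listed.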
Search Console BigQuery Schema
Once your data has been imported into the BigQuery searchconsole
dataset you’ll have the following tables and columns available for querying.
searchdata_site_impression
Daily aggregated data of impressions and related details for your site as a whole. Each row represents the total daily values for a given property (or website profile).
- data_date: The day on which the data in this row was generated (Pacific Time).
- site_url: URL of the property. For domain-level properties, this will be sc-domain:property-name. For URL-prefix properties, it will be the full URL of the property definition. Examples: sc-domain:developers.google.com, https://developers.google.com/webmaster-tools/
- query: The user query. When is_anonymized_query is true, this will be a zero-length string.
- is_anonymized_query: Rare queries (called anonymized queries) are marked with this bool. The query field will be null when it’s true to protect the privacy of users making the query.
- country: Country from where the query was made, in ISO-3166-1-Alpha-3 format.
- search_type: One of the following string values:
- web: The default (“All”) tab in Google Search.
- image: The “Image” tab in Google Search.
- video: The “Video” tab in Google Search.
- news: The “News” tab in Google Search.
- discover: Discover results.
- googleNews: news.google.com and the Google News app on Android and iOS.
- device: The device from which the query was made.
- impressions: The number of impressions for this row.
- clicks: The number of clicks for this row.
- sum_top_position: The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results. To calculate average position (which is 1-based), calculate SUM(sum_top_position)/SUM(impressions) + 1.
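As a worked example of that formula, this sketch computes the site-wide average position per day (assuming the placeholder project and dataset names used elsewhere in this post):
SELECT
data_date,
SUM(sum_top_position) / SUM(impressions) + 1 as avg_position
FROM `bigquery_project.searchconsole.searchdata_site_impression`
GROUP BY data_date
ORDER BY data_date DESC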
searchdata_url_impression
Daily aggregated data of impressions and related details for individual URLs on your website. Each row represents a day of data for a specific, unique URL on your website.
- data_date: The day on which the data in this row was generated (Pacific Time).
- site_url: URL of the property. For domain-level properties, this will be sc-domain:property-name. For URL-prefix properties, it will be the full URL of the property definition. Examples: sc-domain:developers.google.com, https://developers.google.com/webmaster-tools/
- url: The fully-qualified URL where the user eventually lands when they click the search result or Discover story.
- query: Same as above.
- is_anonymized_query: Same as above.
- is_anonymized_discover: Whether the data row is under the Discover anonymization threshold. When under the threshold, some other fields (like URL and country) will be missing to protect user privacy.
- country: Same as above.
- search_type: Same as above.
- device: Same as above.
- is_[search_appearance_type]: There are several boolean fields used to indicate search appearance type, such as is_amp_top_stories, is_job_listing, and is_job_details. A field will be true if the row in question appears for the specific rich result.
- impressions: Same as above.
- clicks: Same as above.
- sum_position: A zero-based number indicating the topmost position of this URL in the search results for the query. (Zero is the top position in the results.) To calculate average position (which is 1-based), calculate SUM(sum_position)/SUM(impressions) + 1.
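To illustrate how these columns combine at the URL level, this sketch splits each URL’s clicks and impressions by search_type (using the same placeholder project and dataset names as the queries below):
SELECT
url,
search_type,
SUM(clicks) as clicks,
SUM(impressions) as impressions
FROM `bigquery_project.searchconsole.searchdata_url_impression`
GROUP BY url, search_type
ORDER BY clicks DESC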
ExportLog
Records about the latest successful sync that was imported, containing the following fields:
- agenda: Which type of data was saved. Currently the only value is SEARCHDATA.
- namespace: Which table was saved to. Search Console typically exports data separately for each table, so each day typically has two export rows.
- data_date: Date of the click or impression, in Pacific Time. This is the partition date used by the tables.
- epoch_version: An integer, where 0 is the first time data was saved to this table for this date. In the rare case that Search Console needs to go back and adjust previous data, perhaps because of a data anomaly, this value will be incremented by 1 each time the data is updated for that table and date.
- publish_time: When the export completed, in Pacific Time.
One important note is that failed exports are never recorded in the ExportLog
table.
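That makes ExportLog useful mainly for confirming that recent syncs succeeded. A sketch for inspecting the latest exports (same placeholder project name as the queries below):
SELECT
namespace,
data_date,
epoch_version,
publish_time
FROM `bigquery_project.searchconsole.ExportLog`
ORDER BY publish_time DESC
LIMIT 10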
More details on the schema are provided in Google’s Help Center.
Useful Queries
The following queries may be useful for exploring the data you now have access to in BigQuery. The real power comes from combining this information with other data sources, though; these queries are just meant to provide a basic overview of what you have available.
Site-wide Daily Keyword Performance: Clicks & Impressions
Provides a list of all organic search keywords with daily impression and click counts, ordered from most recent to oldest by search date and then by impressions.
SELECT
data_date as search_date,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
SUM(clicks) as clicks
FROM `bigquery_project.searchconsole.searchdata_site_impression`
GROUP BY 1, 2
ORDER BY data_date DESC, impressions DESC
Daily Keyword Performance by URL: Clicks & Impressions
Provides a list of all landing page URLs on a site that rank organically on Google, including the search keywords they ranked for and their impression and click counts. The results are ordered from most recent to oldest by search date.
SELECT
data_date as search_date,
url,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
SUM(clicks) as clicks
FROM `bigquery_project.searchconsole.searchdata_url_impression`
GROUP BY 1, 2, 3
ORDER BY data_date DESC, impressions DESC
Top Performing Landing Pages (all time)
Provides a list of organically ranking landing page URLs ordered by average SERP position (1-based, so 1 is the top position), then by impressions in descending order.
SELECT
url,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
SUM(sum_position) / SUM(impressions) + 1 as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE is_anonymized_query = false
GROUP BY 1, 2
ORDER BY serp_position ASC, impressions DESC
100 Most Valuable Landing Pages by Clicks & Impressions (last 30 days)
A list of the top 100 landing pages ordered by clicks and then impressions in descending order for the last 30 days.
SELECT
url,
query as search_term,
SUM(clicks) as clicks,
SUM(impressions) as impressions,
SUM(sum_position) / SUM(impressions) + 1 as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND is_anonymized_query = false
GROUP BY url, search_term
ORDER BY clicks DESC, impressions DESC
LIMIT 100
High Ranking Landing Pages
Provides the landing pages and search terms that rank in the top 10 positions on average over the last 30 days, ordered by average SERP position and then by impressions.
SELECT
url,
query as search_term,
SUM(clicks) as clicks,
SUM(impressions) as impressions,
SUM(sum_position) / SUM(impressions) + 1 as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
AND is_anonymized_query = false
GROUP BY url, search_term
HAVING serp_position <= 10 AND impressions > 0
ORDER BY serp_position ASC, impressions DESC
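Click-Through Rate by Search Term (last 30 days)
As one more sketch in the same vein, this computes site-wide click-through rate by search term over the last 30 days; SAFE_DIVIDE guards against rows with zero impressions.
SELECT
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(clicks) as clicks,
SUM(impressions) as impressions,
SAFE_DIVIDE(SUM(clicks), SUM(impressions)) as ctr
FROM `bigquery_project.searchconsole.searchdata_site_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY search_term
ORDER BY impressions DESC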
Why Work with BigQuery Data?
Many of these insights can be gathered manually in the Search Console UI, so you may be wondering why bother connecting to BigQuery. Having your Search Console data available in BigQuery provides a few critical benefits and lets you do things you otherwise couldn’t:
- Create custom reports and performance dashboards
- Connect organic rankings with other data sources to discover unique opportunities (Google Ads, GA4, and more)
- Include specific, automated organic SEO performance data in performance dashboards
- Historical data ownership
- Working with AI through machine learning models
More information is provided by Google, but these are the highlights from my perspective. If your business depends on organic search performance, then connecting your Google Search Console property to BigQuery with their bulk data export is something I would highly recommend exploring.