Restrict Access to Specific Datasets in  BigQuery

Limit access to show only specific data to certain users or groups in Google BigQuery.

Restricting access to datasets in BigQuery is a common request I see, but it’s not so straight forward to handle with Google Cloud services. The following approach is what I typically use to show specific datasets to specific users or groups.

Limited Access to Specific Views Only in BigQuery

Technical Steps Up Front

If you know what you’re doing, and you’re here for the technical specifics, here’s how it can be done:

  1. Project role set to BigQuery Job User
  2. Dataset role set to BigQuery Data Viewer
  3. Authorized Views added to any unshared or inaccessible table/view used in queries of what’s being shared

For more details, I’ll walk through the full approach below.

Common Use Cases

I commonly see this done for the following use cases:

  • Protect data containing PII or other sensitive customer information that should be kept private from analytics and analysis
  • Provide simplicity and clarity for internal team members, working with only the data they need makes the system self-documenting and less confusing to new users
  • Security to help protect key business information during a potential breach
  • Cost optimization to avoid large, expensive queries made by team members

The Approach

You can use the following approach to limit a set of users so that they only see specific data in BigQuery:

  1. Create a new dataset that will contain the views available to the group of users
  2. Create or recreate views in this dataset
  3. Add project and dataset level permissions for the users that can access this dataset

This approach will expose only views to a set of users, all tables are stored and maintained behind the scenes. It provides a good way to protect and isolate your true data sources, avoiding any potential corruption or accidental transformation.

Setup Steps

The following steps will make it happen:

1. Create your new dataset

Create a dataset in your project that will contain all the views available to this set of users. If needed you can create multiple datasets for organization or clarity, just repeat the following steps for all the datasets you create for this set of users (called principals in Google BigQuery).

2. Set project level user permissions

Go to the IAM management screen for your BigQuery project. We don’t need anything at the organization level, unless you’re using groups instead of individual principals. Add your users under the “VIEW BY PRINCIPALS” with the role BigQuery Job User.

BigQuery Project Level Permissions

3. Dataset Level Permissions

After settings the project level permissions in Google’s IAM you’ll need to set specific permissions for the newly created dataset(s) we added. Open the dataset(s) then click the SHARING link towards the top right of the screen and select Permissions from the dropdown menu. Click the ADD PRINCIPAL button in the Share permissions form that appears, then add the same users we just added at the project level (our BigQuery Job User roles), except this time set the role to BigQuery Data Viewer.

BigQuery Dataset Level Permissions

4. Authorized Views

At this point, our new users will see only these datasets when they log in to this BigQuery project, but they won’t be able to query anything in them yet. If you try to you’ll see the following error:

.language-txt
Access Denied: Table PROJECT:DATASET.VIEW: User does not have permission to query table PROJECT:DATASET.VIEW, or perhaps it does not exist in location US.

This is because the user doesn’t have access to the table queried by the view itself. To provide that access, you need to set up authorized views. Authorized views allow users to run queries without access to the underlying source data. It’s the final step needed to make this all work.

To set them up:

  1. Open up the dataset(s) containing the table queried by our views. If tables exist in multiple datasets, you’ll need to repeat this process for each one.
  2. Click the SHARING link and select Authorized Views in the dropdown menu
  3. Select the tables used by your views one-by-one, making sure to include all referenced tables

BigQuery Authorize Views

Once you’ve done this you’re done, you now have a set of users that can see and query a specific subset of datasets in your BigQuery data warehouse. All other data will be hidden from these users.

Data Warehouse Permissions Management

Why is this important? In general, permissions and access management is critical for a data warehouse for the following reasons:

Data Security

Data warehouses often contain sensitive information that could include personal customer details, financial records, intellectual property, or strategic business information. It’s crucial to ensure that only authorized personnel have access to this data to prevent leaks and breaches that could lead to financial loss, legal action, or damage to reputation.

Regulatory Compliance

Many industries are subject to regulations regarding the handling of data, such as the General Data Protection Regulation (GDPR) in Europe, Health Insurance Portability and Accountability Act (HIPAA) in the United States, and other privacy laws globally. Proper permissions and access management ensure compliance with these laws by controlling who has access to sensitive information and auditing data access.

Data Integrity

By controlling who can access and modify data, organizations can ensure the accuracy and reliability of the data in the warehouse. If too many people have the ability to change data, it can lead to inconsistencies, errors, and difficulties in tracking changes.

Least Privilege Principle

Access management is often governed by the principle of least privilege, which means users are granted only the access necessary to perform their job functions. This reduces the risk of accidental or deliberate misuse of data.

Auditing and Monitoring

With proper access controls, it becomes easier to monitor who is accessing what data and when. This is important not only for security, but also for maintaining an audit trail that can be invaluable during investigations or audits.

Data Privacy

Different customers and stakeholders may have differing expectations and legal rights regarding their data privacy. Managing permissions properly ensures that an organization can meet these expectations and uphold the rights of individuals.

Risk Management

Effective access management can mitigate risks related to accidental data deletion, unauthorized data manipulation, and leakage of proprietary information.

Resource Optimization

Access controls can also prevent unnecessary system load by restricting the number of queries and operations performed by users who do not need to perform data-intensive tasks.

Confidentiality and Trust

Maintaining strict access controls helps in building trust with customers, partners, and stakeholders by demonstrating that the company takes data confidentiality seriously.

Conclusion

You should now have a solid way to isolate data from users in BigQuery, exposing only what is needed to the right team members. If you need any assistance setting this up, or working with BigQuery to build or manage a data warehouse, I’m available for hire.

Related Articles

Meet the Author

Kevin Leary, WordPress Consultant

I'm a custom WordPress web developer and analytics consultant in Boston, MA with 16 years of experience building websites and applications. View a portfolio of my work or request an estimate for your next project.