Restricting access to datasets in BigQuery is a common request I see, but it’s not so straight forward to handle with Google Cloud services. The following approach is what I typically use to show specific datasets to specific users or groups.
Technical Steps Up Front
If you know what you’re doing, and you’re here for the technical specifics, here’s how it can be done:
- Project role set to
BigQuery Job User
- Dataset role set to
BigQuery Data Viewer
- Authorized Views added to any unshared or inaccessible table/view used in queries of what’s being shared
For more details, I’ll walk through the full approach below.
Common Use Cases
I commonly see this done for the following use cases:
- Protect data containing PII or other sensitive customer information that should be kept private from analytics and analysis
- Provide simplicity and clarity for internal team members, working with only the data they need makes the system self-documenting and less confusing to new users
- Security to help protect key business information during a potential breach
- Cost optimization to avoid large, expensive queries made by team members
You can use the following approach to limit a set of users so that they only see specific data in BigQuery:
- Create a new dataset that will contain the views available to the group of users
- Create or recreate views in this dataset
- Add project and dataset level permissions for the users that can access this dataset
This approach will expose only views to a set of users, all tables are stored and maintained behind the scenes. It provides a good way to protect and isolate your true data sources, avoiding any potential corruption or accidental transformation.
The following steps will make it happen:
1. Create your new dataset
Create a dataset in your project that will contain all the views available to this set of users. If needed you can create multiple datasets for organization or clarity, just repeat the following steps for all the datasets you create for this set of users (called principals in Google BigQuery).
2. Set project level user permissions
Go to the IAM management screen for your BigQuery project. We don’t need anything at the organization level, unless you’re using groups instead of individual principals. Add your users under the “VIEW BY PRINCIPALS” with the role
BigQuery Job User.
3. Dataset Level Permissions
After settings the project level permissions in Google’s IAM you’ll need to set specific permissions for the newly created dataset(s) we added. Open the dataset(s) then click the SHARING link towards the top right of the screen and select Permissions from the dropdown menu. Click the ADD PRINCIPAL button in the Share permissions form that appears, then add the same users we just added at the project level (our
BigQuery Job User roles), except this time set the role to
BigQuery Data Viewer.
4. Authorized Views
At this point, our new users will see only these datasets when they log in to this BigQuery project, but they won’t be able to query anything in them yet. If you try to you’ll see the following error:
Access Denied: Table PROJECT:DATASET.VIEW: User does not have permission to query table PROJECT:DATASET.VIEW, or perhaps it does not exist in location US.
This is because the user doesn’t have access to the table queried by the view itself. To provide that access, you need to set up authorized views. Authorized views allow users to run queries without access to the underlying source data. It’s the final step needed to make this all work.
To set them up:
- Open up the dataset(s) containing the table queried by our views. If tables exist in multiple datasets, you’ll need to repeat this process for each one.
- Click the SHARING link and select Authorized Views in the dropdown menu
- Select the tables used by your views one-by-one, making sure to include all referenced tables
Once you’ve done this you’re done, you now have a set of users that can see and query a specific subset of datasets in your BigQuery data warehouse. All other data will be hidden from these users.
Data Warehouse Permissions Management
Why is this important? In general, permissions and access management is critical for a data warehouse for the following reasons:
Data warehouses often contain sensitive information that could include personal customer details, financial records, intellectual property, or strategic business information. It’s crucial to ensure that only authorized personnel have access to this data to prevent leaks and breaches that could lead to financial loss, legal action, or damage to reputation.
Many industries are subject to regulations regarding the handling of data, such as the General Data Protection Regulation (GDPR) in Europe, Health Insurance Portability and Accountability Act (HIPAA) in the United States, and other privacy laws globally. Proper permissions and access management ensure compliance with these laws by controlling who has access to sensitive information and auditing data access.
By controlling who can access and modify data, organizations can ensure the accuracy and reliability of the data in the warehouse. If too many people have the ability to change data, it can lead to inconsistencies, errors, and difficulties in tracking changes.
Least Privilege Principle
Access management is often governed by the principle of least privilege, which means users are granted only the access necessary to perform their job functions. This reduces the risk of accidental or deliberate misuse of data.
Auditing and Monitoring
With proper access controls, it becomes easier to monitor who is accessing what data and when. This is important not only for security, but also for maintaining an audit trail that can be invaluable during investigations or audits.
Different customers and stakeholders may have differing expectations and legal rights regarding their data privacy. Managing permissions properly ensures that an organization can meet these expectations and uphold the rights of individuals.
Effective access management can mitigate risks related to accidental data deletion, unauthorized data manipulation, and leakage of proprietary information.
Access controls can also prevent unnecessary system load by restricting the number of queries and operations performed by users who do not need to perform data-intensive tasks.
Confidentiality and Trust
Maintaining strict access controls helps in building trust with customers, partners, and stakeholders by demonstrating that the company takes data confidentiality seriously.
You should now have a solid way to isolate data from users in BigQuery, exposing only what is needed to the right team members. If you need any assistance setting this up, or working with BigQuery to build or manage a data warehouse, I’m available for hire.