This document describes the steps to set up your first working project in Matillion with Google BigQuery in a Hybrid SaaS deployment.
Matillion authenticates to Google BigQuery using a Google Cloud service account credential. Because this works differently from other warehouses, read How Google BigQuery authentication differs from other warehouses before you begin. For Matillion Full SaaS deployments, read the Matillion Full SaaS BigQuery setup guide.
How Google BigQuery authentication differs from other warehouses
For most warehouses, authentication is configured directly on the environment itself. For example, Snowflake environments typically use username/password or key-pair authentication configured as part of the warehouse connection.
Google BigQuery doesn’t follow this model. Instead, Google BigQuery uses Google Cloud credentials. The Google Cloud service account acts as a principal when accessing Google Cloud resources. For more information, read the following Google Cloud documentation:
Because of this, Google BigQuery environments don’t contain warehouse authentication settings directly. A fully configured Google BigQuery environment must have access to a Google Cloud service account credential. For Hybrid SaaS deployments, that credential can be supplied in one of two ways:
- Runner-assigned credentials (default): When the environment doesn’t have an associated cloud credential, the runner uses Application Default Credentials (ADC). The Google Cloud service account attached to the runner acts as the principal when accessing Google BigQuery and other Google Cloud resources. For more information, read the Short-lived service account credentials section of Service account credentials.
- Environment-associated cloud credential: A cloud credential is explicitly associated with the environment. Matillion authenticates to Google BigQuery as the Google Cloud service account that the cloud credential references, typically backed by a downloaded JSON service account key.
Most Google BigQuery customers run their runner on Google Cloud. Runner-assigned credentials specifically require the runner to be deployed on Google Cloud, since Application Default Credentials only resolve to a Google Cloud service account in that environment.
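The selection between the two credential sources can be pictured with a minimal Python sketch. The `Environment` class, the `credential_source` function, and the returned labels are hypothetical names used only for illustration; they are not part of any product API, and the error case is an assumption about what happens when neither source is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Environment:
    """Hypothetical stand-in for an environment definition."""
    name: str
    cloud_credential: Optional[str] = None  # name of an associated cloud credential, if any


def credential_source(env: Environment, runner_on_google_cloud: bool) -> str:
    """Describe which service account identity the environment would use."""
    if env.cloud_credential is not None:
        # An explicitly associated cloud credential is used when present.
        return "environment-associated cloud credential"
    if runner_on_google_cloud:
        # No credential on the environment: fall back to ADC on the runner.
        return "runner-assigned credentials (ADC)"
    # Assumption: with no credential and no Google Cloud runner,
    # there is no service account to resolve.
    raise ValueError("no usable Google Cloud service account credential")


print(credential_source(Environment("dev", "bq-key"), runner_on_google_cloud=False))
print(credential_source(Environment("prod"), runner_on_google_cloud=True))
```

In Python client code, Application Default Credentials are typically resolved with `google.auth.default()` from the `google-auth` library; the sketch above only models the decision, not the lookup itself.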
Example Google Cloud service account key
The following is an example of a Google Cloud service account key structure:
{
"type": "service_account",
"project_id": "example-project",
"private_key_id": "1234567890abcdef1234567890abcdef12345678",
"private_key": "-----BEGIN PRIVATE KEY-----\nEXAMPLEKEY\n-----END PRIVATE KEY-----\n",
"client_email": "matillion-sa@example-project.iam.gserviceaccount.com",
"client_id": "123456789012345678901",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/matillion-sa%40example-project.iam.gserviceaccount.com"
}
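Before uploading a key, it can help to confirm that the file has the expected shape. The following is a minimal sketch using only the Python standard library. The function name `check_service_account_key` and the choice of required fields are assumptions made for this example; the checks are structural only and don't verify that the key is valid with Google Cloud.

```python
import json
from typing import List

# Assumption: the fields treated as required here are chosen for illustration.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key",
                   "client_email", "client_id", "token_uri")


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    key = json.loads(raw_json)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    if not str(key.get("client_email", "")).endswith(".gserviceaccount.com"):
        problems.append("client_email does not look like a service account address")
    if "BEGIN PRIVATE KEY" not in str(key.get("private_key", "")):
        problems.append("private_key is not PEM-formatted")
    return problems


# Placeholder values taken from the example key structure above.
example = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key_id": "1234567890abcdef1234567890abcdef12345678",
    "private_key": "-----BEGIN PRIVATE KEY-----\nEXAMPLEKEY\n-----END PRIVATE KEY-----\n",
    "client_email": "matillion-sa@example-project.iam.gserviceaccount.com",
    "client_id": "123456789012345678901",
    "token_uri": "https://oauth2.googleapis.com/token",
})
print(check_service_account_key(example))  # prints []
```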
Prerequisites
Google Cloud requirements
Google BigQuery requirements
Connectivity requirements
Git requirements
If you choose to use your own Git provider instead of the Matillion-hosted Git option, you need the following:
- The Matillion Git app installed in your organization’s account with one of the supported Git providers:
Permissions
The Google Cloud service account used by Matillion, whether runner-assigned or explicitly configured, must have IAM roles or permissions sufficient for the operations Matillion performs against your data. Typical operations include:
- Create, update, and delete tables and views.
- Query tables and views.
- Retrieve metadata for datasets, tables, and views.
- List projects, datasets, tables, and views.
- Insert or load data into tables.
- Run Google BigQuery jobs.
Recommended Google BigQuery roles
Depending on your use case, Google recommends assigning a combination of the following roles. At a minimum, grant either roles/bigquery.jobUser or roles/bigquery.user, because both include the bigquery.jobs.create permission that the service account needs to run BigQuery jobs. For runner-specific BigQuery IAM guidance, read GCP IAM permissions for runner deployment.
| Role | Purpose |
|---|---|
| roles/bigquery.jobUser | Submit and run BigQuery jobs. Must be granted at project level; can’t be scoped to a dataset. |
| roles/bigquery.user | Run BigQuery jobs and query data. |
| roles/bigquery.dataEditor | Read and write data. Only required if the pipeline writes back to BigQuery. |
| roles/bigquery.dataViewer | Read data from BigQuery datasets and tables. |
| roles/bigquery.admin | Full administrative access to Google BigQuery resources. |
Use the principle of least privilege wherever possible.
For the full list of Google BigQuery IAM roles and permissions, read Access control.
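The role guidance above can be expressed as a small review function. This is a sketch under stated assumptions: `review_bigquery_roles` is a hypothetical helper, it works on a plain set of role names rather than querying IAM, and it treats roles/bigquery.admin as also covering job creation and writes because that role grants full administrative access.

```python
# Roles that allow the service account to create BigQuery jobs.
JOB_CAPABLE_ROLES = {
    "roles/bigquery.jobUser",
    "roles/bigquery.user",
    "roles/bigquery.admin",  # assumption: full admin access covers job creation
}
WRITE_CAPABLE_ROLES = {"roles/bigquery.dataEditor", "roles/bigquery.admin"}


def review_bigquery_roles(granted, writes_back):
    """Return advisory findings for a set of granted role names."""
    granted = set(granted)
    findings = []
    if not granted & JOB_CAPABLE_ROLES:
        findings.append("grant roles/bigquery.jobUser or roles/bigquery.user so jobs can run")
    if writes_back and not granted & WRITE_CAPABLE_ROLES:
        findings.append("grant roles/bigquery.dataEditor for pipelines that write back")
    if "roles/bigquery.admin" in granted:
        # Least privilege: flag the broadest role even though it works.
        findings.append("roles/bigquery.admin is broad; prefer narrower roles")
    return findings


print(review_bigquery_roles({"roles/bigquery.jobUser", "roles/bigquery.dataEditor"}, writes_back=True))
print(review_bigquery_roles({"roles/bigquery.dataViewer"}, writes_back=False))
```

The first call reports no findings; the second reports only the missing job role, since a read-only data role alone doesn't let the service account run jobs.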
Google Cloud Storage permissions
Many Google BigQuery workflows use Google Cloud Storage (GCS) as a staging location before loading data into Google BigQuery.
If your pipelines interact with GCS buckets, the Google Cloud service account also requires appropriate Storage IAM permissions. For more information, read Basic roles.
Commonly used roles include:
| Role | Purpose |
|---|---|
| roles/storage.objectViewer | Read staged files. |
| roles/storage.objectCreator | Upload staged files. |
| roles/storage.objectAdmin | Full access to bucket objects. |
For more information about IAM permissions, read Google Cloud IAM permissions for runner deployment.
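As an illustration of applying least privilege to the storage roles listed above, the sketch below maps the staging actions a pipeline needs to the narrowest of those roles. The function `storage_roles` and the action names `read`, `upload`, and `delete` are hypothetical, and treating `delete` as requiring roles/storage.objectAdmin is an assumption that only considers the three roles in the table.

```python
# Narrowest listed role for each staging action that has one.
ROLE_FOR_ACTION = {
    "read": "roles/storage.objectViewer",
    "upload": "roles/storage.objectCreator",
}
KNOWN_ACTIONS = {"read", "upload", "delete"}


def storage_roles(actions):
    """Return the smallest set of listed roles covering the given actions."""
    actions = set(actions)
    unknown = actions - KNOWN_ACTIONS
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    if "delete" in actions:
        # Of the three listed roles, only objectAdmin covers deleting objects.
        return ["roles/storage.objectAdmin"]
    return sorted(ROLE_FOR_ACTION[action] for action in actions)


print(storage_roles({"read"}))
print(storage_roles({"read", "upload"}))
print(storage_roles({"upload", "delete"}))
```

A pipeline that only reads and uploads staged files gets the two narrow roles rather than full object administration.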
Setup steps
- Register for a Matillion account.
- Create accounts for users and admins who will be active in Matillion.
- Create a runner in Matillion.
- Deploy a runner on GKE in your Google Cloud project.
- If you plan to use runner-assigned credentials for Google BigQuery access, grant the runner’s Google Cloud service account the Google BigQuery and Google Cloud Storage IAM roles described in Permissions.
- Create a project, making the following choices:
  - Select Advanced settings.
  - Select the runner you created and deployed previously.
- Create an environment, and configure it to use your Google Cloud service account key or runner-assigned credentials.
- Select BigQuery defaults for your environment, such as the default GCP project and dataset.
- Select your Git provider: Matillion-hosted Git, GitHub, Azure DevOps, GitLab, or Bitbucket.
- Create a Git branch in which to begin pipeline work.
- Create your first pipeline.
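For the step that grants IAM roles to the service account used for runner-assigned credentials, the following sketch builds the corresponding `gcloud projects add-iam-policy-binding` commands as strings so that they can be reviewed before anyone runs them. The helper name `grant_commands`, the project ID, the service account address, and the chosen roles are placeholders for illustration. The sketch grants every role at project level; a narrower scope may suit the storage role better in your setup.

```python
import shlex


def grant_commands(project_id, service_account_email, roles):
    """Build one gcloud command per role; nothing is executed here."""
    member = f"serviceAccount:{service_account_email}"
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ])
        for role in roles
    ]


commands = grant_commands(
    "example-project",  # placeholder project ID
    "matillion-sa@example-project.iam.gserviceaccount.com",  # placeholder address
    ["roles/bigquery.jobUser", "roles/bigquery.dataEditor", "roles/storage.objectViewer"],
)
for command in commands:
    print(command)
```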