Setup guide - Hybrid SaaS BigQuery

This document describes the steps to set up your first working project with the following configuration options:

- Deployment type: Hybrid SaaS
- Cloud platform: Google Cloud
- Cloud data warehouse: Google BigQuery

Google BigQuery connections authenticate using a Google Cloud service account credential. Because this differs from other warehouses, read How Google BigQuery authentication differs from other warehouses before you begin. For Matillion Full SaaS deployments, read the Matillion Full SaaS BigQuery setup guide instead.

How Google BigQuery authentication differs from other warehouses

For most warehouses, authentication is configured directly on the environment itself. For example, Snowflake environments typically use username/password or key-pair authentication configured as part of the warehouse connection. Google BigQuery doesn’t follow this model: it authenticates with Google Cloud credentials, and a Google Cloud service account acts as the principal when accessing Google Cloud resources. For more information, read the Google Cloud documentation on service accounts.

Because of this, Google BigQuery environments don’t contain warehouse authentication settings directly. Instead, a fully configured Google BigQuery environment must have access to a Google Cloud service account credential. For Hybrid SaaS deployments, that credential can be supplied in one of two ways:

Runner-assigned credentials (default): When the environment doesn’t have an associated cloud credential, the runner uses Application Default Credentials (ADC), and the Google Cloud service account attached to the runner acts as the principal when accessing Google BigQuery and other Google Cloud resources. For more information, read the Short-lived service account credentials section of Service account credentials.
Environment-associated cloud credential: A cloud credential is explicitly associated with the environment, and authentication to Google BigQuery uses the Google Cloud service account that the cloud credential references. This credential is typically backed by a downloaded JSON service account key.

Most Google BigQuery customers deploy their runner on Google Cloud. Runner-assigned credentials specifically require the runner to be deployed on Google Cloud, since Application Default Credentials only resolve to a Google Cloud service account in that environment.
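The choice between the two options above can be summarized as a small decision function. This is a hypothetical model for illustration only, not Matillion's implementation; `Environment`, `credential_source`, and the `runner_on_google_cloud` flag are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Environment:
    """Hypothetical, simplified view of an environment's BigQuery settings."""
    name: str
    # Name of an associated cloud credential, or None when nothing is associated.
    cloud_credential: Optional[str] = None


def credential_source(env: Environment, runner_on_google_cloud: bool) -> str:
    """Return which credential the environment would authenticate with.

    An explicitly associated cloud credential takes precedence. Otherwise the
    runner falls back to Application Default Credentials (ADC), which, as this
    guide notes, only resolve to a service account when the runner is on
    Google Cloud.
    """
    if env.cloud_credential is not None:
        return f"environment-associated cloud credential: {env.cloud_credential}"
    if not runner_on_google_cloud:
        raise ValueError(
            f"environment {env.name!r} has no cloud credential and the runner is "
            "not on Google Cloud, so ADC cannot resolve a service account"
        )
    return "runner-assigned credentials (ADC)"
```

The error branch reflects the constraint stated above: an environment with no associated credential only works when the runner itself runs on Google Cloud.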

Example Google Cloud service account key

The following is an example of a Google Cloud service account key structure:

```json
{
  "type": "service_account",
  "project_id": "example-project",
  "private_key_id": "1234567890abcdef1234567890abcdef12345678",
  "private_key": "-----BEGIN PRIVATE KEY-----\nEXAMPLEKEY\n-----END PRIVATE KEY-----\n",
  "client_email": "matillion-sa@example-project.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/matillion-sa%40example-project.iam.gserviceaccount.com"
}
```
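To catch a wrong or truncated file before associating it with an environment as a cloud credential, a key's structure can be checked locally. The following is a minimal sketch using only Python's standard library. The set of required fields is an assumption based on the example key above, and the check verifies shape only, not whether Google Cloud accepts the key.

```python
import json

# Fields this sketch treats as required; all appear in the example key above.
REQUIRED_FIELDS = (
    "type", "project_id", "private_key_id", "private_key", "client_email", "token_uri",
)


def check_service_account_key(raw: str) -> dict:
    """Parse a service account key JSON string and sanity-check its structure.

    Returns the parsed dict, or raises ValueError describing the first problem.
    """
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(key, dict):
        raise ValueError("key must be a JSON object")
    # Treat absent or empty values as missing.
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected type {key['type']!r}; expected 'service_account'")
    if not key["private_key"].startswith("-----BEGIN PRIVATE KEY-----"):
        raise ValueError("private_key is not a PEM-encoded private key")
    if not key["client_email"].endswith(".gserviceaccount.com"):
        raise ValueError("client_email does not look like a service account address")
    return key
```

For example, `check_service_account_key(open("key.json").read())["client_email"]` returns the service account that BigQuery access would run as (the file name here is a placeholder).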

Prerequisites

Google Cloud requirements

A Google Cloud account with privileges to deploy and run a runner on Google Kubernetes Engine (GKE). For the full set of required APIs, IAM roles, and infrastructure prerequisites, read GCP IAM permissions for runner deployment and the GKE deployment guide.

Google BigQuery requirements

A Google Cloud project with the following information:
- Your GCP project ID, found on the dashboard in the Google Cloud Console.
- A Google BigQuery dataset to read from and write to.
- A Google Cloud service account to authenticate as. To use the runner’s own service account, no extra setup is needed beyond what GCP IAM permissions for runner deployment describes. Otherwise, configure a separate service account with its own Google Cloud service account key (JSON), and associate it with the environment as a cloud credential.
- IAM roles assigned to the authenticating Google Cloud service account that grant the permissions described in Permissions.
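Before creating the environment, a quick local check of the project ID and dataset name can catch typos. The sketch below encodes naming rules as this sketch understands them from Google Cloud's documentation (project IDs: 6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen; dataset names: up to 1,024 letters, digits, or underscores). Treat these rules as assumptions and confirm them against current Google Cloud documentation. The sketch doesn't cover legacy domain-scoped project IDs such as `example.com:my-project`.

```python
import re

# ASSUMED naming rules (see lead-in); not an authoritative validator.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")  # 6-30 chars total
DATASET_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")


def check_bigquery_target(project_id: str, dataset: str) -> list:
    """Return a list of problems with the project ID and dataset name (empty if none)."""
    problems = []
    if not PROJECT_ID_RE.fullmatch(project_id):
        problems.append(f"invalid project ID: {project_id!r}")
    if not DATASET_RE.fullmatch(dataset):
        problems.append(f"invalid dataset name: {dataset!r}")
    return problems
```

Using `fullmatch` means the whole string must satisfy the pattern, so a stray trailing space or newline copied from the Google Cloud Console is reported rather than silently accepted.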

Connectivity requirements

Access enabled for the IP addresses listed under the Hybrid SaaS section of Network access and IP Allowlist requirements.
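When reviewing firewall or egress rules, it can help to confirm that a given address is covered by the ranges you've allowed. The sketch below does this with Python's standard `ipaddress` module. The ranges in `ALLOWLIST` are hypothetical placeholders (RFC 5737 documentation addresses), not Matillion's published IPs.

```python
import ipaddress
from typing import Iterable

# HYPOTHETICAL allowlist using RFC 5737 documentation ranges, not real
# Matillion addresses. Replace with the ranges listed under the Hybrid SaaS
# section of Network access and IP Allowlist requirements.
ALLOWLIST = ("192.0.2.0/24", "198.51.100.17/32")


def is_allowlisted(ip: str, allowlist: Iterable[str] = ALLOWLIST) -> bool:
    """Return True if `ip` falls inside any CIDR range in `allowlist`.

    Mixing IPv4 and IPv6 is safe: an address never matches a network of the
    other IP version.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```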

Git requirements

If you choose to use your own Git provider instead of the Matillion-hosted Git option, you need the following:

The Matillion Git app installed in your organization’s account with one of the supported Git providers:
- GitHub.
- Azure DevOps.
- GitLab.
- Bitbucket.

Permissions

The Google Cloud service account used for authentication, whether runner-assigned or explicitly configured, must have IAM roles or permissions sufficient for the operations your pipelines perform against your data. Typical operations include:

- Create, update, and delete tables and views.
- Query tables and views.
- Retrieve metadata for datasets, tables, and views.
- List projects, datasets, tables, and views.
- Insert or load data into tables.
- Run Google BigQuery jobs.

Recommended Google BigQuery roles

Depending on your use case, Google recommends assigning a combination of the following roles. At a minimum, grant either roles/bigquery.jobUser or roles/bigquery.user, as both include the bigquery.jobs.create permission required for the service account to interact with BigQuery. For runner-specific BigQuery IAM guidance, read GCP IAM permissions for runner deployment.

Role	Purpose
`roles/bigquery.jobUser`	Submit and run BigQuery jobs—must be project-level; can’t be scoped to a dataset.
`roles/bigquery.user`	Run BigQuery jobs, and query data.
`roles/bigquery.dataEditor`	Read and write data—only required if the pipeline writes back to BigQuery.
`roles/bigquery.dataViewer`	Read data from BigQuery datasets and tables.
`roles/bigquery.admin`	Full administrative access to Google BigQuery resources.

Use the principle of least privilege wherever possible.

For the full list of Google BigQuery IAM roles and permissions, read Access control.
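As a minimal sketch of applying least privilege with the roles in the table above, the hypothetical helper below picks one job-running role plus one data-access role. It never returns `roles/bigquery.admin`, and it relies on `roles/bigquery.dataEditor` already including read access, so it doesn't add `roles/bigquery.dataViewer` alongside it.

```python
def bigquery_roles(writes_to_bigquery: bool, use_user_role: bool = False) -> list:
    """Pick a least-privilege set of BigQuery roles from the table above.

    - Always include one job-running role: both roles/bigquery.jobUser and
      roles/bigquery.user carry bigquery.jobs.create.
    - Add dataEditor only when pipelines write back to BigQuery; otherwise
      dataViewer is enough to read datasets and tables.
    """
    roles = ["roles/bigquery.user" if use_user_role else "roles/bigquery.jobUser"]
    roles.append(
        "roles/bigquery.dataEditor" if writes_to_bigquery else "roles/bigquery.dataViewer"
    )
    return roles
```

A read-only pipeline gets `["roles/bigquery.jobUser", "roles/bigquery.dataViewer"]`; switching `writes_to_bigquery` to `True` swaps the viewer role for the editor role rather than stacking both.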

Google Cloud Storage permissions

Many Google BigQuery workflows use Google Cloud Storage (GCS) as a staging location before loading data into Google BigQuery. If your pipelines interact with GCS buckets, the Google Cloud service account also requires appropriate Storage IAM permissions. For more information, read Basic roles. Commonly used roles include:

Role	Purpose
`roles/storage.objectViewer`	Read staged files.
`roles/storage.objectCreator`	Upload staged files.
`roles/storage.objectAdmin`	Full access to bucket objects.

For more information about IAM permissions, read GCP IAM permissions for runner deployment.
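The hypothetical helper below sketches how the roles from this section could be granted with the gcloud CLI, using the `gcloud projects add-iam-policy-binding` and `gcloud storage buckets add-iam-policy-binding` commands. It only builds command strings for review and doesn't run them. BigQuery roles are bound at project level (as the role table notes, `roles/bigquery.jobUser` can't be scoped to a dataset); data-access roles could instead be granted on individual datasets, but this sketch keeps them at project level for simplicity. Storage roles are bound on a single staging bucket rather than the whole project. All project, service account, and bucket names are placeholders.

```python
import shlex
from typing import Optional, Sequence


def grant_commands(
    project_id: str,
    sa_email: str,
    bigquery_roles: Sequence[str],
    bucket: Optional[str] = None,
    storage_roles: Sequence[str] = (),
) -> list:
    """Build (but don't run) gcloud commands granting roles to a service account."""
    member = f"--member=serviceAccount:{sa_email}"
    # Project-level bindings for BigQuery roles.
    cmds = [
        ["gcloud", "projects", "add-iam-policy-binding", project_id, member, f"--role={role}"]
        for role in bigquery_roles
    ]
    # Bucket-level bindings keep Storage access limited to the staging bucket.
    if bucket and storage_roles:
        cmds += [
            ["gcloud", "storage", "buckets", "add-iam-policy-binding",
             f"gs://{bucket}", member, f"--role={role}"]
            for role in storage_roles
        ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return [shlex.join(cmd) for cmd in cmds]
```

Printing the result, for example with `print("\n".join(grant_commands(...)))`, gives a reviewable script that an administrator can run after checking each binding.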

Setup steps

1. Register for an account.
2. Create accounts for the users and admins who will be active in your organization.
3. Create a runner.
4. Deploy the runner on GKE in your Google Cloud project.
   - If you plan to use runner-assigned credentials for Google BigQuery access, grant the runner’s Google Cloud service account the Google BigQuery and Google Cloud Storage IAM roles described in Permissions.
5. Create a project, making the following choices:
   - Select Advanced settings.
   - Select the runner you created and deployed previously.
6. Create an environment, and configure it to use either a cloud credential backed by your Google Cloud service account key, or runner-assigned credentials.
7. Select BigQuery defaults for your environment, such as the default GCP project and dataset.
8. Select your Git provider: Matillion-hosted Git, GitHub, Azure DevOps, GitLab, or Bitbucket.
9. Create a Git branch in which to begin pipeline work.
10. Create your first pipeline.
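The default GCP project and dataset selected for the environment let shorter table references stand in for fully qualified `project.dataset.table` names. The hypothetical helper below illustrates that idea under a simple assumption: missing leading parts are filled from the defaults, similar to how a default dataset completes unqualified table names in a BigQuery query. It is not how Matillion itself resolves names, and it doesn't handle domain-scoped project IDs or quoted identifiers that contain dots.

```python
def qualify_table(ref: str, default_project: str, default_dataset: str) -> str:
    """Expand "table", "dataset.table", or "project.dataset.table" to a full name.

    Missing leading parts are filled from the given defaults. Raises ValueError
    for empty parts or more than three parts.
    """
    parts = ref.split(".")
    if len(parts) > 3 or any(not part for part in parts):
        raise ValueError(f"cannot parse table reference {ref!r}")
    # Prepend only as many defaults as there are missing leading parts.
    defaults = [default_project, default_dataset]
    return ".".join(defaults[: 3 - len(parts)] + parts)
```

With defaults `example-project` and `analytics`, `orders` expands to `example-project.analytics.orders`, `raw.orders` to `example-project.raw.orders`, and an already qualified name is returned unchanged.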
