Google Cloud GKE deployment for Maia runners

This document helps you understand Google Cloud-specific architecture decisions, deployment considerations, and readiness requirements for running Maia runners on Google Kubernetes Engine (GKE). GKE provides a managed Kubernetes control plane for running Maia runners in your Google Cloud infrastructure. This deployment model combines Google Cloud-native security features (Workload Identity) with the operational flexibility of Kubernetes. For complete Terraform modules, Helm charts, and step-by-step implementation instructions, see the GCP directory in the Matillion deployment library on GitHub. Read the general Kubernetes deployment guide before reading this document.

What you get with GKE deployment

Managed Kubernetes control plane. Google handles the Kubernetes API server, etcd, and control plane upgrades.
Workload Identity. Credential-free authentication from pods to Google Cloud services using federated identity credentials.
Flexible node pools. Configurable machine types with autoscaling support.
Google Cloud integration. Native support for Cloud Monitoring, VPC networking, and Google Cloud Load Balancers.
Horizontal Pod Autoscaler. Scale pods based on metrics.
Cluster Autoscaler. Automatically adjust node capacity in node pools.

Prerequisites and readiness

Google Cloud requirements

Required Google Cloud services:

Google Kubernetes Engine API enabled in your target project.
Billing enabled for your GCP project.
Sufficient compute quotas for node pool VMs.
VPC with subnet configuration.

Your Google Cloud identity needs permissions to:

Create and manage GKE clusters and node pools.
Create and manage Compute Engine instances.
Create service accounts and IAM bindings (including Workload Identity bindings).
Manage VPC resources (subnets, routes, Cloud NAT, firewall rules).
Access Secret Manager (for storing OAuth credentials).
Create GCS buckets (for staging data).
Configure Cloud Logging and Cloud Monitoring.

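We recommend using the project Owner role for initial deployment (the Editor role alone cannot modify project IAM policies, so pair it with Project IAM Admin if you use Editor), then scoping down to least-privilege roles for ongoing operations.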

Matillion account setup

Before deploying infrastructure, create a Maia runner in the Matillion console. Record the following details for the Maia runner you created:

Account ID: Your Matillion organization identifier.
Agent ID: The unique identifier for this Maia runner (auto-generated).
OAuth Client ID and Secret: Maia runner authentication credentials.
Region: us1 (United States), eu1 (Europe), or au1 (Australia/Asia-Pacific).

These credentials are required for the Helm deployment in Phase 4, so store them securely. For details, read Create a Maia runner.

Required tools

Ensure these tools are installed and configured on your deployment workstation:

Terraform 1.0+ for infrastructure provisioning.
Google Cloud SDK (gcloud) configured with credentials (gcloud auth application-default login).
gke-gcloud-auth-plugin for kubectl authentication to GKE.
kubectl for Kubernetes cluster management.
Helm 3.x for application deployment.

Verify prerequisites:

# Verify Google Cloud authentication
gcloud auth list
gcloud config get-value project

# Verify tool versions
terraform --version
kubectl version --client
helm version
gke-gcloud-auth-plugin --version
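The version commands above report on one tool at a time. As a minimal sketch (the helper name missing_tools is hypothetical), the following POSIX shell snippet checks all five tools in one pass and prints any that are not on PATH. It checks presence only, not minimum versions such as Terraform 1.0+ or Helm 3.x.

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
# Checks presence only; it does not verify minimum versions.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools required by this guide.
missing=$(missing_tools terraform gcloud gke-gcloud-auth-plugin kubectl helm)
if [ -n "$missing" ]; then
  echo "Missing tools (install before continuing):" >&2
  echo "$missing" >&2
fi
```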

Architecture decision points

Before deploying, make these key architectural decisions.

1. VPC and networking

The GKE Terraform module always creates a dedicated VPC. The module does not support attaching to an existing VPC. What gets created:

VPC with subnets and secondary IP ranges for GKE pods and services.
Cloud NAT (when enable_cloud_nat = true) for outbound internet access from private nodes.
Firewall rules for cluster communication.

GKE requires secondary IP ranges for pod and service IPs. The module configures these automatically—no manual subnet configuration is required.

2. Public vs private cluster

Setting	API server access	Use case
Public cluster	API server publicly accessible from authorized IP ranges.	Development, testing, faster initial setup.
Private cluster	API server accessible only from within VPC.	Production, enhanced security, requires Cloud NAT for node egress.

Set is_private_cluster = true or is_private_cluster = false in Terraform variables. See terraform.tfvars.example in the deployment library for an example. For private clusters, ensure:

Cloud NAT is enabled so private nodes can pull container images and reach external endpoints.
Your deployment workstation has VPN or bastion access to the VPC.
Authorized IP ranges include your access points.

3. Authentication strategy

Workload Identity (recommended):
- Maia runner pods authenticate to Google Cloud APIs using federated identity credentials.
- No static credentials stored in the cluster.
- Automatic token rotation by Google Cloud STS.
- Best-practice security model for GKE.
Static OAuth credentials:
- OAuth credentials stored in Kubernetes Secrets.
- Use only if Workload Identity cannot be implemented (not recommended for GKE).

Recommendation: Use Workload Identity for all GKE deployments. The deployment library Terraform module creates the required service accounts and IAM bindings automatically.

4. Node pool strategy

Node machine type sizing:

Machine type	vCPU	Memory	Use case
`e2-standard-2`	2	8 GB	Development, testing, low workload.
`e2-standard-4`	4	16 GB	Small to medium production workloads.
`e2-standard-8`	8	32 GB	Production workloads.
`e2-standard-16`	16	64 GB	High-throughput production workloads.

Considerations:

Transformation-heavy workloads: SQL generation tasks, low CPU usage → smaller machines are sufficient.
Data ingestion/scripting workloads: High data transfer, processing on the Maia runner → larger machines are needed.
Pod density: Larger machines allow more pods per node, reducing operational overhead.

Configure the machine type in the Terraform node pool settings. Always select a machine type one tier above the pod's resource request, to leave headroom for the kubelet and system daemons.
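To make the "one tier above" rule concrete, here is a sketch that assumes one runner pod per node, whole-number CPU and memory requests, and the e2-standard ratio of 4 GB of memory per vCPU shown in the table above. The helper name pick_machine_type is hypothetical; it finds the smallest tier that fits the request and then returns the next tier up.

```shell
#!/bin/sh
# Print the e2-standard machine type one tier above the smallest tier that
# fits a pod's request. Arguments: CPU request (whole vCPUs) and memory
# request (whole GB). Only the four tiers from the table are considered.
pick_machine_type() {
  cpu=$1
  mem_gb=$2
  fit_found=no
  for vcpu in 2 4 8 16; do
    if [ "$fit_found" = yes ]; then
      # The tier after the first fit leaves headroom for kubelet and
      # system daemons.
      echo "e2-standard-$vcpu"
      return 0
    fi
    # e2-standard-N machines have N vCPUs and 4*N GB of memory.
    if [ "$cpu" -le "$vcpu" ] && [ "$mem_gb" -le $((vcpu * 4)) ]; then
      fit_found=yes
    fi
  done
  echo "no tier in the table leaves headroom for this request" >&2
  return 1
}
```

For example, a pod requesting 2 vCPU and 8 GB fits exactly on e2-standard-2, so the sketch returns e2-standard-4.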

5. Scaling strategy

Static replica count:

Fixed number of pods (for example, 2, 5, 10).
Predictable capacity and costs.
Suitable for steady-state workloads.

Horizontal Pod Autoscaler (HPA):

Automatically scales pods based on workload metrics.
Configure min/max replicas (for example, min: 2, max: 10).
Responds to workload spikes dynamically.

Cluster Autoscaler:

Automatically adds/removes nodes in the node pool based on pod scheduling needs.
Works in tandem with HPA.
Optimizes infrastructure costs.

Recommendation: Start with static replicas, then add HPA once you understand your workload patterns.

Container images

Maia runner images are available in Google Artifact Registry. Image repositories:

US region: us-docker.pkg.dev/maia-492711/maia-runners/maia-runner
EU region: europe-docker.pkg.dev/maia-492711/maia-runners/maia-runner
AU region: australia-southeast1-docker.pkg.dev/maia-492711/maia-runners/maia-runner

Available tags:

:stable—Slower release cycle, maximum stability, recommended for production.
:current—Faster release cycle, earlier access to new features.

Both tags are production-ready. Select :stable for stability-first deployments, or :current for early access to features. Select the repository that matches your Matillion region (us1 → US registry, eu1 → EU registry, au1 → AU registry) to minimize latency and egress costs.
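As a small sketch of the region-to-registry rule, the following functions (names hypothetical) map a Matillion region to the matching repository listed above and append the chosen tag. They reject anything other than us1, eu1, or au1 and the two documented tags.

```shell
#!/bin/sh
# Map a Matillion region identifier to its Artifact Registry repository.
runner_image_repo() {
  case "$1" in
    us1) echo "us-docker.pkg.dev/maia-492711/maia-runners/maia-runner" ;;
    eu1) echo "europe-docker.pkg.dev/maia-492711/maia-runners/maia-runner" ;;
    au1) echo "australia-southeast1-docker.pkg.dev/maia-492711/maia-runners/maia-runner" ;;
    *)
      echo "unknown Matillion region '$1' (expected us1, eu1, or au1)" >&2
      return 1
      ;;
  esac
}

# Compose a full image reference from a region and a tag (stable or current).
runner_image() {
  case "$2" in
    stable|current) ;;
    *) echo "unknown tag '$2' (expected stable or current)" >&2; return 1 ;;
  esac
  repo=$(runner_image_repo "$1") || return 1
  echo "$repo:$2"
}
```

For example, runner_image eu1 stable prints europe-docker.pkg.dev/maia-492711/maia-runners/maia-runner:stable.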

Deployment journey

Expected timeline

Phase 1 (Maia runner registration): 10 minutes (Matillion console).
Phase 2 (Infrastructure provisioning): 15-20 minutes (Terraform: VPC, GKE cluster, Workload Identity).
Phase 3 (Configure kubectl access): 2 minutes (gcloud CLI + kubectl).
Phase 4 (Maia runner deployment): 5-10 minutes (Helm chart).
Phase 5 (Validation): 15-30 minutes (pre-deployment checks + testing).

Total: roughly 50-75 minutes for a first-time deployment.

Phase 1: Maia runner registration (Matillion console)

See Matillion account setup, above, for details of Maia runner creation. At the end of this phase you have:

Account ID.
Agent ID.
OAuth Client ID and Secret.
Region (us1, eu1, or au1).

Store these securely. You’ll need them for Helm deployment in Phase 4.

Phase 2: Infrastructure provisioning (Terraform)

The Terraform module creates:

GKE cluster:
- Managed Kubernetes control plane (API server, etcd, controller manager).
- GKE-managed upgrades and patching.
- Cloud Logging and Cloud Monitoring integration.
Node pool:
- Managed instance group with configurable machine types.
- Shielded nodes with Secure Boot enabled.
- Kubernetes node labels (if configured).
Workload Identity:
- GCP service account for workloads.
- IAM binding linking the GCP service account to the Kubernetes service account.
- Role assignments for Secret Manager and GCS access.
VPC and networking:
- VPC with subnets and secondary IP ranges for pods and services.
- Cloud NAT for outbound internet from private nodes.
- Firewall rules for cluster communication.
Supporting services:
- GCS bucket for staging storage.
- Secret Manager secret for credential storage.

In terraform.tfvars, set the following values:

project_id: Your GCP project ID.
region: GCP region (for example, us-central1, europe-west1, australia-southeast1).
name: Cluster name prefix (for example, matillion-runner).
desired_node_count: Initial node count (for example, 2).
machine_type: Node pool machine type (for example, e2-standard-4).
is_private_cluster: true or false.
master_ipv4_cidr_block: CIDR for the private cluster control plane (for example, 172.16.0.0/28).
authorized_ip_ranges: List of CIDRs allowed to access the API server.
enable_cloud_nat: true (required when is_private_cluster = true).
labels: Resource labels for cost allocation.

See terraform.tfvars.example in the deployment library for a complete example.
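Before running terraform apply, a quick consistency check can catch the most common private-cluster mistakes. This is a hypothetical helper that takes the values you plan to set rather than parsing terraform.tfvars. It encodes two rules: enable_cloud_nat must be true for private clusters (as noted above), and GKE requires the private control plane range in master_ipv4_cidr_block to be a /28.

```shell
#!/bin/sh
# Check private-cluster settings before terraform apply.
#   check_cluster_vars <is_private_cluster> <enable_cloud_nat> <master_ipv4_cidr_block>
# Prints one line per problem; returns non-zero if any problem is found.
check_cluster_vars() {
  private=$1
  nat=$2
  master_cidr=$3
  problems=0
  if [ "$private" = true ]; then
    if [ "$nat" != true ]; then
      echo "enable_cloud_nat must be true when is_private_cluster = true"
      problems=$((problems + 1))
    fi
    # GKE requires the private control plane range to be a /28.
    case "$master_cidr" in
      */28) ;;
      *)
        echo "master_ipv4_cidr_block must be a /28 (got '$master_cidr')"
        problems=$((problems + 1))
        ;;
    esac
  fi
  [ "$problems" -eq 0 ]
}
```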

Why the three project-wide IAM grants? The service account is granted three project-wide IAM roles: roles/browser, roles/secretmanager.secretAccessor, and roles/secretmanager.viewer. Matillion models the GCP project as the “vault” for this Maia runner, equivalent to an Azure Key Vault or a Snowpark schema in other providers. The UI therefore surfaces a GCP Project ID selector when users define a secret; roles/browser lets the runner enumerate projects, and the two Secret Manager roles let it list and read the secrets within them.

After terraform apply completes, retrieve the Terraform outputs using:

terraform output -raw cluster_name
terraform output -raw runner_workload_sa_email

The runner_workload_sa_email is required for Helm deployment in Phase 4.

If you see an “identity pool does not exist” error during terraform apply, run terraform apply a second time. This occurs because the GKE Workload Identity pool takes a moment to propagate after cluster creation.

Where to implement: GKE Terraform module.

Phase 3: Configure kubectl access

You must configure kubectl to authenticate to your GKE cluster using the Google Cloud SDK. The gcloud container clusters get-credentials command retrieves the cluster endpoint and certificate authority data, then configures your local kubeconfig file with GKE authentication. First, install the GKE authentication plugin (if it is not already installed):

gcloud components install gke-gcloud-auth-plugin

Then retrieve the cluster credentials:

gcloud container clusters get-credentials <cluster-name> \
  --region <region> \
  --project <project-id>

Use the cluster_name from Terraform output for <cluster-name>, and the same region and project_id from your Terraform variables.

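If the plugin was just installed, it may not be on your PATH in the current shell. Add the Google Cloud SDK binary directory to your PATH before running the get-credentials command, for example (Homebrew on Apple silicon macOS; adjust the path for your installation): export PATH="/opt/homebrew/share/google-cloud-sdk/bin:$PATH".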

Verification:

kubectl get nodes
kubectl get namespaces

You should see GKE worker nodes and default Kubernetes namespaces.

Phase 4: Maia runner deployment (Helm)

The Helm chart deploys:

Maia runner pods:
- Deployment with configurable replica count (default: 2).
- Each pod runs the Maia runner binary.
- Resource requests and limits derived from runnerSize t-shirt sizing.
ServiceAccount:
- Kubernetes ServiceAccount configured for Workload Identity.
- Annotated with the GCP service account email (from Phase 2).
ConfigMaps:
- Maia runner configuration (account ID, Agent ID, region, default GCP project).
- Environment-specific settings.
Secrets:
- OAuth Client ID and Secret for Matillion control plane authentication.
Service:
- Kubernetes service exposing Prometheus metrics endpoint (port 8080).
- Annotated for Prometheus service discovery.

You will provide the following configuration values:

Value	Source	Example
`cloudProvider`	Static	`"gcp"`
`runnerSize`	Your decision	`small`, `medium`, `large`, or `xlarge`
`config.oauthClientId`	Phase 1 (Matillion console)	`"abc123..."`
`config.oauthClientSecret`	Phase 1 (Matillion console)	`"secret456..."`
`dpcAgent.dpcAgent.env.accountId`	Phase 1 (Matillion console)	`"12345"`
`dpcAgent.dpcAgent.env.agentId`	Phase 1 (Matillion console)	`"agent-prod-01"`
`dpcAgent.dpcAgent.env.matillionRegion`	Phase 1 (Matillion console)	`"us1"`, `"eu1"`, or `"au1"`
`dpcAgent.dpcAgent.env.defaultGcpProject`	Your GCP project ID	`"your-gcp-project-id"`
`dpcAgent.dpcAgent.image.repository`	Region-specific registry	`"us-docker.pkg.dev/maia-492711/maia-runners/maia-runner"`
`dpcAgent.dpcAgent.image.tag`	Your decision	`"stable"` or `"current"`
`gcp.workloadIdentity.serviceAccountEmail`	Phase 2 Terraform output	`"runner-sa@project.iam.gserviceaccount.com"`
`dpcAgent.replicas`	Your decision	`2` (baseline) to `10+` (high throughput)
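As a sketch under stated assumptions, the following function writes a values file from environment variables, nesting each dotted key path from the table above into YAML. The environment variable names, the defaults, and the placement of proxyHttp and customCertLocation in the env block are assumptions; the chart may define further optional variables that also need empty strings. Values are inserted unescaped, so a credential containing a double quote would need YAML escaping. The file holds the OAuth secret, so keep it out of version control.

```shell
#!/bin/sh
# Render a Helm values file from inputs gathered in Phases 1 and 2.
# Hypothetical helper; every variable below is supplied by you.
write_runner_values() {
  cat >"$1" <<EOF
cloudProvider: "gcp"
runnerSize: "${RUNNER_SIZE:-medium}"
config:
  oauthClientId: "${OAUTH_CLIENT_ID}"
  oauthClientSecret: "${OAUTH_CLIENT_SECRET}"
gcp:
  workloadIdentity:
    serviceAccountEmail: "${RUNNER_SA_EMAIL}"
dpcAgent:
  replicas: ${REPLICAS:-2}
  dpcAgent:
    env:
      accountId: "${ACCOUNT_ID}"
      agentId: "${AGENT_ID}"
      matillionRegion: "${MATILLION_REGION}"
      defaultGcpProject: "${GCP_PROJECT_ID}"
      # Unused optional variables must be empty strings, not placeholders.
      proxyHttp: ""
      customCertLocation: ""
    image:
      repository: "${IMAGE_REPOSITORY}"
      tag: "${IMAGE_TAG:-stable}"
EOF
}

# Example install (release name and namespace must match; <chart-path> is a
# placeholder for the chart in the deployment library):
#   NS=matillion-runner
#   write_runner_values values.yaml
#   helm upgrade --install "$NS" <chart-path> --namespace "$NS" --create-namespace -f values.yaml
```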

Note the following:

defaultGcpProject is the fallback vault used when a secret definition does not specify a project.
Additional GCP projects can be granted access so they also appear in the Matillion UI. See Phase 4a below if you need multi-project access.
For dev and preprod environments, set matillionEnv in the dpcAgent.dpcAgent.env block. Without it, the Maia runner defaults to prod and connects to the wrong environment.
The optional environment variables (proxyHttp, customCertLocation, etc.) must be explicitly set to empty strings if not used. Leaving them as template placeholders (for example, <CustomCertLocation>) causes the Maia runner to exit immediately on startup with no further error.
The Helm release name and Kubernetes namespace must match. The Workload Identity binding in Terraform is created for {namespace}/{namespace}-sa. If the namespace differs from the release name, authentication will silently fail.
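Because unresolved placeholders make the Maia runner exit without an error message, it can be worth scanning the values file before installing. This sketch assumes placeholders look like <Name> (a letter followed by letters, digits, underscores, or hyphens); find_placeholders is a hypothetical helper that prints each offending line with its line number and succeeds only if none remain.

```shell
#!/bin/sh
# List lines in a Helm values file that still contain <Placeholder> tokens.
# Returns 0 only when no placeholder is found and the file was readable.
find_placeholders() {
  status=0
  grep -nE '<[A-Za-z][A-Za-z0-9_-]*>' "$1" || status=$?
  # grep exits 0 if it found a match, 1 if it found none, 2 on error.
  [ "$status" -eq 1 ]
}
```

For example, run find_placeholders values.yaml before helm install and fix any lines it prints.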

Where to implement:

Phase 4a (Optional): Granting access to additional GCP projects

In Matillion, each GCP project is a vault: a namespace for secrets. By default, the Maia runner can only read secrets from the project set in defaultGcpProject. If your organization stores secrets in multiple GCP projects (for example, one per environment or team), you can grant the service account access to each of them. Once granted, those projects appear in the UI’s GCP Project ID drop-down alongside the default.

In each additional GCP project, ensure the following APIs are enabled. The project owner only needs to do this once:

gcloud services enable \
  secretmanager.googleapis.com \
  cloudresourcemanager.googleapis.com \
  --project=ADDITIONAL_PROJECT_ID

There are two ways to grant access. Use whichever option suits your workflow.

Option 1: Terraform (recommended)

Add the additional project IDs to your terraform.tfvars:

additional_gcp_projects = [
  "second-project-id",
  "third-project-id"
]

Then run terraform apply. Terraform will automatically grant roles/secretmanager.secretAccessor, roles/secretmanager.viewer, and roles/browser to the service account in each listed project.

Option 2: Manual (gcloud)

If you prefer to grant access manually, replace RUNNER_SA with the value from the Terraform output runner_workload_sa_email (noted in Phase 2):

RUNNER_SA="your-runner-sa@primary-project.iam.gserviceaccount.com"
ADDITIONAL_PROJECT="second-project-id"
for ROLE in roles/secretmanager.secretAccessor roles/secretmanager.viewer roles/browser; do
  gcloud projects add-iam-policy-binding "$ADDITIONAL_PROJECT" \
    --member="serviceAccount:$RUNNER_SA" \
    --role="$ROLE"
done

Repeat for each additional project. No restart is required, although IAM changes can take a few minutes to propagate. After access is granted, the additional projects appear in the GCP Project ID drop-down in the UI when users define a GCP secret. The defaultGcpProject remains the fallback when no project is explicitly selected.
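When several projects need access, printing the commands first makes them easy to review. This hypothetical dry-run helper emits, for each project, the API enablement command and the three role grants from this section; pipe its output to sh to execute it. It assumes project IDs and the service account email contain no spaces or shell metacharacters (true for valid GCP identifiers), and that whoever runs it is allowed to enable services in each project.

```shell
#!/bin/sh
# Print (rather than run) the per-project commands from this section.
#   plan_project_grants <runner-sa-email> <project-id>...
plan_project_grants() {
  sa=$1
  shift
  for project in "$@"; do
    echo "gcloud services enable secretmanager.googleapis.com cloudresourcemanager.googleapis.com --project=$project"
    for role in roles/secretmanager.secretAccessor roles/secretmanager.viewer roles/browser; do
      echo "gcloud projects add-iam-policy-binding $project --member=serviceAccount:$sa --role=$role"
    done
  done
}

# Example: review, then execute.
#   plan_project_grants "$RUNNER_SA" second-project-id third-project-id
#   plan_project_grants "$RUNNER_SA" second-project-id third-project-id | sh
```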

Security consideration: A single service account with access to multiple projects means that a compromise of that Maia runner can expose secrets across all of those projects. For strict isolation between projects or teams, deploy separate Maia runner instances, one per project, each with its own service account.

Phase 5: Validation and testing

Run the automated pre-deployment validation scripts to verify the pod environment:

# From deployment library root
./runner/helm/checks/run-check.sh --namespace matillion-runner --release matillion-runner

What gets checked:

Python 3 and Java runtime available.
Filesystem permissions correct.
Environment variables set (ACCOUNT_ID, AGENT_ID, etc.).
cgroup CPU and memory limits applied.
Network connectivity to Matillion control plane.
Security agents that might interfere (CrowdStrike, Prisma Cloud).

Manual verification:

Matillion Console: Navigate to Manage runners. Verify status shows “Connected”.
Test pipeline: Create a simple pipeline (for example, “Hello World” transformation) and execute.
Prometheus metrics: Verify that metrics are available at http://<pod-ip>:8080/actuator/prometheus.

Maia runner application logs are available in Cloud Logging:

gcloud logging read \
  "resource.type=k8s_container AND resource.labels.cluster_name=<cluster-name>" \
  --project=<project-id> --limit=50
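To narrow the query to the runner namespace or to errors only, the filter can be assembled from standard k8s_container fields (resource.labels.namespace_name and severity). runner_log_filter is a hypothetical helper; the namespace matillion-runner in the example matches the validation command above and may differ in your deployment.

```shell
#!/bin/sh
# Build a Cloud Logging filter for runner container logs.
#   runner_log_filter <cluster-name> <namespace> [min-severity]
runner_log_filter() {
  filter="resource.type=k8s_container AND resource.labels.cluster_name=$1 AND resource.labels.namespace_name=$2"
  if [ -n "${3:-}" ]; then
    filter="$filter AND severity>=$3"
  fi
  echo "$filter"
}

# Example (requires gcloud access to the project):
#   gcloud logging read "$(runner_log_filter "$(terraform output -raw cluster_name)" matillion-runner ERROR)" \
#     --project=<project-id> --limit=50 --freshness=1h
```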

Maia runner architecture on GKE

Workload Identity

How Workload Identity works:

Kubernetes ServiceAccount is annotated with the GCP service account email.
GKE OIDC issuer allows Kubernetes to issue tokens trusted by Google Cloud IAM.
The Maia runner pod requests a Google Cloud access token using the projected service account token.
Google Cloud STS exchanges the token for short-lived access credentials (valid one hour, auto-refreshed).
The Maia runner accesses Google Cloud services (Secret Manager, GCS) without storing credentials.

Security benefits:

No long-lived credentials in cluster.
Automatic token rotation (every hour).
Least-privilege access (GCP service account scoped to specific resources).
Pod-level isolation (each pod authenticates independently).

What the Terraform module creates:

GCP service account for workloads.
Workload Identity IAM binding between the GCP service account and the Kubernetes service account.
Role assignments for Secret Manager (secretAccessor, viewer) and GCS.
GKE Workload Identity pool configuration on the cluster.

Retrieve the service account email from the Terraform output and pass it to the Helm chart as gcp.workloadIdentity.serviceAccountEmail:

terraform output -raw runner_workload_sa_email
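To check that the IAM binding and the Kubernetes ServiceAccount line up, it helps to construct the Workload Identity member that GKE expects, which has the form serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]. This sketch assumes the {namespace}-sa ServiceAccount naming described in Phase 4; wi_member is a hypothetical helper, and the commented commands show one way to inspect the live roles/iam.workloadIdentityUser binding and the iam.gke.io/gcp-service-account annotation.

```shell
#!/bin/sh
# Build the Workload Identity member for the runner's Kubernetes
# ServiceAccount, assuming it is named {namespace}-sa.
#   wi_member <project-id> <namespace>
wi_member() {
  project_id=$1
  namespace=$2
  echo "serviceAccount:${project_id}.svc.id.goog[${namespace}/${namespace}-sa]"
}

# Compare against the live configuration (requires gcloud/kubectl access):
#   gcloud iam service-accounts get-iam-policy "$RUNNER_SA" \
#     --flatten="bindings[].members" \
#     --filter="bindings.role=roles/iam.workloadIdentityUser" \
#     --format="value(bindings.members)"
#   kubectl get serviceaccount "$NS-sa" -n "$NS" \
#     -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
```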

Task capacity and throughput

Per-pod capacity: Each pod can execute up to 20 concurrent tasks.

Throughput calculation: Maximum concurrent tasks = (Number of pods) × 20.

Examples:

Two pods (default) = 40 concurrent tasks.
Five pods = 100 concurrent tasks.
10 pods = 200 concurrent tasks.
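The throughput arithmetic above, and its inverse (how many pods cover a target concurrency), can be sketched as two shell functions. The helper names are hypothetical; the 20-tasks-per-pod figure comes from this section.

```shell
#!/bin/sh
# Each pod runs at most 20 concurrent tasks.
TASKS_PER_POD=20

# Maximum concurrent tasks for a given number of pods.
max_concurrent_tasks() {
  echo $(( $1 * TASKS_PER_POD ))
}

# Smallest pod count whose capacity covers a target number of concurrent
# tasks (integer ceiling division).
pods_for_tasks() {
  echo $(( ($1 + TASKS_PER_POD - 1) / TASKS_PER_POD ))
}
```

For example, pods_for_tasks 100 prints 5, matching the five-pod example, and pods_for_tasks 101 prints 6.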

Scaling guidance:

For transformation workloads: Tasks generate SQL that the data warehouse executes. CPU/memory usage is low, so fewer pods are needed.
For data ingestion workloads: Tasks transfer and process data on the Maia runner. CPU/memory usage is high, so more pods are needed.

Queuing behavior: When all pods are at capacity (20 tasks each), new tasks queue in Matillion’s agent gateway until capacity becomes available.

Monitoring and observability

Native Prometheus metrics

Maia runner pods expose Prometheus-compatible metrics at:

Endpoint: http://<pod-ip>:8080/actuator/prometheus.
Service: Automatically created by the Helm chart with Prometheus annotations.

Key metrics:

app_version_info: Maia runner version and build metadata.
app_agent_status: Maia runner status (1 = running, 0 = stopped).
app_active_task_count: Current number of executing tasks.

The Helm chart includes annotations for automatic Prometheus service discovery:

prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/actuator/prometheus"

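If Prometheus in your cluster is configured for annotation-based service discovery (the prometheus.io annotations are a widely used convention rather than built-in Prometheus behavior), it automatically discovers and scrapes these metrics.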
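To spot-check the key metrics without a Prometheus server, you can fetch the endpoint and extract individual samples. In VPC-native clusters the pod IP is generally not reachable from a workstation, so the example uses kubectl port-forward; <service-name> is a placeholder for the Service the chart creates. The metric_value helper is a hypothetical sketch that assumes label values contain no closing brace.

```shell
#!/bin/sh
# Print the value of each sample of metric $1 from Prometheus text
# exposition format read on stdin.
metric_value() {
  awk -v n="$1" '
    index($0, n) == 1 {
      rest = substr($0, length(n) + 1)
      c = substr(rest, 1, 1)
      if (c == "{") {
        # Drop the label set, for example {application="runner"}.
        sub(/^[{][^}]*[}]/, "", rest)
      } else if (c != " ") {
        # A longer metric name that merely starts with n; skip it.
        next
      }
      split(rest, f, " ")
      print f[1]
    }'
}

# Example usage (requires kubectl access):
#   kubectl port-forward -n matillion-runner svc/<service-name> 8080:8080 &
#   curl -s http://localhost:8080/actuator/prometheus | metric_value app_agent_status
```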

Google Cloud Monitoring integration

Enable Cloud Monitoring for GKE (GKE system metrics) for comprehensive cluster monitoring:

Cluster-level metrics (CPU, memory, network).
Pod-level metrics (resource usage per pod).
Node-level metrics (VM health, disk usage).

Maia runner application logs are streamed to Cloud Logging:

Centralized log aggregation across all pods.
Query with the Logs Explorer or gcloud CLI.
Set up log-based alerts on error patterns.

Recommended Cloud Monitoring alerts:

Maia runner pod restarts exceed a threshold.
Maia runner pods in CrashLoopBackOff state.
Task execution failures (requires custom metric from logs).
Node pool CPU/memory > 80%.

Security best practices

Network security

VPC configuration:

Deploy Maia runner pods in private node pools for enhanced security.
Use Cloud NAT for outbound internet access (required for Matillion control plane and image pulls when is_private_cluster = true).
Restrict firewall rules to minimum required ingress/egress.

Outbound connectivity requirements:

HTTPS (443) to Matillion control plane (region-specific endpoints).
HTTPS/JDBC to data warehouse endpoints (Snowflake, BigQuery).
HTTPS (443) to Google Cloud APIs (Secret Manager, GCS, IAM, Artifact Registry).
HTTP (80) to Snowflake endpoints (Snowflake drivers use port 80 for OCSP certificate revocation checks).
Ingress: No inbound traffic required (the Maia runner initiates all connections).

Private cluster considerations:

API server accessible only from VPC (or authorized VPN/bastion).
Requires VPN or Identity-Aware Proxy for kubectl access.
CI/CD pipelines need VPC connectivity or VPN access.

Pod security standards

The Helm chart implements Kubernetes pod security standards. Security context configuration:

Run as non-root user (UID 65534).
Read-only root filesystem.
No privilege escalation.
Drop all Linux capabilities.
Seccomp profile: RuntimeDefault.

Example from Helm chart:

securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  fsGroup: 65534
  seccompProfile:
    type: RuntimeDefault

containers:
  - securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]

Secrets management

OAuth credentials storage options:

Google Cloud Secret Manager (recommended):
- Store OAuth credentials in Secret Manager.
- Use External Secrets Operator to sync to Kubernetes Secrets.
- Automatic rotation support.
- Centralized secret management across environments.
Kubernetes Secrets (default):
- Credentials provided via Helm values.
- Stored as base64-encoded Kubernetes Secret.
- Encrypted at rest by default with Google-managed encryption; for envelope encryption with your own key, enable GKE application-layer secrets encryption with Cloud KMS.

Recommendation: For production, use Google Cloud Secret Manager with External Secrets Operator for centralized, auditable secret management.

Scaling considerations

When to scale

Indicators to add more pods:

Task queue depth consistently > 0 (check the Task history or metrics).
Pipeline execution time increases due to task queuing.
More concurrent pipelines being executed.
Workload characteristics change (more data ingestion vs transformation).

Indicators to keep current capacity:

Task queue depth consistently = 0.
Pipeline execution times stable.
Workload primarily transformation (SQL generation).

Horizontal Pod Autoscaler (HPA)

How it works:

Kubernetes HPA monitors pod metrics.
Automatically scales Deployment replicas within configured min/max range.
Evaluates every 15 seconds (default), scales up/down based on thresholds.

The HPA scales based on hpa.metrics.target.averageValue—the target number of in-flight tasks per pod.

Hard cap: 20. Each pod runs a maximum of 20 concurrent tasks. Values above 20 mean the HPA can never reach the target.
Recommended range: 15–17.
- 15—proactive (spiky or latency-sensitive workloads, more headroom).
- 16—balanced (recommended default).
- 17—reactive (steady workloads, some queueing acceptable, lower cost).

Configure via Helm values or separate HPA manifest. Read the HPA documentation for details.
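To see why the target must stay below 20, here is a simplified sketch of the HPA scaling rule for an average-value metric: desired replicas = ceil(total in-flight tasks / target), skipped when the ratio of current load to target is within the default 10% tolerance, then clamped to the min/max range. Real HPAs also apply stabilization windows and scaling policies, which this sketch ignores; hpa_desired is a hypothetical helper name.

```shell
#!/bin/sh
# Simplified HPA rule for an AverageValue target (in-flight tasks per pod).
#   hpa_desired <current-replicas> <total-in-flight-tasks> <target> <min> <max>
hpa_desired() {
  current=$1 total=$2 target=$3 min=$4 max=$5
  # Load the current replicas would carry if every pod sat exactly at target.
  capacity=$((current * target))
  if [ "$total" -ge "$capacity" ]; then
    diff=$((total - capacity))
  else
    diff=$((capacity - total))
  fi
  if [ $((diff * 10)) -le "$capacity" ]; then
    # |total / capacity - 1| <= 0.1: within the default tolerance, no change.
    desired=$current
  else
    # ceil(current * (total / current) / target) == ceil(total / target)
    desired=$(( (total + target - 1) / target ))
  fi
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}
```

For example, three saturated pods (60 in-flight tasks) with a target of 16 scale to 4 replicas. With a target of 20, the ratio is exactly 1, so under this simplified model the replica count stays at 3 even while further tasks queue in the gateway.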

Cluster autoscaler

How it works:

Monitors pods in Pending state (unable to schedule due to insufficient node capacity).
Automatically adds VMs to node pool (managed instance group).
Removes underutilized nodes after 10 minutes of low usage.

Works with HPA:

HPA scales pods based on metrics.
If pods can’t schedule (no node capacity), cluster autoscaler adds nodes.
Pending pods are scheduled onto the new nodes.
When load decreases, HPA scales down pods, cluster autoscaler removes empty nodes.

Read the GKE Cluster Autoscaler documentation for more details.

Vertical scaling

Adjust CPU and memory limits per pod via the runnerSize Helm value:

Useful when individual tasks require more resources than current pod limits.
Requires pod restart to apply new resource limits.
Consider workload characteristics (transformation vs ingestion).

Cost optimization

Cost optimization strategies

Right-size nodes: Match machine type to workload (transformation-heavy = smaller, ingestion-heavy = larger).
Use Cluster Autoscaler: Automatically remove unused nodes during low-usage periods.
Consider Committed Use Discounts: For predictable baseline capacity (1-year or 3-year commitment).
Monitor data transfer: Keep data warehouses in the same region as the cluster to avoid cross-region egress charges.
Use Spot VMs for non-critical workloads: Configure node pool with Spot VMs for cost savings on interruptible workloads.

Troubleshooting

Maia runner exits immediately with no error (exit code 1)

The startup script treats unresolved placeholder values (for example, <CustomCertLocation>) as fatal. Set all optional environment variables to "" in your Helm values file if not used.

BeanCreationException: gcpSecretManager

If you see Error creating bean with name 'gcpSecretManager', ensure defaultGcpProject is set in your Helm values under dpcAgent.dpcAgent.env.defaultGcpProject.

Unknown Matillion region

Ensure matillionRegion uses the full region identifier: us1, eu1, or au1. Using a partial identifier (for example, eu without the digit) will fail.

Workload Identity binding—identity pool does not exist

This is a known race condition: the IAM binding is created before GKE finishes provisioning its Workload Identity pool. Run terraform apply a second time—it will succeed once the cluster is fully ready.

gke-gcloud-auth-plugin not found

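After installing via gcloud components install gke-gcloud-auth-plugin, the plugin may not be on your PATH in the current shell session. Add the Google Cloud SDK binary directory to your PATH (the path below is for a Homebrew installation on Apple silicon macOS; adjust it for your installation):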

export PATH="/opt/homebrew/share/google-cloud-sdk/bin:$PATH"
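Rather than hard-coding the Homebrew path, you can ask gcloud for its SDK root and add its bin directory only if it is missing. This assumes gcloud itself is already runnable; path_prepend is a hypothetical helper.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already present.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Example (requires gcloud on PATH):
#   path_prepend "$(gcloud info --format='value(installation.sdk_root)')/bin"
#   gke-gcloud-auth-plugin --version
```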

Maia runner gateway connection unhealthy

If the Maia runner starts successfully but the logs show “connection is unhealthy. lastKeepAliveTime=[null]”, this is expected until the OAuth client credentials and Maia runner registration are correctly configured. The infrastructure and startup itself are healthy.

Additional resources

Implementation and deployment

For complete Terraform modules, Helm charts, and step-by-step implementation, see the Matillion Deployment Library on GitHub at github.com/matillion-public/deployment-library.

General Kubernetes guide

You should read the general Kubernetes deployment guide for platform-agnostic concepts and architecture.

Matillion documentation

For deployment models, read the Maia runner overview.
For registration, read Create a Maia runner.
For capacity planning, read Scaling best practices.

Google Cloud documentation

For GKE concepts and operations, read Google Kubernetes Engine documentation.
For Workload Identity setup, read Use Workload Identity.
For automatic node scaling, read Cluster Autoscaler.
