
# Setup guide - Hybrid SaaS Databricks on AWS

export const m_runner = "Maia runner";

export const maia = "Maia";

This guide describes the steps to set up your first working project in {maia} with the following configuration:

<div class="metadata-grid" data-search-exclude>
  <div class="metadata-label">Deployment type:</div>

  <div class="metadata-value">
    <span class="cdp">Hybrid SaaS</span>
  </div>

  <div class="metadata-label">Cloud platform:</div>

  <div class="metadata-value">
    <span class="cdp">AWS</span>
  </div>

  <div class="metadata-label">Cloud data warehouse:</div>

  <div class="metadata-value">
    <span class="cdp">Databricks</span>
  </div>
</div>

***

## Prerequisites

### AWS requirements

* An [AWS](https://aws.amazon.com/console/) account with permissions to use the **CloudFormation** template.
* An AWS user account or role with permissions to create:
  * [ECS clusters](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html).
  * [Task definitions](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html).
  * [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) for task execution, including, at a minimum, `AWSServiceRoleForECS`. If your account doesn't have this role, create it by following the instructions in [Creating a service-linked role for Amazon ECS](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using-service-linked-roles.html#create-slr).
  * [S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html).
  * [CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) log groups.
  * [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/).
* Access to the following AWS resources:
  * A [virtual private cloud (VPC)](https://docs.aws.amazon.com/vpc/).
  * A [private subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html).
  * A [security group](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-security-groups.html), minimally allowing access.
* Network access allowed to the IP addresses listed in [Network access and IP Allowlist requirements](/docs/security/network-access-and-ip-allowlist-requirements/#hybrid-saas-agents-and-git-repositories).
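
The `AWSServiceRoleForECS` prerequisite above can be checked before you launch the CloudFormation template. The following is a minimal sketch, not an official {maia} tool. It assumes you pass in an object that behaves like a boto3 IAM client; the function name `ensure_ecs_service_linked_role` is hypothetical.

```python
def ensure_ecs_service_linked_role(iam) -> bool:
    """Make sure the ECS service-linked role exists.

    `iam` is assumed to behave like a boto3 IAM client, for example
    boto3.client("iam"). Returns True if the role already existed and
    False if this call had to create it.
    """
    try:
        # get_role raises NoSuchEntityException when the role is absent.
        iam.get_role(RoleName="AWSServiceRoleForECS")
        return True
    except iam.exceptions.NoSuchEntityException:
        # Ask IAM to create the service-linked role for Amazon ECS.
        iam.create_service_linked_role(AWSServiceName="ecs.amazonaws.com")
        return False
```

With boto3 installed and AWS credentials configured, a call might look like `ensure_ecs_service_linked_role(boto3.client("iam"))`. The caller needs IAM permissions to read roles and to create service-linked roles.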

### Databricks requirements

* A [Databricks](https://www.databricks.com/) account with the following information:
  * Your Databricks [instance name](https://docs.databricks.com/aws/en/workspace/workspace-details).
  * Your Databricks [personal access token](https://docs.databricks.com/en/dev-tools/auth/pat.html).
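
The instance name is the host part of your workspace URL, so it can be derived from whatever you copy out of the browser address bar. The sketch below illustrates this under stated assumptions: the helper names are hypothetical, the example host is a placeholder, and the clusters list endpoint is used only as one example of an authenticated Databricks REST call for checking a token. The request is built but not sent.

```python
from urllib.parse import urlparse
from urllib.request import Request


def instance_name_from_url(workspace_url: str) -> str:
    """Extract the instance name (host) from a workspace URL or bare host."""
    candidate = workspace_url.strip()
    if "://" not in candidate:
        # Allow a bare host such as "dbc-xxxx.cloud.databricks.com".
        candidate = "https://" + candidate
    host = urlparse(candidate).hostname
    if not host:
        raise ValueError(f"No host found in {workspace_url!r}")
    return host


def build_token_check_request(instance_name: str, token: str) -> Request:
    """Build, without sending, a request that would exercise the token."""
    return Request(
        f"https://{instance_name}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder URL for illustration only.
name = instance_name_from_url("https://dbc-a1b2345c-d6e7.cloud.databricks.com/?o=123")
print(name)
```

To actually test a token, you could pass the returned request to `urllib.request.urlopen`. A 200 response would suggest that the instance name and token are usable, while a 401 or 403 would point to a credential or permission problem.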

### Connectivity requirements

* Access enabled for the IP addresses listed under the **Hybrid SaaS** section of [Network access and IP Allowlist requirements](/docs/security/network-access-and-ip-allowlist-requirements/#hybrid-saas-agents-and-git-repositories).

### Git requirements

If you choose to use [your own Git provider](/docs/guides/installing-git-provider-overview) instead of the Matillion-hosted Git option, you need the following:

* The Matillion Git app installed in your organization's account with one of the supported Git providers:
  * [GitHub](/docs/guides/installing-matillion-app-github-marketplace).
  * [Azure DevOps](/docs/guides/installing-matillion-app-azure-devops).
  * [GitLab](/docs/guides/connect-gitlab-repository-prerequisites).
  * [Bitbucket](/docs/guides/connect-bitbucket-repository-prerequisites).

***

## Setup steps

1. Register for a [{maia} account](/docs/administration/registration).
2. [Create accounts](/docs/administration/manage-accounts) for users and admins who will be active in {maia}.
3. [Create a {m_runner}](/docs/guides/create-a-runner) in {maia}.
4. [Deploy a Fargate {m_runner} in AWS](/docs/guides/runner-installation-cloudformation-quick-create#using-the-cloudformation-link) using CloudFormation.
   * If you have multiple VPCs, or link your VPCs to on-premises environments to reach privately hosted databases, APIs, and other data sources, make sure the CIDR range of the {m_runner}'s VPC and the IP range of its subnet are compatible with, and properly linked to, those other networks where required.
5. Create a [project](/docs/guides/projects#add-a-new-project), making the following choices:
   * Select **Advanced settings**.
   * Select the {m_runner} you created and deployed previously.
   * Select the Git provider you wish to use.
6. Create an [environment](/docs/guides/environments) using your Databricks credentials.
7. Set up [secret definitions](/docs/guides/secrets-and-secret-definitions#add-a-secret-definition-hybrid-saas) for Databricks credentials, passwords, API keys, and tokens.
8. Create a Git [branch](/docs/guides/branches) in which to begin pipeline work.
9. Create your first [pipeline](/docs/guides/pipelines).
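
The network note in step 4 can be sanity-checked before deployment. The sketch below assumes that "compatible" means, at minimum, that the {m_runner}'s VPC CIDR does not overlap any network it is linked to, and that the subnet range sits inside the VPC range. That reading is an assumption, the function names are hypothetical, and the CIDR values are examples only.

```python
import ipaddress


def find_cidr_conflicts(runner_vpc_cidr: str, linked_cidrs: list[str]) -> list[str]:
    """Return the linked network CIDRs that overlap the runner's VPC CIDR."""
    runner_net = ipaddress.ip_network(runner_vpc_cidr)
    return [
        cidr
        for cidr in linked_cidrs
        if runner_net.overlaps(ipaddress.ip_network(cidr))
    ]


def subnet_within_vpc(subnet_cidr: str, vpc_cidr: str) -> bool:
    """Check that the subnet's IP range lies inside the VPC's CIDR range."""
    return ipaddress.ip_network(subnet_cidr).subnet_of(ipaddress.ip_network(vpc_cidr))


# Example values only: one peered VPC overlaps the runner's VPC range.
conflicts = find_cidr_conflicts(
    "10.0.0.0/16",
    ["10.0.128.0/17", "10.1.0.0/16", "192.168.0.0/24"],
)
print(conflicts)
print(subnet_within_vpc("10.0.1.0/24", "10.0.0.0/16"))
```

An empty conflict list does not prove that routing, peering, or security groups are configured correctly. It only rules out overlapping address ranges, which is a common reason linked networks fail to route to each other.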
