# Project quickstart guide

export const m_runner = "Maia runner";

export const maia = "Maia";

export const designer = "Designer";

{maia} uses data pipelines to extract data from diverse sources and load it into your chosen cloud data platform (data warehouse). Before you run your data pipelines, you therefore need a connection to a supported cloud data platform account. These connections are stored in *projects*.

{maia} currently supports Snowflake, Databricks, Amazon Redshift, and Google BigQuery.

When you create a [project](/docs/guides/projects), you set up a workspace in {maia} that groups the following resources:

* **Branches** for version control and collaborative working.
* **Environment** connections to your cloud data platform.
* **Secret definitions** to reference secrets such as passwords, API keys, and so on.
* **Cloud credentials** to connect to objects and services on AWS, Azure, and Google Cloud.
* **OAuth** connections to access your data at third-party services such as Facebook, Salesforce, and so on.
* **Schedules** to run your pipelines at your preferred intervals.
* **Access** permissions that control what other members of your {maia} organization can do in your project.

***

## Create a project

When you create a project, set the following details in the corresponding fields:

* A **Project name**.
* An optional **Description** of your project.
* The **Data platform** you wish to connect to.

Click **Continue**.

***

## Select a project configuration

Select how you want your project to be configured:

* {maia} managed
* Advanced settings

For this guide, select **{maia} managed**. Matillion then sets up and manages the following infrastructure for you:

* Git repository
* Secrets
* {m_runner}s

<Note>
  Select **Advanced settings** if you want to connect your own third-party Git repository or deploy a Hybrid SaaS {m_runner}. For more information about project settings, read [Projects](/docs/guides/projects).
</Note>

Click **Continue**.

***

## Create an environment

Enter an environment name, then use the **Default environment access** drop-down to select an [access level](/docs/administration/environment-roles).

Click **Continue**.

From this point forward, use the tabs below to follow instructions specific to your cloud data platform.

***

## Specify data warehouse credentials

<Tabs>
  <Tab title="Snowflake">
    | Parameter | Description                                                                                                                                  |
    | --------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
    | Account   | Your Snowflake account name and region, which form part of the URL you use to log in to Snowflake. Use the format `[accountName].[region_id]`. |
    | Username  | Your Snowflake username.                                                                                                                     |
    | Password  | Your Snowflake password. Applies to the **Full SaaS** deployment model only.                                                                  |
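
    If you're unsure what to enter for **Account**, you can derive it from your Snowflake login URL. The following is a minimal sketch, assuming your login URL looks like `https://xy12345.eu-west-1.snowflakecomputing.com` and that everything before `.snowflakecomputing.com` is the account identifier. The helper `account_from_login_url` is hypothetical and isn't part of {maia}; it also rejects hosts with no region segment, because those don't match the documented format.

    ```python
    from urllib.parse import urlparse

    SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


    def account_from_login_url(login_url: str) -> str:
        """Return the [accountName].[region_id] part of a Snowflake login URL.

        Hypothetical helper: assumes the host ends in ".snowflakecomputing.com"
        and that everything before that suffix is the account identifier.
        """
        host = urlparse(login_url).hostname
        if host is None or not host.endswith(SNOWFLAKE_SUFFIX):
            raise ValueError(f"not a Snowflake login URL: {login_url!r}")
        account = host[: -len(SNOWFLAKE_SUFFIX)]
        if "." not in account:
            # Without a region segment, the value doesn't match [accountName].[region_id].
            raise ValueError(f"no region found in {login_url!r}")
        return account


    print(account_from_login_url("https://xy12345.eu-west-1.snowflakecomputing.com/console"))
    # prints xy12345.eu-west-1
    ```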
  </Tab>

  <Tab title="Databricks">
    | Parameter     | Description                                                                                                                                                                       |
    | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Instance name | Your Databricks instance name. Read the [Databricks documentation](https://docs.databricks.com/en/workspace/workspace-details.html) to learn how to determine your instance name. |
    | Username      | Your Databricks username.                                                                                                                                                         |
    | Password      | Your Databricks password.                                                                                                                                                         |

    The instance name is the host portion of the URL you use to log in to your Databricks deployment, and it's visible in your browser's address bar. For example, the URL `https://cust-success.cloud.databricks.com/explore/data/hive_metastore` contains the instance name `cust-success.cloud.databricks.com`.
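
    If you copy a URL from your browser's address bar, you can extract the instance name with Python's standard `urllib.parse` module. This is a minimal sketch; `databricks_instance_name` is a hypothetical helper, not part of {maia} or Databricks, and it assumes the instance name is simply the URL's hostname.

    ```python
    from urllib.parse import urlparse


    def databricks_instance_name(url: str) -> str:
        """Return the instance name (hostname) from a Databricks workspace URL.

        Hypothetical helper: drops the scheme, path, and query string.
        """
        if "://" not in url:
            url = "https://" + url  # allow URLs pasted without a scheme
        host = urlparse(url).hostname
        if not host:
            raise ValueError(f"no host found in {url!r}")
        return host


    url = "https://cust-success.cloud.databricks.com/explore/data/hive_metastore"
    print(databricks_instance_name(url))
    # prints cust-success.cloud.databricks.com
    ```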

    As a best practice, we recommend using a personal access token for authentication. Read [Databricks personal access token authentication](https://docs.databricks.com/en/dev-tools/auth/pat.html) for details. A personal access token is required to use a personal staging location (PSL), which you can choose as the **Staging** location in query components such as [Database Query](/docs/components/database-query).
  </Tab>

  <Tab title="Amazon Redshift">
    <Note>
      Before you create a [project](/docs/guides/projects), you must create AWS [cloud provider credentials](/docs/guides/cloud-credentials), because Amazon Redshift requires a default S3 bucket to stage data.
    </Note>

    Because Amazon Redshift runs only on AWS, the **Specify AWS cloud credentials** page is displayed. [Cloud provider credentials](/docs/guides/cloud-credentials) are required so that you can select a default S3 bucket on the next page, and that bucket is mandatory for staging data.

    | Parameter | Description                                                                                                                                                                                                                                                   |
    | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Endpoint  | The address of the cluster's leader node, given as either a hostname or an IP address. For more information, read the [Amazon Redshift documentation](https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-connection-string.html).            |
    | Port      | This is usually 5439 or 5432, but it can be configured differently when setting up your Amazon Redshift cluster. For more information, read the [Amazon Redshift documentation](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-connecting.html). |
    | Use SSL   | Select this to encrypt communications between {maia} and Amazon Redshift. Some Amazon Redshift clusters may be configured to require this.                                                                                                                    |
    | Username  | Your Amazon Redshift username for the environment connection.                                                                                                                                                                                                 |
    | Password  | Your Amazon Redshift password.                                                                                                                                                                                                                                |
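
    Before you click **Add**, it can help to confirm that your **Endpoint** and **Port** values accept connections. The following is a minimal sketch using Python's standard `socket` module. It only checks whether your own machine can open a TCP connection; it doesn't authenticate with Amazon Redshift or test network access from {maia}. The function name `can_reach` is hypothetical.

    ```python
    import socket


    def can_reach(endpoint: str, port: int = 5439, timeout: float = 5.0) -> bool:
        """Return True if a TCP connection to endpoint:port can be opened.

        Hypothetical helper: checks network reachability only and doesn't
        log in to Amazon Redshift.
        """
        try:
            with socket.create_connection((endpoint, port), timeout=timeout):
                return True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            return False
    ```

    For example, call `can_reach` with your cluster's endpoint and port (usually 5439). A `False` result suggests checking the endpoint, the port, and any security group rules on the cluster.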
  </Tab>

  <Tab title="Google BigQuery">
    <Note>
      Before you create a [project](/docs/guides/projects), you must create Google Cloud [cloud provider credentials](/docs/guides/cloud-credentials), because they're needed to authenticate with your BigQuery environment.
    </Note>

    Because Google BigQuery runs only on Google Cloud, the **Specify GCP cloud credentials** page is displayed. [Cloud provider credentials](/docs/guides/cloud-credentials) are required so that you can select your Google Cloud service account on the next page.

    | Parameter           | Description                                                                                               |
    | ------------------- | --------------------------------------------------------------------------------------------------------- |
    | Service account key | Your Google Cloud service account JSON key file, used to access Google BigQuery and Google Cloud Storage. |
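
    A malformed or truncated key file is a common cause of authentication failures. The following is a minimal sketch that checks the structure of a service account JSON key before you upload it, assuming the standard key fields `type`, `project_id`, `private_key`, and `client_email`. It doesn't verify the key with Google Cloud, and `check_service_account_key` is a hypothetical helper.

    ```python
    import json

    # Fields that a Google Cloud service account JSON key normally contains.
    REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


    def check_service_account_key(raw: str) -> str:
        """Validate the shape of a service account key and return its client_email.

        Hypothetical helper: checks structure only, not whether the key is valid.
        """
        key = json.loads(raw)  # raises a ValueError subclass on malformed JSON
        if not isinstance(key, dict):
            raise ValueError("key file must contain a JSON object")
        missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
        if missing:
            raise ValueError(f"key file is missing: {', '.join(missing)}")
        if key["type"] != "service_account":
            raise ValueError(f"expected a service account key, got type {key['type']!r}")
        return key["client_email"]


    # Usage (the path "my-service-account-key.json" is a placeholder):
    # with open("my-service-account-key.json", encoding="utf-8") as f:
    #     print(check_service_account_key(f.read()))
    ```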
  </Tab>
</Tabs>

Click **Add**.

***

## Select data warehouse defaults

<Tabs>
  <Tab title="Snowflake">
    | Property          | Description                                                                                                                                                                                  |
    | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Default role      | The default Snowflake role for this environment connection. Read [Overview of Access Control](https://docs.snowflake.com/en/user-guide/security-access-control-overview.html) to learn more. |
    | Default warehouse | The default Snowflake warehouse for this environment connection. Read [Overview of Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-overview.html) to learn more.             |
    | Default database  | The default Snowflake database for this environment connection. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.         |
    | Default schema    | The default Snowflake schema for this environment connection. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.           |
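
    These four defaults correspond to the session context that Snowflake's `USE ROLE`, `USE WAREHOUSE`, `USE DATABASE`, and `USE SCHEMA` commands set. To check that a combination works for your user, you could run the equivalent statements in a Snowflake worksheet. The following minimal sketch generates them; the values are placeholders, the helper names are hypothetical, and the double-quoting assumes you enter identifiers with their exact stored case (usually uppercase).

    ```python
    from __future__ import annotations


    def quote_ident(name: str) -> str:
        """Quote a Snowflake identifier, doubling any embedded double quotes."""
        return '"' + name.replace('"', '""') + '"'


    def use_statements(role: str, warehouse: str, database: str, schema: str) -> list[str]:
        """Build USE statements that mirror the environment's default context."""
        return [
            f"USE ROLE {quote_ident(role)};",
            f"USE WAREHOUSE {quote_ident(warehouse)};",
            f"USE DATABASE {quote_ident(database)};",
            f"USE SCHEMA {quote_ident(schema)};",
        ]


    # Placeholder values: substitute the names you intend to select in Maia.
    for statement in use_statements("ANALYST", "COMPUTE_WH", "ANALYTICS", "PUBLIC"):
        print(statement)
    ```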
  </Tab>

  <Tab title="Databricks">
    | Property          | Description                                                                                                              |
    | ----------------- | ------------------------------------------------------------------------------------------------------------------------ |
    | Compute           | The Databricks cluster or SQL warehouse that {maia} connects to.                                                        |
    | Catalog           | Choose a [Databricks Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html) to connect to. |
    | Schema (Database) | Choose a Databricks schema (database) to connect to.                                                                     |

    The **Default compute** drop-down lists all clusters and SQL warehouses, along with the status of each compute resource: Running, Stopped, Starting, or Error. Selecting a **Stopped** compute resource starts it and changes its displayed status to **Starting**. Starting a Databricks compute resource can take a few minutes, during which you can't continue the configuration. Once the compute resource has started, you can select a catalog.
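
    The selection behavior described above can be summarized as a small state model. The following is a hypothetical sketch of that logic only, assuming that only a **Running** compute resource lets you continue to catalog selection; the class and function names are illustrative and don't correspond to a {maia} or Databricks API.

    ```python
    from enum import Enum


    class ComputeStatus(Enum):
        RUNNING = "Running"
        STOPPED = "Stopped"
        STARTING = "Starting"
        ERROR = "Error"


    def select_compute(status: ComputeStatus) -> ComputeStatus:
        """Return the status displayed after a compute resource is selected.

        Selecting a Stopped resource triggers a start, so it shows as Starting;
        any other status is displayed unchanged.
        """
        if status is ComputeStatus.STOPPED:
            return ComputeStatus.STARTING
        return status


    def can_select_catalog(status: ComputeStatus) -> bool:
        """Assumption: a catalog can be chosen only once the resource is Running."""
        return status is ComputeStatus.RUNNING


    shown = select_compute(ComputeStatus.STOPPED)
    print(shown.value, can_select_catalog(shown))
    # prints Starting False
    ```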

    The **Default catalog** drop-down shows both the **Hive\_metastore** top-level catalog and any catalogs governed by **Unity Catalog**. Matillion recommends using Unity Catalog to access the full suite of features. Read [Work with Unity Catalog and the legacy Hive metastore](https://docs.databricks.com/en/data-governance/unity-catalog/hive-metastore.html) for details.

    <Note>
      Databricks sometimes uses the words **Schema** and **Database** interchangeably in its documentation. We always use the word **Schema** in component parameters.
    </Note>
  </Tab>

  <Tab title="Amazon Redshift">
    | Property         | Description                                                                                                                                                                         |
    | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Default database | The database you created when setting up your Amazon Redshift cluster. If you have multiple databases, choose the one you want to use for this environment.                             |
    | Default schema   | The schema to use for this environment. This is `public` by default, but if you have configured multiple schemas in your Amazon Redshift database, specify the one you want to use. |
    | S3 bucket        | Your default Amazon S3 bucket.                                                                                                                                                      |
  </Tab>

  <Tab title="Google BigQuery">
    | Parameter                      | Description                                                                                                                                                                                                                                                                                                                                 |
    | ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | GCP Project ID                 | The unique, user-defined string identifier for your Google Cloud project that contains your BigQuery datasets and tables. You can find a project's ID in the Google Cloud Console on the dashboard. For more information, read the [Google Cloud documentation](https://cloud.google.com/resource-manager/docs/creating-managing-projects). |
    | Dataset                        | The BigQuery dataset for this environment connection. For more information, read the [BigQuery documentation](https://cloud.google.com/bigquery/docs/datasets-intro).                                                                                                                                                                       |
    | Allow inherit project defaults | Use this toggle to control how the environment handles variable values. When enabled (the default), the environment inherits project-level default values. When disabled, you must provide a value for each variable manually so that pipelines run successfully across environments.                                               |
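
    The **Allow inherit project defaults** toggle can be thought of as a variable-resolution rule. The following is a hypothetical sketch of that rule under stated assumptions: variables are modeled as plain dictionaries, values set on the environment take precedence over project defaults when inheritance is enabled, and `resolve_variables` is an illustrative name, not a {maia} API.

    ```python
    from __future__ import annotations


    def resolve_variables(
        project_defaults: dict[str, str],
        environment_values: dict[str, str],
        inherit_project_defaults: bool = True,
    ) -> dict[str, str]:
        """Resolve variable values for one environment (illustrative model only)."""
        if inherit_project_defaults:
            # Assumption: values set on the environment override project defaults.
            return {**project_defaults, **environment_values}
        # Without inheritance, every variable needs an environment-specific value.
        missing = sorted(set(project_defaults) - set(environment_values))
        if missing:
            raise ValueError(f"no value provided for: {', '.join(missing)}")
        return dict(environment_values)


    defaults = {"target_dataset": "analytics", "region": "EU"}
    print(resolve_variables(defaults, {"region": "US"}))
    # prints {'target_dataset': 'analytics', 'region': 'US'}
    ```

    Modeling the disabled case as an error makes the requirement explicit: with inheritance off, a missing value surfaces before a pipeline runs rather than during it.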
  </Tab>
</Tabs>

Click **Finish** to create your project. You can now add a [branch](/docs/guides/branches) and begin building data [pipelines](/docs/guides/pipelines) in {designer}.
