Before you run any pipelines, you need a connection to a suitable cloud data platform account. This topic covers the basics of connecting to Databricks.

Full SaaS or Hybrid SaaS?

Pipelines can be run in either a Full SaaS or a Hybrid SaaS architecture.
  • Databricks on AWS and Databricks on Azure are both compatible with Full SaaS and Hybrid SaaS.

Compute types

Many of the Databricks compute types are supported. For more information, read Compute in the Databricks documentation.

Authentication to Databricks

Both Personal Access Token (PAT) authentication and OAuth for service principals (OAuth M2M) are supported when connecting to Databricks. To use PAT authentication in pipeline components, enter token as the username and the actual value of the token as the password. To use OAuth M2M authentication in pipeline components, select the OAuth Client Credentials option and enter the client ID and client secret. Note that if you are using a Hybrid SaaS project, your agent version must be 10.1021.1 or later.
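The mapping between the two authentication modes and the credential fields described above can be sketched as follows. This is an illustrative sketch only: the helper function and field names are hypothetical and do not represent a real product or Databricks API.

```python
# Illustrative sketch only. The helper and its field names are hypothetical;
# they mirror the two authentication modes described above, not a real API.

def databricks_credentials(auth_type, **kwargs):
    """Build a credential mapping for a Databricks connection.

    auth_type: "pat" for Personal Access Token authentication, or
    "oauth_m2m" for OAuth client credentials (service principal).
    """
    if auth_type == "pat":
        # For PAT authentication, the username is the literal string "token"
        # and the password is the token value itself.
        return {"username": "token", "password": kwargs["token"]}
    if auth_type == "oauth_m2m":
        # For OAuth M2M, supply the service principal's client ID and secret.
        return {
            "client_id": kwargs["client_id"],
            "client_secret": kwargs["client_secret"],
        }
    raise ValueError(f"unsupported auth type: {auth_type!r}")

# Hypothetical placeholder values, for illustration only.
pat = databricks_credentials("pat", token="dapi-example-token")
oauth = databricks_credentials(
    "oauth_m2m", client_id="my-sp-client-id", client_secret="my-sp-secret"
)
```

Whichever mode you choose, the same pair of values ends up in the pipeline component's credential fields; the only difference is which fields they populate.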

Catalog types

We recommend using Unity Catalog-enabled workspaces. Hive catalogs are supported, but many advanced features (such as Unity Catalog Volumes staging) and future features rely on Unity Catalog workspaces.

Feature support

Some features only work with specific Databricks runtimes and configurations:

| Feature | Minimum Databricks runtime | Notes |
| --- | --- | --- |
| Unity Catalog Volumes staging | 13.4+ | |
| Run Notebook | 10.4+ | If you are using serverless SQL or classic SQL compute, you can only run SQL notebooks using the Run Notebook component. |
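The runtime minimums above can be checked programmatically before enabling a feature. The sketch below is illustrative only: the feature keys and the version-parsing helper are our own, though the minimum versions come from the table.

```python
# Illustrative sketch only: compare a cluster's Databricks runtime version
# against the minimum required for a feature, per the table above.
# Feature keys and the parsing helper are hypothetical.

FEATURE_MINIMUM_RUNTIME = {
    "unity_catalog_volumes_staging": (13, 4),
    "run_notebook": (10, 4),
}

def parse_runtime(version):
    """Parse a runtime string like '13.3.x-scala2.12' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def feature_supported(feature, runtime_version):
    """Return True if the runtime meets the feature's minimum version."""
    return parse_runtime(runtime_version) >= FEATURE_MINIMUM_RUNTIME[feature]

print(feature_supported("run_notebook", "13.3.x-scala2.12"))                 # True
print(feature_supported("unity_catalog_volumes_staging", "13.3.x-scala2.12"))  # False
```

Tuple comparison handles the major/minor ordering correctly, so 13.3 falls short of the 13.4 minimum even though both share a major version.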

S3 buckets and Azure Blob storage

If you wish to load data from, or stage via, S3 buckets or Azure Blob storage, you must create AWS or Azure cloud credentials and associate them with your environment. Also make sure that the instance profile attached to your Databricks compute resources has access to the same AWS or Azure storage.
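For the AWS case, the instance profile's permissions boil down to an IAM policy over the staging bucket. The sketch below (a minimal policy expressed as a Python dict) is illustrative only; "my-staging-bucket" is a hypothetical placeholder, and your actual policy may need additional actions depending on how staging is used.

```python
# Illustrative sketch only: a minimal AWS IAM policy, expressed as a Python
# dict, of the kind an instance profile would need so Databricks compute can
# read, write, and list the same S3 bucket used for staging.
# "my-staging-bucket" is a hypothetical placeholder name.
import json

staging_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                # ListBucket applies to the bucket; Get/PutObject to objects.
                "arn:aws:s3:::my-staging-bucket",
                "arn:aws:s3:::my-staging-bucket/*",
            ],
        }
    ],
}

print(json.dumps(staging_policy, indent=2))
```

The key point is that the cloud credentials associated with your environment and the instance profile on the Databricks compute must both resolve to the same storage, or staging will fail partway through.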