This page describes how to configure the connector component as part of a data pipeline. The component uses the Connect and Configure parameters to create a table of data, which is then stored in your preferred storage location (Snowflake, Databricks, Amazon Redshift, or cloud storage). You do not need to use the Create Table component with this connector, because the component creates a new table or replaces an existing table for you, using the Destination parameters you define.
The connector is a Flex connector. Flex connectors let you connect to a curated set of endpoints to load data.
You can use the connector in its preconfigured state, or you can edit it by adding or amending the available endpoints to suit your use case. You can edit Flex connectors in the Custom Connector user interface.
For detailed information about authentication, specific endpoint parameters, pagination, and more aspects of the API, read the API documentation.
If your account uses IP restrictions and you’re running this connector on a Hybrid SaaS agent, you must add your agent’s outbound IP address or range to the IP allow list. For more information, read IP allow list.
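As a rough illustration of what an allow list check involves, the sketch below tests whether an outbound address falls inside any allowed entry. The function name and the addresses (taken from the reserved documentation ranges) are hypothetical; the real allow list is managed in your account settings, not in code.

```python
import ipaddress


def is_allowed(outbound_ip: str, allow_list: list[str]) -> bool:
    """Return True if outbound_ip falls inside any entry of allow_list.

    Entries may be single addresses ("203.0.113.7") or CIDR ranges
    ("198.51.100.0/24"). Both values here are illustrative only.
    """
    address = ipaddress.ip_address(outbound_ip)
    # strict=False lets an entry such as "198.51.100.5/24" stand for its network.
    networks = (ipaddress.ip_network(entry, strict=False) for entry in allow_list)
    return any(address in network for network in networks)


allow_list = ["203.0.113.7", "198.51.100.0/24"]
print(is_allowed("198.51.100.42", allow_list))
print(is_allowed("192.0.2.1", allow_list))
```

Running a check like this against the address your pipeline actually uses can help confirm that a range was entered as intended before you save it.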
Video example
Watch our video about using Flex connectors: YouTube.
Properties
Reference material is provided below for the Connect, Configure, Destination, and Advanced Settings properties.
A human-readable name for the component.
Connect
The data source to load data from in this pipeline. The drop-down menu lists the API endpoints available in the connector. For detailed information about specific endpoints, read the API documentation.
The authentication method to authorize access to your data. Currently supports OAuth 2.0 Client Credentials. Read Authenticating to the API to learn more.
- Parameter Name: The name of a URI parameter.
- Parameter Value: The value of the corresponding parameter.
| Required parameter | Endpoints | Description |
|---|---|---|
| server | All endpoints | Enter eu1 or us1, depending on the region of your account. To find your account’s region, click the Profile & Account icon on the left side of the page. |
| api_version | All endpoints | Enter v1. |
| projectId | List All Environments, List All Published Pipelines, Execute Published Pipeline, Pipeline Execution Details, Pipeline Steps Status, List All Schedules, Create Schedule, List Artifacts, Get Artifact, Promote Artifact, List All Secret References, Create Secret Reference, Create Test Execution, Test Execution Status | projectId is unique to every project. Retrieve this value by using the List All Projects endpoint. |
| pipelineExecutionId | Pipeline Execution Details, Pipeline Steps Status | pipelineExecutionId is unique to every pipeline execution. Retrieve this value by using the Execute Published Pipeline endpoint. |
| secretReferenceName | Create Secret Reference | The name of the secret reference. You can find this in the Secret definitions tab, under the Name column. |
| agentId | Get Agent Details, Trigger Agent Command, Get Agent Client Credentials, Perform Action On Agent Credentials | The ID of the agent to retrieve. To find this, navigate to Runners from the left navigation, select the intended agent, and click the Parameters tab. |
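To make the role of URI parameters concrete, here is a minimal sketch of how named values could be substituted into an endpoint path. The template, host, and `fill_path` helper are hypothetical and only illustrate the idea; the connector performs this substitution for you, and the real paths are defined in the API documentation.

```python
import string
from urllib.parse import quote


def fill_path(template: str, uri_params: dict[str, str]) -> str:
    """Substitute {name} placeholders in template with URI parameter values."""
    # Collect every placeholder name that appears in the template.
    needed = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = sorted(needed - set(uri_params))
    if missing:
        raise ValueError("missing URI parameters: " + ", ".join(missing))
    # Percent-encode each value so it is safe to place in a URL path segment.
    encoded = {name: quote(value, safe="") for name, value in uri_params.items()}
    return template.format(**encoded)


# Hypothetical template: the host and path are placeholders, not the real API.
TEMPLATE = "https://{server}.example.invalid/{api_version}/projects/{projectId}/environments"

url = fill_path(
    TEMPLATE,
    {"server": "eu1", "api_version": "v1", "projectId": "my-project-id"},
)
print(url)
```

Failing early on a missing parameter mirrors the table above: every placeholder an endpoint uses needs a value before the request can be made.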
- Parameter Name: The name of a query parameter.
- Parameter Value: The value of the corresponding parameter.
| Required parameter | Endpoints | Description |
|---|---|---|
| size | List All Projects, List All Environments, List All Published Pipelines, Get Pipeline Steps Status, List All Schedules, List Artifacts, List All Secret References, List All Agents, Query Audit Events, Get Lineage Events | Enter the number of records per page, ranging from 1 to 100. |
| environmentName | List All Published Pipelines, List Artifacts, Get Artifact, Create Test Execution | Enter the environment name. For example, test-environment-1. |
| consumedFrom | Get Flat-Rated Product Consumption | Enter the start date for the results. This value is inclusive, meaning results from this date onward are included. For example, 2024-11-01. |
| consumedBefore | Get Flat-Rated Product Consumption | Enter the end date for the results. This value is exclusive, meaning it includes only results occurring before (but not on) this date. For example, 2024-12-01. |
| consumedFrom | Get ETL Users Consumption | Enter the start date and time for the results. This value is inclusive, meaning results from this date and time onward are included. For example, 2024-07-01T00:00:00.123Z. |
| consumedBefore | Get ETL Users Consumption | Enter the end date and time for the results. This value is exclusive, meaning it includes only results occurring before (but not on) this date and time. For example, 2024-07-31T00:00:00.123Z. |
| versionName | Get Artifact | The version name you entered when you pushed local changes to the remote repository. For more information, read Git push local changes. |
| limit | Pipeline Executions | Enter the maximum number of results to return. The default value is set to 25. |
| from | Query Audit Events | Enter the earliest date and time of audit events to retrieve. The value must be in ISO 8601 format, for example: 2025-02-20T07:15:15.000-01:00. |
| to | Query Audit Events | Enter the latest date and time of audit events to retrieve. The value must be in ISO 8601 format, for example: 2025-02-21T07:15:15.000-01:00. |
| generatedFrom | Get Lineage Events | Include events generated on or after this date time. The value must be earlier than generatedBefore. |
| generatedBefore | Get Lineage Events | Include events generated up to, but not including, this date time. The value must be later than generatedFrom. |
| page | Get Lineage Events | The page number to use for pagination, starting at 0. Must be 0 or greater. |
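The sketch below shows one way to produce values that match the table above for the Query Audit Events endpoint: ISO 8601 timestamps with an explicit offset and a size between 1 and 100. The helper names are hypothetical, and the connector itself only needs the final strings entered as parameter values.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def iso8601_millis(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 with milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is explicit")
    return moment.isoformat(timespec="milliseconds")


def audit_events_query(start: datetime, end: datetime, size: int) -> str:
    """Build the query string for the from, to, and size parameters."""
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    return urlencode(
        {"from": iso8601_millis(start), "to": iso8601_millis(end), "size": size}
    )


offset = timezone(timedelta(hours=-1))
start = datetime(2025, 2, 20, 7, 15, 15, tzinfo=offset)
end = datetime(2025, 2, 21, 7, 15, 15, tzinfo=offset)

print(iso8601_millis(start))  # 2025-02-20T07:15:15.000-01:00
print(audit_events_query(start, end, size=50))
```

Note that `urlencode` percent-encodes the colons, which is what a URL expects; in the connector you enter the plain timestamp.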
The Get ETL Users Consumption endpoint provides information about the number of credits charged for users, and identifies which users contributed to those charges. These users are billed based on monthly active unique users, so ensure that the consumedFrom and consumedBefore parameters correspond to the timeframe of a single monthly invoice.
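Because consumedFrom is inclusive and consumedBefore is exclusive, a one-month window runs from the first instant of the month to the first instant of the next. The sketch below computes such a pair. It assumes that your invoice period lines up with a calendar month in UTC, which you should confirm against your own invoice; the `month_window` helper is hypothetical.

```python
from datetime import date


def month_window(year: int, month: int) -> dict[str, str]:
    """Return consumedFrom (inclusive) and consumedBefore (exclusive) for one month."""
    start = date(year, month, 1)
    # Roll over to January of the next year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return {
        # Assumption: the period starts and ends at midnight UTC.
        "consumedFrom": f"{start.isoformat()}T00:00:00.000Z",
        "consumedBefore": f"{end.isoformat()}T00:00:00.000Z",
    }


print(month_window(2024, 7))
```

Using the first day of the following month as the exclusive end avoids having to know how many days the month contains.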
- Parameter Name: The name of a header parameter.
- Parameter Value: The value of the corresponding parameter.
| Required parameter | Endpoints | Description |
|---|---|---|
| Content-Type | Execute Published Pipeline, Pipeline Execution Status, Pipeline Steps Status, Create Schedule, Promote Artifact, Create Secret Reference, List All Agents, Create Agent, Get Agent Details, Trigger Agent Command, Get Agent Client Credentials, Perform Action On Agent Credentials, Query Audit Events, Get Lineage Events | Enter application/json. |
| accept | Execute Published Pipeline, Pipeline Execution Details, Pipeline Steps Status | Enter application/json. |
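The header table can also be read as a lookup from endpoint to required headers. The sketch below encodes the two rows as sets, copied from the table above, and returns the headers an endpoint needs. The `required_headers` helper is hypothetical and is only meant as a checklist aid.

```python
JSON_TYPE = "application/json"

# Endpoint names copied from the Content-Type row of the table above.
CONTENT_TYPE_ENDPOINTS = {
    "Execute Published Pipeline", "Pipeline Execution Status",
    "Pipeline Steps Status", "Create Schedule", "Promote Artifact",
    "Create Secret Reference", "List All Agents", "Create Agent",
    "Get Agent Details", "Trigger Agent Command",
    "Get Agent Client Credentials", "Perform Action On Agent Credentials",
    "Query Audit Events", "Get Lineage Events",
}

# Endpoint names copied from the accept row of the table above.
ACCEPT_ENDPOINTS = {
    "Execute Published Pipeline", "Pipeline Execution Details",
    "Pipeline Steps Status",
}


def required_headers(endpoint: str) -> dict[str, str]:
    """Return the header parameters the table lists for an endpoint."""
    headers = {}
    if endpoint in CONTENT_TYPE_ENDPOINTS:
        headers["Content-Type"] = JSON_TYPE
    if endpoint in ACCEPT_ENDPOINTS:
        headers["accept"] = JSON_TYPE
    return headers


print(required_headers("Execute Published Pipeline"))
print(required_headers("Create Schedule"))
```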
A JSON body to include as part of a POST request. Use Custom Connector to test your endpoints work as expected before moving to pipelines.
You should also consult the developer documentation for the API you’re connecting to, as the developer portal may provide additional information about endpoints and requests.
For the Execute Published Pipeline endpoint, include the following POST Body. This example demonstrates a POST Body used to execute a pipeline:
{
"pipelineName": "test-pipeline",
"environmentName": "test-environment"
}
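If you want to try the same body from a script before configuring the connector, the sketch below builds (but does not send) an equivalent POST request. The URL and token are placeholders, and passing the token in an `Authorization: Bearer` header is an assumption based on common OAuth 2.0 usage; check the API documentation for the actual endpoint and authentication details.

```python
import json
import urllib.request


def build_execute_request(
    url: str, pipeline_name: str, environment_name: str, access_token: str
) -> urllib.request.Request:
    """Build a POST request carrying the Execute Published Pipeline body."""
    body = {"pipelineName": pipeline_name, "environmentName": environment_name}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "accept": "application/json",
            # Assumption: the API accepts a standard OAuth 2.0 bearer token.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Hypothetical URL and token, used only to show the shape of the request.
request = build_execute_request(
    "https://api.example.invalid/executions",
    "test-pipeline",
    "test-environment",
    "hypothetical-token",
)
print(request.get_method(), json.loads(request.data))
```

To send it, you would pass `request` to `urllib.request.urlopen`, which is left out here so the example has no network side effects.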
For the Create Schedule endpoint, include the following POST Body. This example demonstrates a POST Body used to create a schedule:
{
"pipeline": {
"pipelineName": "pipeline-name",
"environmentName": "environment-name"
},
"schedule": {
"cronExpression": "0 * * ? * * *",
"cronTimezone": "Europe/Dublin",
"effectiveFrom": "2022-05-19T12:37:44Z",
"name": "schedule-name",
"scheduleEnabled": false
}
}
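The schedule body has more moving parts, so generating it can reduce typing mistakes. This sketch assembles the same structure and formats effectiveFrom as a UTC timestamp in the style of the example above. The `schedule_body` helper and its argument names are hypothetical; the field names come from the example body.

```python
import json
from datetime import datetime, timezone


def schedule_body(
    pipeline_name: str,
    environment_name: str,
    schedule_name: str,
    cron_expression: str,
    cron_timezone: str,
    effective_from: datetime,
    enabled: bool = False,
) -> dict:
    """Assemble a Create Schedule POST body from individual values."""
    if effective_from.tzinfo is None:
        raise ValueError("effective_from must be timezone-aware")
    # Convert to UTC and use the trailing "Z" form shown in the example.
    effective_utc = effective_from.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "pipeline": {
            "pipelineName": pipeline_name,
            "environmentName": environment_name,
        },
        "schedule": {
            "cronExpression": cron_expression,
            "cronTimezone": cron_timezone,
            "effectiveFrom": effective_utc,
            "name": schedule_name,
            "scheduleEnabled": enabled,
        },
    }


body = schedule_body(
    "pipeline-name",
    "environment-name",
    "schedule-name",
    "0 * * ? * * *",
    "Europe/Dublin",
    datetime(2022, 5, 19, 12, 37, 44, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```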
For the Promote Artifact endpoint, include the following POST Body. This example demonstrates a POST Body used to promote an artifact:
{
"sourceEnvironmentName": "source-environment-name",
"targetEnvironmentName": "environment-name",
"versionName": "version-name"
}
For the Create Secret Reference endpoint, include the following POST Body. This example demonstrates a POST Body used to create a secret for an AWS agent:
{
"agentId": "agent-id",
"agentType": "AWS",
"description": "My secret reference",
"secretReferenceType": "PASSWORD",
"secretKey": "aws-secret-key",
"secretName": "aws-secret-name"
}
For the Create Agent endpoint, include the following POST Body. This example demonstrates a POST Body used to create a new AWS agent:
{
"agentType": "data_productivity_cloud",
"cloudProvider": "aws",
"deployment": "fargate",
"description": "An AWS Agent",
"enableAutoUpdates": true,
"name": "AWS Agent",
"restrictedAccess": true,
"trackName": "current"
}
For the Trigger Agent Command endpoint, include the following POST Body. This example demonstrates a POST Body used to trigger the RESTART command:
Other available commands are:
For the Perform Action On Agent Credentials endpoint, include the following POST Body. This example demonstrates a POST Body used to refresh credentials:
{
"action": "SECRET_REFRESH"
}
For other agents, the POST body will vary. For example, in Azure, you must specify a value for vaultName.
A numeric value to limit the maximum number of records per page.
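To illustrate how a per-page limit shapes a paged read, the sketch below walks pages until one comes back short. It assumes zero-based page numbers and that a page with fewer records than the limit is the last one; both are assumptions to check against the endpoint you use. `fetch_page` is a hypothetical callable standing in for the actual request.

```python
from typing import Callable, Iterator


def iter_records(
    fetch_page: Callable[[int, int], list[dict]], size: int
) -> Iterator[dict]:
    """Yield records page by page, requesting at most size records per page."""
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    page = 0
    while True:
        records = fetch_page(page, size)
        yield from records
        # Assumption: a short (or empty) page means there is nothing left.
        if len(records) < size:
            return
        page += 1


# Stand-in data source so the example runs without a network call.
data = [{"id": n} for n in range(5)]


def fake_fetch(page: int, size: int) -> list[dict]:
    return data[page * size:(page + 1) * size]


print(list(iter_records(fake_fetch, size=2)))
```

A smaller page limit means more requests for the same data, while a larger one means fewer, bigger responses.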
Destination
Select your cloud data warehouse.
- Snowflake
- Databricks
- Amazon Redshift
- Google BigQuery
- Snowflake: Load your data into Snowflake. You’ll need to set a cloud storage location for temporary staging of the data.
- Cloud Storage: Load your data directly into your preferred cloud storage location.
Click either the Snowflake or Cloud Storage tab on this page for documentation applicable to that destination type.
The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.
The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
The name of the table to be created.
- Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
- Truncate and Insert: Each time the pipeline runs, two operations are performed: first, the table is truncated, meaning all existing rows are deleted. Then, your new rows are inserted. The table itself is never destroyed and recreated.
- Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
- Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.
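The four load strategies differ mainly in what happens to an existing table. The following sketch models them against an in-memory dictionary of tables to make the differences visible. It is only an illustration of the behavior described above, not how the component is implemented, and the behavior of Truncate and Insert when the table does not yet exist is an assumption.

```python
def apply_load_strategy(
    tables: dict[str, list[dict]], name: str, rows: list[dict], strategy: str
) -> None:
    """Apply a load strategy to an in-memory stand-in for a warehouse."""
    if strategy == "Replace":
        # The old table object is discarded and a new one takes its place.
        tables[name] = list(rows)
    elif strategy == "Truncate and Insert":
        # Assumption: a missing table is created; an existing one is kept.
        target = tables.setdefault(name, [])
        target.clear()
        target.extend(rows)
    elif strategy == "Fail if Exists":
        if name in tables:
            raise RuntimeError(f"table {name!r} already exists")
        tables[name] = list(rows)
    elif strategy == "Append":
        # Existing rows are untouched; the table is created if missing.
        tables.setdefault(name, []).extend(rows)
    else:
        raise ValueError(f"unknown load strategy: {strategy}")


warehouse: dict[str, list[dict]] = {}
apply_load_strategy(warehouse, "projects", [{"id": 1}], "Append")
apply_load_strategy(warehouse, "projects", [{"id": 2}], "Append")
print(warehouse["projects"])
apply_load_strategy(warehouse, "projects", [{"id": 3}], "Truncate and Insert")
print(warehouse["projects"])
```

In this model, Replace swaps in a new list object while Truncate and Insert reuses the existing one, which mirrors the distinction between destroying a table and emptying it.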
- Yes: Staged files will be destroyed after data is loaded. This is the default setting.
- No: Staged files are retained in the staging area after data is loaded.
Select the stage access strategy. The strategies available depend on the cloud platform you select in Stage Platform.
- Credentials: Connects to the external stage (AWS, Azure) using your configured cloud provider credentials. Not available for Google Cloud Storage.
- Storage Integration: Uses a Snowflake storage integration to grant Snowflake access to read data from and write to a cloud storage location. Selecting this option reveals the Storage Integration property, where you can select any of your existing Snowflake storage integrations.
Choose a data staging platform using the drop-down menu.
- Amazon S3: Stage your data on an AWS S3 bucket.
- Snowflake: Stage your data on a Snowflake internal stage.
- Azure Storage: Stage your data in an Azure Blob Storage container.
- Google Cloud Storage: Stage your data in a Google Cloud Storage bucket.
Click one of the tabs below for documentation applicable to that staging platform: Amazon S3, Snowflake, Azure Storage, or Google Cloud Storage.
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.
Select the Snowflake internal stage type. Use the Snowflake links provided to learn more about each type of stage.
- User: Each Snowflake user has a user stage allocated to them by default for file storage. You may find the user stage convenient if your files will only be accessed by a single user, but need to be copied into multiple tables.
- Named: A named stage provides high flexibility for data loading. Users with the appropriate privileges on the stage can load data into any table. Furthermore, because the stage is a database object, any security or access rules that apply to all objects will apply to the named stage.
Named stages can be altered and dropped. User stages cannot.
Select your named stage. Read Creating a named stage to learn how to create a new named stage.
There is a known issue where named stages that include special characters or spaces are not supported.
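Given the known issue with special characters and spaces, it can help to check a stage name before selecting it. The sketch below treats a name as safe only if it follows Snowflake's rules for unquoted identifiers (a letter or underscore first, then letters, digits, underscores, or dollar signs). Treating everything else as a "special character" is an assumption, since the exact set of affected characters isn't listed here.

```python
import re

# Snowflake unquoted identifier: letter or underscore, then letters,
# digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_supported_stage_name(name: str) -> bool:
    """Return True if name avoids spaces and (assumed) special characters."""
    return _UNQUOTED_IDENTIFIER.fullmatch(name) is not None


for candidate in ("MY_STAGE", "my stage", "stage-01"):
    print(candidate, is_supported_stage_name(candidate))
```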
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.
Select a storage account linked to your desired blob container to be used for staging the data. For more information, read Storage account overview.
Select the stage access strategy. The strategies available depend on the cloud platform you select in Stage Platform.
- Storage Integration: Uses a Snowflake storage integration to grant Snowflake access to read data from and write to a cloud storage location. Selecting this option reveals the Storage Integration property, where you can select any of your existing Snowflake storage integrations.
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.
Select whether to overwrite files of the same name when this pipeline runs. Default is Yes.
- Append Files in Folder: Appends files to storage folder. This is the default setting.
- Overwrite Files in Folder: Overwrite existing files with matching structure.
See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:
| Configuration | Description |
|---|---|
| Append files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-timestamp-partX where X is the part number, starting from 1. |
| Append files in folder without defined folder path and file prefix. | Files will be stored under the structure uniqueID/timestamp-partX where X is the part number, starting from 1. |
| Overwrite files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-partX where X is the part number, starting from 1. All files with matching structures will be overwritten. |
| Overwrite files in folder without defined folder path and file prefix. | Validation will fail. Folder path and file prefix must be supplied for this load strategy. |
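The overwrite rows of this table can be expressed as a small function: both values are required, and parts are numbered from 1. The sketch below is a hypothetical illustration of that naming rule only; trimming slashes from the folder path is an assumption, and the component generates the real file names itself.

```python
def overwrite_file_names(folder_path: str, file_prefix: str, part_count: int) -> list[str]:
    """Return the file names an overwrite load would target, per the table above."""
    if not folder_path or not file_prefix:
        # Mirrors the validation failure described in the last table row.
        raise ValueError(
            "Overwrite Files in Folder requires both a folder path and a file prefix"
        )
    # Assumption: leading and trailing slashes on the folder path are ignored.
    folder = folder_path.strip("/")
    return [f"{folder}/{file_prefix}-part{n}" for n in range(1, part_count + 1)]


print(overwrite_file_names("exports", "orders", 3))
```

Because the names contain no timestamp, a later run with the same folder path and prefix produces the same names, which is why matching files are overwritten.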
The folder path of the written files.
A string of characters to include at the beginning of the written files. Often used for organizing database objects.
A cloud storage location to load your data into for storage. Choose either Amazon S3, Azure Storage, or Google Cloud Storage.
Click the tab that corresponds to your chosen cloud storage service.
Select a storage account linked to your desired blob container to be used for staging the data. For more information, read Storage account overview.
Select whether to overwrite files of the same name when this pipeline runs. Default is Yes.
- Databricks: Load your data into Databricks. You’ll need to set a cloud storage location for temporary staging of the data.
- Cloud Storage: Load your data directly into your preferred cloud storage location.
Click either the Databricks or Cloud Storage tab on this page for documentation applicable to that destination type.
Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.
The Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.
The name of the table to be created.
- Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
- Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
- Truncate and Insert: Each time the pipeline runs, two operations are performed: first, the table is truncated, meaning all existing rows are deleted. Then, your new rows are inserted. The table itself is never destroyed and recreated.
- Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.
- Yes: Staged files will be destroyed after data is loaded. This is the default setting.
- No: Staged files are retained in the staging area after data is loaded.
Choose a data staging platform using the drop-down menu.
- Amazon S3: Stage your data on an AWS S3 bucket.
- Azure Storage: Stage your data in an Azure Blob Storage container.
Click one of the tabs below for documentation applicable to that staging platform.
Select a storage account linked to your desired blob container to be used for staging the data. For more information, read Storage account overview.
- Append Files in Folder: Appends files to storage folder. This is the default setting.
- Overwrite Files in Folder: Overwrite existing files with matching structure.
See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:
| Configuration | Description |
|---|---|
| Append files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-timestamp-partX where X is the part number, starting from 1. |
| Append files in folder without defined folder path and file prefix. | Files will be stored under the structure uniqueID/timestamp-partX where X is the part number, starting from 1. |
| Overwrite files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-partX where X is the part number, starting from 1. All files with matching structures will be overwritten. |
| Overwrite files in folder without defined folder path and file prefix. | Validation will fail. Folder path and file prefix must be supplied for this load strategy. |
The folder path of the written files.
A string of characters to include at the beginning of the written files. Often used for organizing database objects.
A cloud storage location to load your data into for storage. Choose either Amazon S3, Azure Storage, or Google Cloud Storage.
Click the tab that corresponds to your chosen cloud storage service.
Select a storage account linked to your desired blob container to be used for staging the data. For more information, read Storage account overview.
Select whether to overwrite files of the same name when this pipeline runs. Default is Yes.
- Redshift: Load your data into Amazon Redshift. You’ll need to set a cloud storage location for temporary staging of the data.
- Cloud Storage: Load your data directly into your preferred cloud storage location.
Click either the Amazon Redshift or Cloud Storage tab on this page for documentation applicable to that destination type.
The Amazon Redshift schema. The special value [Environment Default] uses the schema defined in the environment. Read Schemas to learn more. For more information on using multiple schemas, read Schemas.
The name of the table to be created.
- Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
- Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
- Truncate and Insert: Each time the pipeline runs, two operations are performed: first, the table is truncated, meaning all existing rows are deleted. Then, your new rows are inserted. The table itself is never destroyed and recreated.
- Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.
- Yes: Staged files will be destroyed after data is loaded. This is the default setting.
- No: Staged files are retained in the staging area after data is loaded.
- Append Files in Folder: Appends files to storage folder. This is the default setting.
- Overwrite Files in Folder: Overwrite existing files with matching structure.
See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:
| Configuration | Description |
|---|---|
| Append files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-timestamp-partX where X is the part number, starting from 1. |
| Append files in folder without defined folder path and file prefix. | Files will be stored under the structure uniqueID/timestamp-partX where X is the part number, starting from 1. |
| Overwrite files in folder with defined folder path and file prefix. | Files will be stored under the structure folder/prefix-partX where X is the part number, starting from 1. All files with matching structures will be overwritten. |
| Overwrite files in folder without defined folder path and file prefix. | Validation will fail. Folder path and file prefix must be supplied for this load strategy. |
The folder path of the written files.
A string of characters to include at the beginning of the written files. Often used for organizing database objects.
A cloud storage location to load your data into for storage. Choose either Amazon S3, Azure Storage, or Google Cloud Storage.
Click the tab that corresponds to your chosen cloud storage service.
Select a storage account linked to your desired blob container to be used for staging the data. For more information, read Storage account overview.
Select whether to overwrite files of the same name when this pipeline runs. Default is Yes.
Select the destination for your data:
- Google BigQuery: Load your data into a table in Google BigQuery.
- Cloud Storage: Load your data directly into files in your preferred cloud storage location. The format of these files can differ between source systems and will not have a file extension. Check the output to determine the format of the data.
Select the Google BigQuery project to load data into. The special value [Environment Default] uses the project defined in the environment.
Select the Google BigQuery dataset to load data into. The special value [Environment Default] uses the dataset defined in the environment.
The name of the table to be created in your Google BigQuery project. You can use a Table Input component in a transformation pipeline to access and transform this data after it has been loaded.
- Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
- Truncate and Insert: Each time the pipeline runs, two operations are performed: first, the table is truncated, meaning all existing rows are deleted. Then, your new rows are inserted. The table itself is never destroyed and recreated.
- Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
- Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.
In the tree structure representing the source data, select the checkboxes next to the fields that you want to output from this connector, then click Save.
Unselected fields will not be included in the output from this connector.
For each field you select, you can click the Edit Alias icon to change the name used for this field in all downstream components in this pipeline.
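Conceptually, selecting fields and assigning aliases is a projection over each source record. The sketch below shows that idea on a nested dictionary, using dotted paths to address nested fields. The `select_fields` helper, the dotted-path notation, and the sample record are all hypothetical; in the component you make these choices in the user interface.

```python
def select_fields(record: dict, selected: dict[str, str]) -> dict:
    """Keep only the selected fields of record, renamed to their aliases.

    selected maps a dotted source path (for example "owner.name") to the
    alias used in the output.
    """
    output = {}
    for path, alias in selected.items():
        value = record
        # Walk down the nested structure one key at a time.
        for key in path.split("."):
            value = value[key]
        output[alias] = value
    return output


record = {"id": 7, "owner": {"name": "Ada", "email": "ada@example.invalid"}}
print(select_fields(record, {"id": "project_id", "owner.name": "owner_name"}))
```

Fields that are not listed, such as `owner.email` here, are simply absent from the result, which matches how unselected fields are dropped.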
Select one or more columns to be designated as the table’s primary key.
- Yes: Staged files will be destroyed after data is loaded. This is the default setting.
- No: Staged files are retained in the staging area after data is loaded.
Select the time period to use to partition the data loaded into your table, for example Day or Month.
Select the Google Cloud Storage (GCS) bucket linked to your Google Cloud Platform account.
Select whether to overwrite files of the same name when this pipeline runs. Default is Yes.
- Append Files in Folder: Appends files to storage folder. This is the default setting.
- Overwrite Files in Folder: Overwrite existing files with matching structure.
The folder path for the files to be written to. Note that this path follows, but does not include, the bucket or container name.
A string of characters that precedes the name of the written files. This can be useful for organizing database objects.
A cloud storage location to load your data into files for storage. Choose either Amazon S3, Azure Storage, or Google Cloud Storage.
Advanced Settings
Set the severity level of logging. Choose from Error, Warn, Info, Debug, or Trace. Logs can be found in the Message field of the task details after the pipeline has been run.
Choose whether to return the entire payload or only selected data objects. Read Structure to learn how to select which data objects to include in your API response.
- No: Will return the entire payload. This is the default setting.
- Yes: Will return only the objects in Custom Connector that are marked as Selected Data in the Structure setting.
Deactivate soft delete for Azure blobs (Databricks)
If you intend to set your destination as Databricks and your stage platform as Azure Storage, you must turn off the “Enable soft delete for blobs” setting in your Azure account for your pipeline to run successfully. To do this:
- In the Azure portal, navigate to your storage account.
- In the menu, under Data management, click Data protection.
- Clear the Enable soft delete for blobs checkbox. For more information, read Soft delete for blobs.