Box - Maia Documentation

This page describes how to configure the Box connector component as part of a data pipeline. The Box component uses the Connect and Configure parameters to create a table of Box data, which is then stored in your preferred storage location (Snowflake, Databricks, Amazon Redshift, or cloud storage). You do not need to use the Create Table component with this connector, because the Box component creates a new table, or replaces an existing one, using the Destination parameters you define.

The Box connector is a Flex connector. Flex connectors let you connect to a curated set of endpoints to load data. You can use the Box connector in its preconfigured state, or you can edit it in the Custom Connector user interface by adding or amending Box endpoints to suit your use case.

For detailed information about authentication, specific endpoint parameters, pagination, and other aspects of the Box API, read the Box API documentation.

Properties

Reference material is provided below for the Connect, Configure, Destination, and Advanced Settings properties.

Name

string

required

A human-readable name for the component.

Connect

Data Source

drop-down

required

The data source to load data from in this pipeline. The drop-down menu lists the Box API endpoints available in the connector. For detailed information about specific endpoints, read the Box API documentation.

Endpoint	Method	Reference
Get Collaboration	GET	Retrieve a single collaboration
List File Collaborations	GET	Retrieve a list of pending and active collaborations for a file. This returns all the users that have access to the file or have been invited to the file
List Folder Collaborations	GET	Retrieve a list of pending and active collaborations for a folder. This returns all the users that have access to the folder or have been invited to the folder
Get Comment	GET	Retrieve the message and metadata for a specific comment, as well as information on the user who created the comment
List File Comments	GET	Retrieve a list of comments for a file
List Groups For Enterprise	GET	Retrieve all of the groups for a given enterprise. The user must have admin permissions to inspect the enterprise’s groups
List Items In Folder	GET	Retrieve a page of items in a folder. These items can be files, folders, and web links
Get Group Membership	GET	Retrieve a specific group membership. Only admins of this group or users with admin-level permissions will be able to use this API
List User’s Groups	GET	Retrieve all the groups for a user. Only members of this group or users with admin-level permissions will be able to use this API
Find Folder For Shared Link	GET	Return the folder represented by a shared link. A shared folder can be represented by a shared link, which can originate within the current enterprise or within another
Find Password Protected Folder For Shared Link	GET	Return the folder represented by a shared link
List Trashed Items	GET	Retrieve the files and folders that have been moved to the trash
List User and Enterprise Events	GET	Return up to a year of past events for a given user or for the entire enterprise
List Enterprise Users	GET	Return a list of all users for the Enterprise along with their `user_id`, `public_name`, and `login`

The Find Folder For Shared Link and Find Password Protected Folder For Shared Link endpoints need different boxapi header parameter values based on whether the folder is password-protected.

Authentication Type

drop-down

required

The authentication method to authorize access to your Box data. Currently supports OAuth 2.0 Client Credentials.

Authentication

string

required

Select your authentication profile. To create a new profile, read OAuth client credentials. As part of the authentication process, you’ll need to enter the values requested in the fields provided.

Configure

URI Parameters

column editor

required

Parameter Name: The name of a URI parameter.
Parameter Value: The value of the corresponding parameter.

Required parameter	Endpoints	Description
collaboration_id	Get Collaboration	The ID of the collaboration.
file_id	List File Collaborations, List Folder Collaborations, List File Comments	The unique identifier that represents a file. The ID for any file can be determined by visiting a file in the web application and copying the ID from the URL. For example, for the URL `https://*.app.box.com/files/123` the `file_id` is `123`.
comment_id	Get Comment	The ID of the comment.
folder_id	List Items In Folder	The unique identifier that represents a folder. The ID for any folder can be determined by visiting this folder in the web application and copying the ID from the URL. For example, for the URL `https://*.app.box.com/folder/123` the `folder_id` is `123`.
group_membership_id	Get Group Membership	The ID of the group membership.
user_id	List User’s Groups	The ID of the user.
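
The table above says that a `file_id` or `folder_id` can be copied from the URL shown in the Box web application. As a minimal sketch of that step, the hypothetical helper below (`box_item_id` is not part of the connector) takes the last path segment of such a URL and checks that it is numeric. The `example` subdomain is a placeholder.

```python
from urllib.parse import urlparse


def box_item_id(url: str) -> str:
    """Return the trailing numeric ID from a Box web-app URL (hypothetical helper)."""
    path = urlparse(url).path
    # The ID is the last non-empty path segment, e.g. "/folder/123" -> "123".
    segments = [segment for segment in path.split("/") if segment]
    if not segments or not segments[-1].isdigit():
        raise ValueError(f"No numeric ID found in URL: {url!r}")
    return segments[-1]


if __name__ == "__main__":
    # Placeholder URLs that follow the patterns given in the table above.
    print(box_item_id("https://example.app.box.com/files/123"))
    print(box_item_id("https://example.app.box.com/folder/456"))
```

The returned string is what you would enter as the Parameter Value for `file_id` or `folder_id`.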

Query Parameters

column editor

required

Parameter Name: The name of a query parameter.
Parameter Value: The value of the corresponding parameter.

Required parameter	Endpoints	Description
usemarker	List File Collaborations	Enter `true`.
fields	List Items In Folder, List Trashed Items, List Enterprise Users	Enter `id,name,etag,sequence_id,description,size,created_at,modified_at,trashed_at,purged_at,content_created_at,content_modified_at,item_status,folder_upload_email,created_by,shared_link,modified_by,owned_by,parent`.
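
To illustrate how Parameter Name and Parameter Value pairs typically end up in a request, the sketch below encodes them as a URL query string with Python's standard library. It is an illustration only and does not describe how the connector builds requests internally. The `build_query` helper and the shortened field list are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_query(params: dict) -> str:
    """Encode name/value pairs as a query string (hypothetical helper)."""
    # safe="," keeps a comma-separated field list readable instead of "%2C".
    return urlencode(params, safe=",")


if __name__ == "__main__":
    # usemarker is the required query parameter for List File Collaborations.
    print(build_query({"usemarker": "true"}))
    # A shortened field list for readability; the table above gives the full value.
    print(build_query({"fields": "id,name,etag,size"}))
```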

Header Parameters

column editor

required

Parameter Name: The name of a header parameter.
Parameter Value: The value of the corresponding parameter.

Required parameter	Endpoints	Description
Content-Type	All endpoints	Enter `application/json`.
boxapi	Find Folder For Shared Link, Find Password Protected Folder For Shared Link	For `Find Folder For Shared Link` enter `shared_link=link`. For `Find Password Protected Folder For Shared Link` enter `shared_link=link&shared_link_password=password`.
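
The two `boxapi` formats in the table differ only in the optional password part. A minimal sketch of composing the value is shown below. The `boxapi_header` helper and the example link are hypothetical and only mirror the formats listed above.

```python
from typing import Optional


def boxapi_header(link: str, password: Optional[str] = None) -> str:
    """Build the boxapi header value for a shared link (hypothetical helper)."""
    value = f"shared_link={link}"
    if password is not None:
        # Only the password-protected endpoint needs this second part.
        value += f"&shared_link_password={password}"
    return value


if __name__ == "__main__":
    link = "https://app.box.com/s/example"  # placeholder shared link
    print(boxapi_header(link))              # Find Folder For Shared Link
    print(boxapi_header(link, "secret"))    # Find Password Protected Folder For Shared Link
```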

Post Body

JSON

A JSON body to include as part of a POST request. Use Custom Connector to test your endpoints work as expected before moving to Designer pipelines.You should also consult the developer documentation for the API you’re connecting to—as the developer portal may provide additional information about endpoints and requests.

Page Limit

integer

A numeric value to limit the maximum number of records per page.

Destination

Select your cloud data warehouse.

Snowflake
Databricks
Amazon Redshift

Destination

drop-down

required

Snowflake: Load your data into Snowflake. You’ll need to set a cloud storage location for temporary staging of the data.
Cloud Storage: Load your data directly into your preferred cloud storage location.

Click either the Snowflake or Cloud Storage tab on this page for documentation applicable to that destination type.

Snowflake
Cloud Storage

Warehouse

drop-down

required

The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.

Database

drop-down

required

The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema

drop-down

required

The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Table Name

string

required

The name of the table to be created.

Load Strategy

drop-down

required

Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
Truncate and Insert: If the specified table name already exists, all rows within the table will be removed and new rows will be inserted per the next run of this pipeline.
Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.
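
The four strategies above differ mainly in what happens when the table already exists. The sketch below models that behavior on an in-memory list of rows. It is an illustrative model, not the connector's implementation, and `apply_load_strategy` is a hypothetical name. Where the descriptions above do not say what happens when the table is missing, the sketch assumes the table is created.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def apply_load_strategy(
    existing: Optional[List[Row]], new_rows: List[Row], strategy: str
) -> List[Row]:
    """Return the table contents after a run; `existing` is None if the table is absent."""
    if strategy == "Fail if Exists":
        if existing is not None:
            raise RuntimeError("Table already exists, so the pipeline fails")
        return list(new_rows)
    if strategy in ("Replace", "Truncate and Insert"):
        # Replace drops and recreates the table; Truncate and Insert keeps the
        # table and removes its rows. In both cases only the new rows remain.
        # Assumption: a missing table is simply created.
        return list(new_rows)
    if strategy == "Append":
        # Existing rows are kept and new rows are added after them.
        return list(existing or []) + list(new_rows)
    raise ValueError(f"Unknown load strategy: {strategy!r}")


if __name__ == "__main__":
    old = [{"id": 1}]
    new = [{"id": 2}]
    print(apply_load_strategy(old, new, "Append"))
    print(apply_load_strategy(old, new, "Replace"))
```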

Clean Staged Files

boolean

required

Yes: Staged files will be destroyed after data is loaded. This is the default setting.
No: Staged files are retained in the staging area after data is loaded.

Stage Access Strategy

drop-down

Select the stage access strategy. The strategies available depend on the cloud platform you select in Stage Platform.

Credentials: Connects to the external stage (AWS, Azure) using your configured cloud provider credentials. Not available for Google Cloud Storage.
Storage Integration: Use a Snowflake storage integration to grant Snowflake access to read data from and write to a cloud storage location. Selecting this option reveals the Storage Integration property, through which you can select any of your existing Snowflake storage integrations.

Stage Platform

drop-down

required

Choose a data staging platform using the drop-down menu.

Amazon S3: Stage your data on an AWS S3 bucket.
Snowflake: Stage your data on a Snowflake internal stage.
Azure Storage: Stage your data in an Azure Blob Storage container.
Google Cloud Storage: Stage your data in a Google Cloud Storage bucket.

Click one of the tabs below for documentation applicable to that staging platform.

Amazon S3
Snowflake
Azure Storage
Google Cloud Storage

Storage Integration

drop-down

required

Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to stage data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Load Strategy

drop-down

Append Files in Folder: Appends files to storage folder. This is the default setting.
Overwrite Files in Folder: Overwrite existing files with matching structure.

See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:

Configuration	Description
Append files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-timestamp-partX` where X is the part number, starting from 1.
Append files in folder without defined folder path and file prefix.	Files will be stored under the structure `uniqueID/timestamp-partX` where X is the part number, starting from 1.
Overwrite files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-partX` where X is the part number, starting from 1. All files with matching structures will be overwritten.
Overwrite files in folder without defined folder path and file prefix.	Validation will fail. Folder path and file prefix must be supplied for this load strategy.
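
The naming rules in the configuration table can be sketched as a small function. This is an illustration under stated assumptions, not the connector's implementation: it assumes that a supplied folder path and file prefix are what appear as `folder/prefix` in the file name, and that a generated unique ID is used only when they are absent. The `staged_file_key` name, the timestamp format, and the unique ID value are all hypothetical.

```python
from typing import Optional


def staged_file_key(
    strategy: str,
    part: int,
    timestamp: str,
    unique_id: str,
    folder: Optional[str] = None,
    prefix: Optional[str] = None,
) -> str:
    """Return the file name for one part under the given load strategy."""
    has_location = bool(folder) and bool(prefix)
    if strategy == "Append Files in Folder":
        if has_location:
            return f"{folder}/{prefix}-{timestamp}-part{part}"
        # Assumption: without a folder path and prefix, a unique ID is used.
        return f"{unique_id}/{timestamp}-part{part}"
    if strategy == "Overwrite Files in Folder":
        if not has_location:
            # Mirrors the "Validation will fail" row of the table.
            raise ValueError(
                "Folder path and file prefix must be supplied for this load strategy"
            )
        # No timestamp, so a later run writes to the same name and overwrites it.
        return f"{folder}/{prefix}-part{part}"
    raise ValueError(f"Unknown load strategy: {strategy!r}")


if __name__ == "__main__":
    print(staged_file_key("Append Files in Folder", 1, "20240101120000", "abc123",
                          folder="box", prefix="items"))
    print(staged_file_key("Overwrite Files in Folder", 1, "20240101120000", "abc123",
                          folder="box", prefix="items"))
```

Because the overwrite form omits the timestamp, repeated runs produce identical names, which is why matching files are replaced rather than accumulated.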

Folder Path

string

The folder path of the written files.

File Prefix

string

A string of characters to include at the beginning of the written files. Often used for organizing database objects.

Storage

drop-down

required

A cloud storage location to load your data into for storage. Choose Amazon S3, Azure Storage, or Google Cloud Storage. Click the tab that corresponds to your chosen cloud storage service.

Amazon S3
Azure Storage
Google Cloud Storage

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to load data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Destination

drop-down

required

Databricks: Load your data into Databricks. You’ll need to set a cloud storage location for temporary staging of the data.
Cloud Storage: Load your data directly into your preferred cloud storage location.

Click either the Databricks or Cloud Storage tab on this page for documentation applicable to that destination type.

Databricks
Cloud Storage

Catalog

drop-down

required

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which schemas are available in the next parameter.

Schema

drop-down

required

The Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table Name

string

required

The name of the table to be created.

Load Strategy

drop-down

required

Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
Truncate and Insert: If the specified table name already exists, all rows within the table will be removed and new rows will be inserted per the next run of this pipeline.
Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.

Clean Staged Files

boolean

required

Yes: Staged files will be destroyed after data is loaded. This is the default setting.
No: Staged files are retained in the staging area after data is loaded.

Stage Platform

drop-down

required

Choose a data staging platform using the drop-down menu.

Amazon S3: Stage your data on an AWS S3 bucket.
Azure Storage: Stage your data in an Azure Blob Storage container.

Click one of the tabs below for documentation applicable to that staging platform.

Amazon S3
Azure Storage

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to stage data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Load Strategy

drop-down

Append Files in Folder: Appends files to storage folder. This is the default setting.
Overwrite Files in Folder: Overwrite existing files with matching structure.

See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:

Configuration	Description
Append files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-timestamp-partX` where X is the part number, starting from 1.
Append files in folder without defined folder path and file prefix.	Files will be stored under the structure `uniqueID/timestamp-partX` where X is the part number, starting from 1.
Overwrite files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-partX` where X is the part number, starting from 1. All files with matching structures will be overwritten.
Overwrite files in folder without defined folder path and file prefix.	Validation will fail. Folder path and file prefix must be supplied for this load strategy.

Folder Path

string

The folder path of the written files.

File Prefix

string

A string of characters to include at the beginning of the written files. Often used for organizing database objects.

Storage

drop-down

required

A cloud storage location to load your data into for storage. Choose Amazon S3, Azure Storage, or Google Cloud Storage. Click the tab that corresponds to your chosen cloud storage service.

Amazon S3
Azure Storage
Google Cloud Storage

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to load data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Destination

drop-down

required

Redshift: Load your data into Amazon Redshift. You’ll need to set a cloud storage location for temporary staging of the data.
Cloud Storage: Load your data directly into your preferred cloud storage location.

Click either the Amazon Redshift or Cloud Storage tab on this page for documentation applicable to that destination type.

Amazon Redshift
Cloud Storage

Schema

drop-down

required

The Amazon Redshift schema. The special value [Environment Default] uses the schema defined in the environment. For more information on using multiple schemas, read Schemas.

Table Name

string

required

The name of the table to be created.

Load Strategy

drop-down

required

Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
Fail if Exists: If the specified table name already exists, this pipeline will fail to run.
Truncate and Insert: If the specified table name already exists, all rows within the table will be removed and new rows will be inserted per the next run of this pipeline.
Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then the table will be created, and your data will be inserted into the table.

Clean Staged Files

boolean

required

Yes: Staged files will be destroyed after data is loaded. This is the default setting.
No: Staged files are retained in the staging area after data is loaded.

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to stage data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Load Strategy

drop-down

Append Files in Folder: Appends files to storage folder. This is the default setting.
Overwrite Files in Folder: Overwrite existing files with matching structure.

See the configuration table for how this parameter works with the Folder Path and File Prefix parameters:

Configuration	Description
Append files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-timestamp-partX` where X is the part number, starting from 1.
Append files in folder without defined folder path and file prefix.	Files will be stored under the structure `uniqueID/timestamp-partX` where X is the part number, starting from 1.
Overwrite files in folder with defined folder path and file prefix.	Files will be stored under the structure `folder/prefix-partX` where X is the part number, starting from 1. All files with matching structures will be overwritten.
Overwrite files in folder without defined folder path and file prefix.	Validation will fail. Folder path and file prefix must be supplied for this load strategy.

Folder Path

string

The folder path of the written files.

File Prefix

string

A string of characters to include at the beginning of the written files. Often used for organizing database objects.

Storage

drop-down

required

A cloud storage location to load your data into for storage. Choose Amazon S3, Azure Storage, or Google Cloud Storage. Click the tab that corresponds to your chosen cloud storage service.

Amazon S3
Azure Storage
Google Cloud Storage

Amazon S3 Bucket

drop-down

required

An AWS S3 bucket to load data into. The drop-down menu will include buckets tied to the cloud provider credentials that you have associated with your environment.

Advanced Settings

Log Level

drop-down

Set the severity level of logging. Choose from Error, Warn, Info, Debug, or Trace. Logs can be found in the Message field of the task details after the pipeline has been run.

Load Selected Data

boolean

Choose whether to return the entire payload or only selected data objects. Read Structure to learn how to select which data objects to include in your API response.

No: Will return the entire payload. This is the default setting.
Yes: Will return only the objects in Custom Connector that are marked as Selected Data in the Structure setting.

Deactivate soft delete for Azure blobs (Databricks)

If you intend to set your destination as Databricks and your stage platform as Azure Storage, you must turn off the “Enable soft delete for blobs” setting in your Azure account for your pipeline to run successfully. To do this:

In the Azure portal, navigate to your storage account.
In the menu, under Data management, click Data protection.
Clear the Enable soft delete for blobs checkbox. For more information, read Soft delete for blobs.
