S3 Unload - Maia Documentation

S3 Unload is an orchestration component that creates files on a specified S3 bucket and populates those files by copying data from a designated table or view. To access an S3 bucket from a different AWS account, read Background: Cross-account permissions and using IAM roles.

We have identified issues when using the S3 Unload component for AWS Databricks with Serverless SQL warehouses or Classic SQL warehouses. Our team is actively investigating these issues to improve the component’s functionality. In the meantime, we recommend using an All-purpose compute as a temporary workaround.

If the component requires access to a cloud provider (AWS, Azure, or GCP), it will use credentials as follows:

If using Matillion Full SaaS: The component will use the cloud credentials associated with your environment to access resources.
If using Hybrid SaaS: By default, the component will inherit the agent’s execution role (service account role). However, if there are cloud credentials associated with your environment, these will override the role.

If you’re using a Matillion Full SaaS solution, you may need to allow these IP address ranges from which Full SaaS agents will call out to their source systems or to cloud data platforms.

Properties

Name

string

required

A human-readable name for the component.

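The remaining properties depend on your cloud data platform. They are listed below in this order: Snowflake, Databricks, Amazon Redshift.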

Stage

drop-down

required

Select a predefined stage for your data. These stages must be created from your Snowflake account console. Otherwise, “Custom” can be chosen for the staging to be based on the component’s Storage Integration and S3 Object Prefix parameters.

Authentication

drop-down

required

Select the authentication method. You can select either:

Credentials: Uses AWS security credentials.
Storage Integration: Use a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage). For more information, read CREATE STORAGE INTEGRATION.

Storage Integration

drop-down

required

Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.

S3 Object Prefix

file explorer

required

To retrieve the intended files, use the file explorer to enter the container path where the S3 bucket is located, or select from the list of S3 buckets. This must have the format S3://<bucket>/<path>.
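As an illustration of the S3://<bucket>/<path> format, the following Python sketch splits such a URL into its bucket and path. The function name `split_s3_url` is hypothetical and not part of the component. The sketch assumes the scheme is matched case-insensitively and that an empty path is acceptable.

```python
from urllib.parse import urlparse


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an S3 URL into (bucket, path). Hypothetical helper, for illustration only."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "S3://" and "s3://" are both accepted here
    # (an assumption of this sketch).
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected S3://<bucket>/<path>, got {url!r}")
    # The bucket is the authority part; the path is everything after the first slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, path = split_s3_url("S3://my-bucket/exports/daily/")
print(bucket, path)
```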

File Prefix

string

required

Filename prefix for unloaded data to be named on the S3 bucket. Each file will be named as the prefix followed by a number denoting which node this was unloaded from. All unloads are parallel and will use the maximum number of nodes available at the time.

Encryption

drop-down

required

Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.

None: No encryption.
Client Side Encryption: Encrypt the data according to a client-side master key. Read Protecting data using client-side encryption to learn more.
SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.
SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.

KMS Key ID

drop-down

required

The ID of the KMS encryption key you have chosen to use in the Encryption property. Only available when Encryption is set to SSE KMS.

Master Key

drop-down

required

The secret definition denoting your master key for client-side encryption. Your password should be saved as a secret definition before using this component. Only available when Encryption is set to Client Side Encryption.

Warehouse

drop-down

required

The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.

Database

drop-down

required

The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema

drop-down

required

The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Target Table

drop-down

required

Select an existing table. The tables available for selection depend on the chosen schema.

Format

drop-down

required

Select a pre-made file format that will automatically set many of the S3 Unload component properties. These formats can be created through the Create File Format component.

File Type

drop-down

required

Select the file type: CSV, JSON, or Parquet. Some file types may require additional formatting; this is explained in the Snowflake documentation. Component properties will change to reflect the selected file type.

Compression

drop-down

required

Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO.

Nest Columns

drop-down

required

JSON only. When True, the table columns should be nested into a single JSON object so that the file can be configured correctly. A table with a single variant column will not require this setting to be True. Default is False.

Record Delimiter

string

CSV only. Input a delimiter for records. This can be one or more single-byte or multi-byte characters that separate records in an input file. Accepted values include: leaving the field empty; a newline character \n or its hex equivalent 0x0a; a carriage return \r or its hex equivalent 0x0d. Also accepts a value of NONE. If you set Skip Header to a value such as 1, then you should use a record delimiter that includes a line feed or carriage return, such as \n or \r. Otherwise, your entire file will be interpreted as the header row, and no data will be loaded. The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field. The default (if the field is left blank) is a newline character.

Field Delimiter

string

CSV only. Input a delimiter for fields. This can be one or more single-byte or multi-byte characters that separate fields in an input file. Accepted characters include common escape sequences, octal values (prefixed by \), or hex values (prefixed by 0x). Also accepts a value of NONE. This delimiter is limited to a maximum of 20 characters. While multi-character delimiters are supported, the field delimiter cannot be a substring of the record delimiter, and vice versa. For example, if the field delimiter is "aa", the record delimiter cannot be "aabb". The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes. Do not specify characters used for other file type options such as Escape or Escape Unenclosed Field. The default setting is a comma: ,.
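The length and substring rules for delimiters can be checked before a pipeline runs. The Python sketch below is illustrative only: `check_delimiters` is a hypothetical helper, it compares the strings literally (it does not interpret escape sequences, hex values, or the special value NONE), and it skips the substring check when either delimiter is empty.

```python
def check_delimiters(field: str, record: str) -> list[str]:
    """Return a list of rule violations for a field/record delimiter pair."""
    problems = []
    # The field delimiter is limited to a maximum of 20 characters.
    if len(field) > 20:
        problems.append("field delimiter is longer than 20 characters")
    # Neither delimiter may be a substring of the other.
    if field and record and (field in record or record in field):
        problems.append("one delimiter is a substring of the other")
    return problems


print(check_delimiters("aa", "aabb"))  # the example pair from the text above
print(check_delimiters(",", "\n"))     # the default pair
```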

Date Format

string

CSV only. Define the format of date values in the data files to be loaded. If a value is not specified or is AUTO, the value for the DATE_INPUT_FORMAT session parameter is used. The default setting is AUTO.

Time Format

string

CSV only. Define the format of time values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIME_INPUT_FORMAT session parameter is used. The default setting is AUTO.

Timestamp Format

string

CSV only. Define the format of timestamp values in the data files to be loaded. If a value is not specified or is AUTO, the value for the TIMESTAMP_INPUT_FORMAT session parameter is used.

Escape

string

CSV only. Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.

Escape Unenclosed Field

string

CSV only. Specify a single character to be used as the escape character for unenclosed field values only. Default is \\. If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case, it will be ignored.

Field Optionally Enclosed

string

CSV only. Specify a character used to enclose strings. The value can be NONE, single quote character ', or double quote character ". To use the single quote character, use the octal or hex representation 0x27 or the double single-quoted escape ''. Default is NONE. When a field contains one of these characters, escape the field using the same character. For example, to escape a string like this: 1 "2" 3, use double quotation to escape, like this: 1 ""2"" 3.
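The doubling rule described above can be expressed in a few lines of Python. This is a minimal sketch of the escaping convention only, not of how the component or Snowflake implements it; `escape_enclosure` is a hypothetical name.

```python
def escape_enclosure(value: str, quote: str = '"') -> str:
    """Escape each occurrence of the enclosing character by doubling it."""
    return value.replace(quote, quote * 2)


print(escape_enclosure('1 "2" 3'))
```

Applied to the example in the text, the string 1 "2" 3 becomes 1 ""2"" 3.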

Null If

editor

Use the Null If property to specify a plain string that represents NULL values during an S3 unload. The property does not require a strict format; common examples include NULL, N/A, or \N, depending on the null markers used in your data. When the field is empty, empty strings are treated as null values. When your null string includes special characters such as quotation marks, escape them by doubling the character (for example, represent 1 "2" 3 as 1 ""2"" 3). If you are working with delimited file types, consider setting the Escape property to a single character (for example, \\) to escape special characters in field values, which helps preserve data integrity.

Trim Space

boolean

required

When True, removes whitespace from fields. Default setting is False.

Overwrite

drop-down

required

If the target file already exists, overwrite data instead of generating an error.

Single File

boolean

required

When True, the unload will work in serial rather than parallel. This results in a slower unload but a single, complete file. The default setting is False. When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed). When False, a filename prefix must be included in the path. When True, the Max File Size property isn’t applicable.

Max File Size

integer

The maximum size (in bytes) of each file generated, per thread. Default is 16000000 bytes (16 MB) and Snowflake has a 6.2 GB file limit for copy-into-location operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.

Include Headers

boolean

required

When True, write column names as headers at the top of the unloaded files.

Catalog

drop-down

required

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Schema (Database)

drop-down

required

The Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table Name

drop-down

required

The table to unload to S3.

S3 URL Location

URL

required

The folder path in a private or public S3 bucket where the data will be loaded.

Encryption

drop-down

Decide how the files are encrypted inside the S3 bucket.

Client Side Encryption: Encrypt the data according to a client-side master key. For more information, read Protecting data using client-side encryption.
KMS Encryption: Encrypt the data according to a key stored on KMS. For more information, read AWS Key Management Service (AWS KMS).
S3 Encryption: Encrypt the data according to a key stored on an S3 bucket. For more information, read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3).

KMS Key Id

drop-down

required

Select the KMS Key Id you want to use after selecting KMS Encryption from the Encryption property drop-down.

Master Key

drop-down

required

Select the Master Key you want to use after selecting Client Side Encryption from the Encryption property drop-down.

File Type

drop-down

required

Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. The following properties will change to reflect the selected file type.

Compression

drop-down

Set the compression type. The default is NONE. This setting doesn’t apply to AVRO.

Record Delimiter

string

(CSV only) A delimiter character used in delimited text formats. The default is a comma.

Quote

string

(CSV only) Sets the quote character for delimited text formats to enclose values containing the delimiter character. The default is ".

Date Format

string

(CSV and JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.

Timestamp Format

string

(CSV and JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].

Escape

string

(CSV only) Sets the escape character used in delimited text formats to escape special characters. The default is \.

Null Value

string

(CSV only) Sets the string representation of a null value in the input data. The default value is an empty string.

Header

boolean

required

(CSV only) Specify whether the input data has a header row. The default is “False”.

Escape Quotes

boolean

required

(CSV only) Specify whether to escape quote characters. The default is “True”.

Write Mode

drop-down

required

When unloading data to an S3 bucket, choose from the following options:

APPEND: Add the new file to the existing directory.
OVERWRITE: Completely clear the directory before adding a new file.

Schema

drop-down

required

Select the table schema. The special value [Environment Default] uses the schema defined in the environment. For more information on using multiple schemas, read Schemas.

Table Name

string

required

The table or view to unload to S3.

S3 URL Location

string

required

The URL of the S3 bucket to unload the data into.

This component can unload to any accessible bucket, regardless of region. When you enter a forward slash character / after a folder name, the file path is validated.

S3 Object Prefix

string

required

Create data files in S3 beginning with this prefix. The format of the output is

<prefix><slice-number>_part_<file-number>

Where prefix is the string you’ve entered here, slice-number is the number of the slice in your cluster, and file-number denotes the file number range. For example, if a slice has 50 MB of data, and you’ve chosen a maximum file size of 10 MB, then the file numbers will range from 001 to 005.
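To make the naming pattern concrete, the following Python sketch generates the file names for one slice. It is an approximation under stated assumptions: `part_names` is a hypothetical helper, the four-digit slice number and three-digit file number padding are assumptions chosen to match the example above, and the actual names written by Redshift may differ.

```python
def part_names(prefix: str, slice_number: int, slice_mb: int, max_file_mb: int) -> list[str]:
    """List the file names for one slice, following <prefix><slice-number>_part_<file-number>."""
    # Ceiling division: a slice is split into as many files as needed to stay under the maximum.
    parts = -(-slice_mb // max_file_mb)
    # Zero-padding widths (4 and 3 digits) are assumptions made for this illustration.
    return [f"{prefix}{slice_number:04d}_part_{n:03d}" for n in range(1, parts + 1)]


# A slice holding 50 MB with a 10 MB maximum file size yields five files.
for name in part_names("export_", 0, 50, 10):
    print(name)
```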

Encryption

drop-down

Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.

SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.

KMS Key ID

drop-down

The ID of the KMS encryption key you have chosen to use in the Encryption property.

Manifest

drop-down

Whether or not to generate a manifest file detailing the files that were added.

Selecting the option Yes (Verbose) will create a manifest file that explicitly lists details for the data files created by the Unload process. For more information, read the Redshift documentation.

Data File Type

drop-down

required

Select the file type: CSV, Delimited, Fixed Width, or Parquet. Component properties will change to reflect the selected file type.

Delimiter

string

required

The delimiter that separates columns. The default is a comma. A [TAB] character can be specified as "\t". This property is available when Data File Type is set to Delimited.

Fixed Width Spec

string

required

Loads the data from a file where each column width is a fixed length, rather than separated by a delimiter. Each column is described by a name and length, separated by a colon. Each described column is then separated by a comma. For example, we have four columns: name, id, age, state. These columns have the respective lengths: 12,8,2,2. The written description to convert this data into a table using fixed-width columns would then be:

name:12,id:8,age:2,state:2

Note that the columns can have any plaintext name. For more information on fixed width inputs, read the AWS documentation. This property is available when Data File Type is set to Fixed Width.
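The structure of a fixed width spec can be illustrated with a short Python sketch that parses the example above and lays out one row. `parse_spec` and `format_row` are hypothetical helpers. Left-aligned padding with spaces and truncation of over-long values are assumptions of this sketch, not documented behavior of the component or of Redshift.

```python
def parse_spec(spec: str) -> list[tuple[str, int]]:
    """Parse 'name:12,id:8,...' into a list of (column name, width) pairs."""
    columns = []
    for item in spec.split(","):
        name, width = item.split(":")
        columns.append((name.strip(), int(width)))
    return columns


def format_row(row: dict, columns: list[tuple[str, int]]) -> str:
    """Lay out one row so that every column occupies exactly its declared width."""
    # Assumption: values are left-aligned, space-padded, and truncated if too long.
    return "".join(str(row[name])[:width].ljust(width) for name, width in columns)


columns = parse_spec("name:12,id:8,age:2,state:2")
line = format_row({"name": "Ada", "id": 42, "age": 36, "state": "NY"}, columns)
print(repr(line))
```

Each output line is 24 characters long, the sum of the four declared widths.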

Compress Data

drop-down

required

Whether or not the resultant files are to be compressed. This property is available when Data File Type is set to CSV, Delimited, or Fixed Width.

Compression Type

drop-down

required

If Compress Data is set to Yes, select either GZIP or BZIP2 as the compression method.

NULL As

string

This option replaces the specified string with null in the output table. Use this if your data has a particular representation of missing data. Use the NULL As property to specify a plain string that represents NULL values during an S3 unload. The property does not require a strict format; common examples include NULL, N/A, or \N, depending on the null markers used in your data. When your null string includes special characters such as quotation marks, escape them by doubling the character (for example, represent 1 "2" 3 as 1 ""2"" 3). Do not use \n as a null string, because it is reserved as a line delimiter and may produce unexpected results. If you set Data File Type to Delimited, consider enabling the Escape property. This option prefixes special characters with a backslash, which helps preserve data integrity and ensures compatibility if you reload the data later using the COPY command. This property is available when Data File Type is set to CSV, Delimited, or Fixed Width.

Escape

drop-down

required

Whether or not to insert backslashes to escape special characters. This is often a good idea if you intend to reload the data back into a table later, since the COPY command also supports this option. This property is available when Data File Type is set to Delimited.

Allow Overwrites

drop-down

If the target file already exists, overwrite data instead of generating an error.

Parallel

drop-down

If set to Yes, the unload will work in parallel, creating multiple files (one for each slice of the cluster). Disabling parallel will result in a slower unload but a single, complete file.

Files are only split if parallel is set to No, and they are split based on what you specify in the Max File Size parameter.

Add Quotes

drop-down

required

If set, quotation marks are added to the data. This property is available when Data File Type is set to Delimited.

IAM Role ARN

drop-down

Select an IAM role Amazon Resource Name (ARN) that is already attached to your Redshift cluster, and that has the necessary permissions to access S3. This setting is optional, since without this style of setup, the credentials of the environment (instance credentials or manually entered access keys) will be used. Read the Redshift documentation for more information about using a role ARN with Redshift.

Max File Size

string

The maximum size (in MB) of each file generated, per thread. The default is 16 MB, and AWS has a 6.2 GB file limit for Unload operations. Files that exceed the stated maximum will be split into multiple size-abiding parts.

Include Header

drop-down

If set to Yes, will write column names as headers at the top of unloaded files. This property is available when Data File Type is set to CSV or Delimited.

S3 Bucket Region

drop-down

The Amazon S3 region hosting the S3 bucket. This is not normally required and can be left as “None” (default) if the bucket is in the same region as your Redshift cluster.
