Azure Blob Storage Unload - Maia Documentation

Azure Blob Storage Unload is an orchestration component that creates files on a specified Azure Blob Storage account and populates those files by copying data from a designated table or view. By default, your data will be unloaded in parallel.

We have identified issues when using the Azure Blob Storage Unload component for Databricks with Serverless SQL warehouses or Classic SQL warehouses. Our team is actively investigating these issues to improve the component’s functionality. In the meantime, we recommend using an All-purpose compute as a temporary workaround.

If the component requires access to a cloud provider (AWS, Azure, or Google Cloud), it will use credentials as follows:

If using Matillion Full SaaS: The component will use the cloud credentials associated with your environment to access resources.
If using Hybrid SaaS: By default the component will inherit the agent’s execution role (service account role). However, if there are cloud credentials associated to your environment, these will overwrite the role.

Properties

Snowflake
Databricks

Name

string

required

A human-readable name for the component.

Stage

drop-down

required

Select a staging area for the data. Staging areas can be created through Snowflake using the CREATE STAGE command. Internal stages can be set up this way to store staged data within Snowflake. Selecting [Custom] will avail the user of properties to specify a custom staging area on Azure Blob Storage. Users can add a fully qualified stage by typing the stage name. This should follow the format databaseName.schemaName.stageName.

Azure Storage Location

file explorer

required

To retrieve the intended files, use the file explorer to enter the container path where the Azure storage account is located, or select from the list of storage accounts.This must have the format AZURE://<StorageAccount>/<path>.If the specified container path doesn’t exist, Azure Blob Storage Unload creates it automatically. If the source table or query returns no rows, no files are written to Azure Blob Storage.

File Prefix

string

required

Specify a file prefix for unloaded data on the blob container. Each file will be named as the prefix followed by a number denoting which node this was unloaded from. All unloads are parallel, and will use the maximum number of nodes available at the time.

Authentication

drop-down

required

Select the authentication method. Users can choose either:

Credentials: Uses Azure security credentials.
Storage Integration: Use a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage). More information can be found at CREATE STORAGE INTEGRATION.

Warehouse

drop-down

required

The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.

Database

drop-down

required

The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema

drop-down

required

The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Target Table

drop-down

required

Select an existing table. The tables available for selection depend on the chosen schema.

Format

drop-down

required

Select a pre-made file format that will automatically set many of the Azure unload component properties. These formats can be created through the Create File Format component. Selecting the [Custom] file format will use the component properties to define the file format.

File Type

drop-down

required

The unload file format. Choose from CSV, JSON, or PARQUET. Some file types may require additional formatting—this is explained in the Snowflake documentation. Component properties will change to reflect the selected file type.

Compression

drop-down

required

Select the compression format. Available CSV and JSON formats include:

BROTLI
BZ2
DEFLATE
gzip
NONE (no compression)
RAW_DEFLATE
ZSTD

Available PARQUET formats include:

AUTO
LZO
NONE (no compression)
SNAPPY

CSV
JSON

Record Delimiter

string

required

Specify a delimiter character to separate records (rows) in the file. Defaults to newline. \ can also signify a newline. \\r can signify a carriage return.

Field Delimiter

string

required

Specify a delimiter character to separate columns. The default character is a comma ,.A [TAB] character can be specified as \.

Date Format

string

required

Defaults to auto. Use this property to manually specify a date format. For more information, read the Snowflake documentation.

Time Format

string

required

Defaults to auto. Use this property to manually specify a time format. For more information, read the Snowflake documentation.

Timestamp Format

string

required

Defaults to auto. Use this property to manually specify a timestamp format. For more information, read the Snowflake documentation.

Escape

string

required

Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE.

Escape Unenclosed Field

string

required

Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). Default is \.If a character is specified in the “Escape” field, it will override this field.If you have set a value in the property Field Optionally Enclosed, all fields will become enclosed, rendering the Escape Unenclosed Field property redundant, in which case it will be ignored.

Field Optionally Enclosed

string

required

A character that is used to enclose strings. Can be single quote (’) or double quote (”) or NONE (default). Note that the character chosen can be escaped by that same character.

Null If

string

required

Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with a SQL NULL value for that field in the loaded table. Click + to add a string.

Trim Space

drop-down

required

(JSON and PARQUET only) Removes trailing and leading whitespace from the input data.

Overwrite

drop-down

required

When “True”, overwrite existing data (if the target file already exists) instead of generating an error. Default setting is “False”.

Single File

drop-down

required

When True, the unload operation will work in serial rather than parallel. This results in a slower unload but a single, complete file.The default setting is False.When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether the file is compressed).When False, a filename prefix must be included in the path.

Max File Size

integer

required

The maximum size (in bytes) of each file generated.The default is 16000000 (16 MB). The maximum size is 5000000000 (5 GB).For more information, see the Snowflake documentation.

Include Headers

drop-down

required

When “True”, write column names as headers at the top of the unloaded files. Default is “False”.

Name

string

required

A human-readable name for the component.

Azure Storage Location

URL

required

Use the file explorer to enter the container path where the Azure storage account is located, or select from the list of storage accounts.If the specified container path doesn’t exist, Azure Blob Storage Unload creates it automatically. If the source table or query returns no rows, no files are written to Azure Blob Storage. However, the target directory is still created.

Catalog

drop-down

required

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Schema (Database)

drop-down

required

The Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table Name

drop-down

required

The table to unload to Azure Blob Storage.

File Type

select

required

Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. Below properties will change to reflect the selected file type.

Compression

drop-down

Select the compression format. The default is set to empty. This setting doesn’t apply to AVRO.Available CSV, JSON, and PARQUET formats include:

GZIP
BZIP2
DEFLATE
LZ4
SNAPPY

CSV
JSON

Record Delimiter

string

A delimiter character used in delimited text formats. The default is a comma.

Quote

string

Sets the quote character for delimited text formats to enclose values containing the delimiter character. The default is ".

Date Format

string

Manually set a date format. If none is set, the default is yyyy-MM-dd.

Timestamp Format

string

Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].

Escape

string

Sets the escape character used in delimited text formats to escape special characters. The default is \.

Null Value

string

Sets the string representation of a null value in the input data. The default value is an empty string.

Header

boolean

required

Specify whether the input data has a header row. Default is False.

Escape Quotes

boolean

required

Specify whether to escape quote characters. The default is True.

Date Format

string

Manually set a date format. If none is set, the default is yyyy-MM-dd.

Timestamp Format

string

Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].

Write Mode

drop-down

required

When unloading data to Azure Blob Storage, choose from the following options:

APPEND: Add the new file to the existing directory.
OVERWRITE: Completely clear the directory before adding a new file.

Copying files to an Azure Premium Storage blob

When copying files to an Azure Premium Storage blob, may provide the following error:

Self-suppression not permitted.

This is because, unlike standard Azure Storage, Azure Premium Storage does not support block blobs, append blobs, files, tables, or queues. Premium Storage supports only page blobs that are incrementally sized. A page blob is a collection of 512-byte pages that are optimized for random read and write operations. Thus, all writes must be 512-byte aligned and so any file that is not sized a multiple of 512 will fail to write. For additional information about Azure Storage blobs, we recommend consulting the Microsoft Azure documentation.

​Properties

​Copying files to an Azure Premium Storage blob

Properties

Copying files to an Azure Premium Storage blob