The S3 Load component lets users load data into an existing table from objects stored in Amazon Simple Storage Service (Amazon S3). This component requires working AWS credentials with read access to the bucket containing the source data files. To access an S3 bucket from a different AWS account, read Background: Cross-account permissions and using IAM roles.
If you’re using a Matillion Full SaaS solution, you may need to allowlist the IP address ranges from which Matillion Full SaaS agents call out to their source systems or to cloud data platforms.
If the component requires access to a cloud provider (AWS, Azure, or GCP), it will use credentials as follows:
  • If using Matillion Full SaaS: The component will use the cloud credentials associated with your environment to access resources.
  • If using Hybrid SaaS: By default, the component will inherit the agent’s execution role (service account role). However, if there are cloud credentials associated with your environment, these will override the role.

Properties

Name
string
required
A human-readable name for the component.
Stage
drop-down
required
Select a staging area for the data. Staging areas can be created in Snowflake using the CREATE STAGE command; internal stages created this way store staged data within Snowflake. Selecting [Custom] exposes additional properties for specifying a custom staging area on S3. You can also add a fully qualified stage by typing the stage name, in the format databaseName.schemaName.stageName.
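As a sketch, an S3-backed external stage might be created in Snowflake like this (all names here, including my_db, my_schema, my_s3_stage, and my-bucket, are hypothetical placeholders):

```sql
-- Hypothetical example: create an external stage pointing at an S3 path.
-- Replace the names and placeholder credentials with your own values.
CREATE STAGE my_db.my_schema.my_s3_stage
  URL = 's3://my-bucket/source-data/'
  CREDENTIALS = (AWS_KEY_ID = '<key-id>' AWS_SECRET_KEY = '<secret-key>');
```

The fully qualified name, my_db.my_schema.my_s3_stage, is what you would then type into the Stage property.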
Authentication
drop-down
required
Select the authentication method. Users can choose either:
  • Credentials: Uses AWS security credentials.
  • Storage Integration: Uses a Snowflake storage integration. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of permitted or blocked storage locations (Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage). More information can be found at CREATE STORAGE INTEGRATION.
Storage Integration
drop-down
required
Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them. Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage, regardless of the cloud provider that hosts your Snowflake account.
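For reference, a storage integration for S3 might be created along these lines (the integration name, role ARN, and bucket below are hypothetical placeholders):

```sql
-- Hypothetical example: a storage integration granting Snowflake access
-- to an S3 location via an IAM role, rather than stored credentials.
CREATE STORAGE INTEGRATION my_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/source-data/');
```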
S3 Object Prefix
file explorer
required
To retrieve the intended files, use the file explorer to enter the path to the S3 bucket containing the source data, or select from the list of S3 buckets. The path must have the format s3://<bucket>/<path>.
Pattern
string
required
A string that will partially match the paths and names of the files to be included in the load. Defaults to .*, indicating all files within the S3 Object Prefix. This property is a pattern on the complete path of the file, not just the part relative to the directory configured in the S3 Object Prefix property, so any subfolder containing the objects to load must be included here.
Encryption
drop-down
required
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
KMS Key ID
drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Master Key
string
required
Your client-side encryption master key. This property is only available when using client-side encryption.
Warehouse
drop-down
required
The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.
Database
drop-down
required
The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
drop-down
required
The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Target Table
string
required
Select an existing table to load data into. The tables available for selection depend on the chosen schema.
Load Columns
dual listbox
Choose the columns to load. If you leave this parameter empty, all columns will be loaded.
Format
drop-down
required
Select a pre-made file format that will automatically set many of the S3 Load component properties. These formats can be created through the Create File Format component. Users can add a fully qualified format by typing the format name, in the format databaseName.schemaName.formatName.
File Type
drop-down
required
Select the type of data to load. Available data types are: AVRO, CSV, JSON, ORC, PARQUET, and XML. Some file types may require additional formatting; this is explained in the Snowflake documentation. Component properties will change to reflect the selected file type.
Compression
drop-down
required
Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO.
Null If
editor
required
Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.
Trim Space
boolean
required
When True, removes whitespace from fields. Default setting is False.
On Error
drop-down
required
Decide how to proceed upon an error.
  • Abort Statement: Aborts the load if any error is encountered. This is the default setting.
  • Continue: Continue loading the file.
  • Skip File: Skip file if any errors are encountered in the file.
  • Skip File When n Errors: Skip file when the number of errors in the file is equal to or greater than the specified number in the next property, n.
  • Skip File When n% Errors: Skip the file when the percentage of errors in the file exceeds the percentage specified in the next property, n.
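These options correspond to Snowflake's ON_ERROR copy option. As a sketch (target_table and my_stage are hypothetical names):

```sql
-- Hypothetical example: each On Error setting maps to an ON_ERROR value.
COPY INTO target_table
  FROM @my_stage
  ON_ERROR = ABORT_STATEMENT;  -- Abort Statement (the default)
-- Other values: CONTINUE, SKIP_FILE,
-- SKIP_FILE_10 (skip when the file has 10 or more errors),
-- 'SKIP_FILE_10%' (skip when more than 10% of rows have errors)
```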
n
integer
required
Specify the number or percentage of errors required to skip the file. This property accepts integers only; when specifying a percentage, enter the number without a % symbol.
Size Limit (B)
integer
required
Specify the maximum size, in bytes, of data to be loaded for a given COPY statement. If the maximum is exceeded, the COPY operation discontinues loading files. For more information, refer to the Snowflake documentation.
Purge Files
boolean
required
When True, purges data files after the data is successfully loaded. Default setting is False.
Match By Column Name
drop-down
required
Specify whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
  • Case Insensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column name matching is case-insensitive.
  • Case Sensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column name matching is case-sensitive.
  • None: The COPY operation loads the semi-structured data into a variant column or, if a query is included in the COPY statement, transforms the data.
Truncate Columns
boolean
required
When True, strings are automatically truncated to the target column length. When False (default), the COPY statement produces an error if a loaded string exceeds the target column length.
Force Load
boolean
required
When True, loads all files, regardless of whether they have been loaded previously and haven’t changed since they were loaded. Default setting is False. When set to True, this option reloads files and can lead to duplicated data in a table.
Metadata Fields
dual listbox
required
Snowflake metadata columns available to include in the load. For more information, read Querying Metadata for Staged Files. This property is only available when an external stage is selected.

S3 access

To access an S3 bucket from a different AWS account, the following is required:
  • Set up cross-account access via AWS roles.
  • Type the name of the bucket you want to access, or use a variable, when loading from or unloading to those locations.

File patterns with Snowflake

In Snowflake, the Pattern parameter in the COPY INTO syntax is a pattern on the complete path of the file and is not just relative to the directory configured in the S3 Object Prefix parameter. The table below provides an example of S3 Object Prefix and Pattern behaviors, including success and failure states.
| S3 Object Prefix | Pattern | Outcome | Comments |
| --- | --- | --- | --- |
| s3://testbucket/testDirectory/alphabet_0_0_0.csv.gz | | Success | This is the format that the S3 Load Generator will generate. |
| s3://testbucket/testDirectory/ | testDirectory/alphabet_0_0_0.csv.gz | Success | Loads the file successfully because the pattern matches the full path. |
| s3://testbucket/testDirectory/ | .*.csv.gz | Success | Would load all files ending in .csv.gz in the testDirectory directory. |
| s3://testbucket/testDirectory/ | alphabet_0_0_0.csv.gz | Failure | Does not load the file because the pattern does not match the full path. |
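The behavior above can be sketched as a COPY statement (target_table is a hypothetical table name; the bucket and file names are the examples from the table):

```sql
-- Hypothetical example: the pattern includes the directory because it is
-- matched against the complete path, not just the file name.
COPY INTO target_table
  FROM 's3://testbucket/testDirectory/'
  PATTERN = 'testDirectory/alphabet_0_0_0\\.csv\\.gz'
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);
```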