Skip to main content
The JSON Load orchestration component loads data into a table from one or more JSON files. The component automatically infers the schema from the structure of the selected file(s), creates the corresponding table, and then loads the data into the table.
For Google BigQuery projects, the JSON Load connector only supports newline-delimited JSON files (also called NDJSON or JSON Lines) with one complete JSON object per line.

Use cases

JSON Load suits data engineering and analytics tasks that bring semi-structured or API-originated data into your warehouse. For example, you can use this component to:
  • Load API response exports or webhook payloads stored as JSON files for analysis alongside structured warehouse data.
  • Ingest document exports from NoSQL databases such as MongoDB or DynamoDB to enable relational querying.
  • Load application event logs or audit trails stored in JSON for compliance reporting or operational analytics.
  • Bring in IoT device telemetry or mobile app event data stored as JSON files as a first step before downstream transformation.
  • Ingest configuration or metadata files produced by application deployments to track changes over time.
You can authorize this component using either of these two methods:
  • Cloud Provider Credentials: By default, explicit Cloud Provider Credentials are used to facilitate cross-account or cross-cloud data loads.
  • Agent Identity: If cloud provider credentials are not specified, the component attempts to use the Matillion Agent’s native service identity to access the source data.

Properties

Name
string
required
A human-readable name for the component.

Connect

Type
drop-down
required
  • Snowflake Managed: Load your file from a Snowflake internal stage.
  • S3: Load your file from an S3 bucket.
  • Google Cloud Storage: Load your file from a Google Cloud Storage bucket.
  • Azure Blob Storage: Load your file from an Azure Blob Storage container.
Stage
drop-down
required
Select a staging area for the data. The special value [New Stage] will create a temporary stage to be used for loading the data when the corresponding parameter values are provided.The options in this drop-down menu depend on the values you select for the Database and Schema parameters. If you change these values, the list of available options updates automatically, and the previously selected option may become invalid.When [New Stage] is selected, the component uses the cloud credentials configured for your environment to access the required resources.
Staged file path
string
A stage may include directories. The user has the option to browse and select files from a specific directory path, for example, /Example/Path.If this is left blank, and if the Pattern parameter is also empty, all files from the stage will be loaded.
Pattern
string
A valid regular expression (regex) to match part of a file’s path or name. Files that match the pattern will be included in the load.If this parameter is left empty, the component automatically uses the pattern *, which matches all files within the specified Staged file path or Stage parameters. The pattern applies to the entire file path, not just the directory defined in Staged file path.The subfolder containing the object to load must be included here.
  • Only JSON files are supported.
  • Ensure that the pattern entered correctly targets the files you intend to load.

Configure

File format
drop-down
required
The default value is set to [New File Format]. Specify a file format, and a temporary format with those settings will be used when the component runs. Alternatively, select a pre-made file format.
Compression
drop-down
required
Select the compression algorithm used on the file being loaded.The default setting is AUTO.
Enable octal
boolean
required
When True, enables the parsing of octal values.Default setting is False.
Allow duplicates
boolean
required
When True, allows duplicate object field names.Default setting is False.
Strip outer array
boolean
required
When True, instructs the JSON parser to remove outer brackets.Default setting is False.
Strip null values
boolean
required
When True, instructs the JSON parser to remove any object fields or array elements containing NULL values.Default setting is False.
Ignore UTF8 errors
boolean
required
When True, replaces any invalid UTF-8 sequences with Unicode characters. When False, UTF-8 errors will not produce an error in the pipeline run.Default setting is False.
Binary format
drop-down
required
Determines how binary string values are encoded in data files. Use this setting when importing data into a table’s binary columns.Choose from the following options:
  • HEX: Binary string values are encoded as hexadecimal strings.
  • BASE64: Binary string values are encoded as Base64 strings.
  • UTF8: Binary string values are encoded as UTF-8 strings.
Default setting is HEX.This will only work when Match by column name is set to either Case Insensitive or Case Sensitive.
Multi line
boolean
required
Allows multiple lines in a JSON record. When set to No, any JSON record containing a newline character is considered invalid, and will trigger an error.Default setting is False.
Replace invalid characters
boolean
required
Snowflake replaces invalid UTF-8 characters with the Unicode replacement character. When False, the load operation produces an error when invalid UTF-8 character encoding is detected.Default setting is False.
Skip byte order mark
boolean
required
Determines whether to ignore a BOM (byte order mark) in the input file.If set to FALSE, Snowflake processes any BOM found in the file, which may lead to errors or cause the BOM to be included in the first column of the table.Default setting is True.
Null if
editor
Specify one or more strings (one string per row of the table) to convert to NULL values. When one of these strings is encountered in the file, it is replaced with an SQL NULL value for that field in the loaded table. Click + to add a string.This will only work when Match By Column Name is set to either Case Insensitive or Case Sensitive.

Destination

Warehouse
drop-down
required
Database
drop-down
required
Schema
drop-down
required
Target table name
string
required
Specify the name of the newly created or existing table to load data into. The table will be created according to the Load strategy you select below.
Load strategy
drop-down
required
  • Replace: If the specified table name already exists, that table will be destroyed and replaced by the table created during this pipeline run.
  • Truncate and Insert: Each time the pipeline runs, two operations are performed: first, the table is truncated, meaning all existing rows are deleted. Then, your new rows are inserted. The table itself is never destroyed and recreated.
  • Fail if Exists: If the specified table name already exists, this pipeline will fail to run, and no data will be copied to the table.
  • Append: If the specified table name already exists, then the data is inserted without altering or deleting the existing data in the table. It’s appended onto the end of the existing data in the table. If the specified table name doesn’t exist, then an error message will appear and the table is not created. You must always provide a table name.

Advanced Settings

N
integer
required
Specify the number of errors or the percentage of errors required to skip the file. This parameter only accepts integer characters. % is not accepted. Specify percentages as a number only.This parameter is only available when On Error is set to either Skip File When n Errors or Skip File When n% Errors.
Size limit (in bytes)
integer
Specify the maximum size, in bytes, of data to be loaded for a given COPY statement. If the maximum is exceeded, the COPY operation discontinues loading files. For more information, refer to the Snowflake documentation.
Purge files
boolean
required
When True, purges data files after the data is successfully loaded. Default setting is False.
Match by column name
drop-down
required
Specify whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
  • Case Insensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column names should be case-insensitive.
  • Case Sensitive: Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column names should be case-sensitive.
  • None: The COPY operation loads the semi-structured data into a variant column or, if a query is included in the COPY statement, transforms the data.
Truncate columns
boolean
required
When True, strings are automatically truncated to the target column length. When False (default), the COPY statement produces an error if a loaded string exceeds the target column length.
Force load
boolean
required
When True, loads all files, regardless of whether they have been loaded previously and haven’t changed since they were loaded. Default setting is False.When set to True, this option reloads files and can lead to duplicated data in a table.