Production use of this feature is available for specific editions only. Contact our sales team for more information.
Chunk Text is an orchestration component that performs pushdown text chunking: it runs a Python user-defined function (UDF) in Snowflake, using the compute resources of your Snowflake warehouse. Specify an existing Snowflake source table and set a target table. If the target table already exists, it will be overwritten. You can choose text or Markdown as your data format.

Properties

Name
string
required
A human-readable name for the component.
Database
drop-down
required
The Snowflake source database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
drop-down
required
The Snowflake source schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Table
drop-down
required
An existing Snowflake table to use as the input. The tables available will depend on the schema you select.
Text Column
drop-down
required
The column in your table that holds the text data you wish to chunk.
Include Input Columns
dual listbox
Select any other input columns that you wish to include in the output table.
Data Format
drop-down
required
The format of the text to chunk.
  • Text: Your text data is chunked using a recursive character splitting method.
  • Markdown: Your text data is chunked using a header splitting method.
  • HTML: Your text data is chunked using a header splitting method.
If your text data is Markdown, you can still set this parameter to Text.
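As a rough illustration of what header splitting means for Markdown input, the following minimal Python sketch starts a new section at every ATX header line (`#` to `######`). The function name `split_by_markdown_headers` and the output shape are hypothetical, chosen for this example only; the component's actual UDF is not documented here and may behave differently. The sketch also does not account for header-like lines inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headers such as "# Title" or "### Subsection".
HEADER_PATTERN = re.compile(r"^#{1,6}\s+\S")


def split_by_markdown_headers(text: str) -> list[dict]:
    """Split Markdown text into sections, one per header (hypothetical helper)."""
    sections = []
    header, lines = None, []

    def flush():
        # Store the section collected so far, skipping a completely empty one.
        if header is not None or any(line.strip() for line in lines):
            sections.append({"header": header, "content": "\n".join(lines).strip()})

    for line in text.splitlines():
        if HEADER_PATTERN.match(line):
            flush()
            header, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return sections


doc = "# Intro\nHello.\n\n## Details\nMore text.\nEven more."
for section in split_by_markdown_headers(doc):
    print(section)
```

Each section keeps its header text alongside its body, so a chunk can still be traced back to the part of the document it came from.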
Chunking Method
drop-down
required
Currently supports recursive character splitting. Recursive character splitting lets you define separator characters on which your text is split recursively until the chunks are small enough. Common separators are \n\n, \n, a space, and a period (.). This method attempts to keep paragraphs (and then sentences, and then words) together for as long as possible.
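The idea behind recursive character splitting can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the component's implementation: the function name `recursive_split` is hypothetical, separators must be non-empty strings, chunk overlap is ignored, and text that no separator can break down is cut into fixed-width slices.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into chunks of at most chunk_size characters (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(separator):
        if not part:
            continue  # skip empty pieces produced by repeated separators
        candidate = part if not current else current + separator + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep neighbouring pieces together while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph here.\n\nSecond one is a bit longer than the first.\n\nThird."
for chunk in recursive_split(text, 30, ["\n\n", "\n", " "]):
    print(repr(chunk))
```

In this example the first and third paragraphs fit within 30 characters and stay whole, while the second paragraph is broken at word boundaries because no paragraph or line break is available inside it.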
Chunk Size
integer
required
The maximum size of each chunk, in characters. For example, 100 or 250.
Chunk Overlap
integer
required
The number of characters shared between two consecutive chunks. Overlapping chunks can help to preserve context across chunk boundaries. The integer value sets the total number of characters to overlap. For example, 10 or 25.
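To make the relationship between chunk size and chunk overlap concrete, here is a minimal Python sketch that uses plain fixed-width windows. The function name `chunk_with_overlap` is hypothetical, and the sketch deliberately ignores separators; how the component combines overlap with separator-based splitting is not specified here.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width chunks that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a chunk size of 4 and an overlap of 1, each chunk starts 3 characters after the previous one, so the last character of one chunk is repeated as the first character of the next.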
Separators
column editor
required
Define the separator characters to split on recursively. Common separators are \n\n, \n, a space, and a period (.). Order matters in this list: to preserve the structure of a text document as much as possible, place coarser separators first, for example \n\n above a period (.). You can reorder your rows with click-and-drag.
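The effect of separator order can be seen with a small Python sketch that performs only the first level of splitting: it uses the first separator in the list that occurs in the text. The function name `first_level_split` is hypothetical and the logic is a simplification for illustration, not the component's implementation.

```python
def first_level_split(text: str, separators: list[str]) -> list[str]:
    """Split text on the first listed separator that appears in it."""
    for separator in separators:
        if separator and separator in text:
            # Drop empty pieces left behind by leading or trailing separators.
            return [piece for piece in text.split(separator) if piece]
    return [text] if text else []


text = "Para one. Still one.\n\nPara two."

# Paragraph break listed first: paragraphs stay intact.
print(first_level_split(text, ["\n\n", "."]))

# Period listed first: sentences are cut before paragraphs are considered.
print(first_level_split(text, [".", "\n\n"]))
```

With \n\n listed first, each paragraph comes out as one piece. With the period listed first, the text is cut into sentence fragments and the paragraph break ends up stranded inside one of them.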
Timeout
integer
required
The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).
Database
drop-down
required
The target Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
Schema
drop-down
required
The target Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
Table
string
required
The name of your Snowflake target table. If the table already exists, it will be overwritten when you run this pipeline.
Output Column Name
string
required
Set a contextual name for your chunked text output column.