Production use of this feature is available for specific editions only. Contact our sales team for more information.
Properties
A human-readable name for the component.
The Snowflake source database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
The Snowflake source schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
An existing Snowflake table to use as the input. The tables available will depend on the schema you select.
The column in your table that holds the text data you wish to chunk.
Select any other input columns that you wish to include in the output table.
The format of the text to chunk.
- Text: Your text data is chunked using a recursive character splitting method.
- Markdown: Your text data is chunked using a header splitting method.
- HTML: Your text data is chunked using a header splitting method.
If your text data is Markdown, you can still set this parameter to Text.
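The page does not describe how the component splits on headers. As a rough illustration only, the sketch below shows one common way to split Markdown on its `#` headers. The function name `split_on_headers` and the `header`/`text` output keys are hypothetical, and the component's own behavior may differ (for example, in how it handles `#` characters inside code fences, which this sketch does not account for).

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headers such as "# Title" or "### Subsection".
HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def split_on_headers(markdown: str) -> List[Dict[str, str]]:
    """Return one section per header; text before the first header gets an empty header."""
    sections: List[Dict[str, str]] = []
    title, lines = "", []

    def flush() -> None:
        # Only emit a section if it has a header or some non-blank text.
        if title or any(line.strip() for line in lines):
            sections.append({"header": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "intro\n# A\nalpha\n## B\nbeta\n"
    for section in split_on_headers(doc):
        print(section)
```

Each resulting section keeps its header alongside its text, so a chunk can still be traced back to the part of the document it came from.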
Currently supports recursive character splitting.
Recursive character splitting lets you define characters to recursively split your text on until chunks are small enough. Common characters are \n\n, \n, a space, and a period (.). This method attempts to keep all paragraphs (and then sentences, and then words) together as long as possible.
The maximum size of chunks in characters. For example, 100 or 250.
The number of overlapping characters between two chunks. Overlapping chunks can help to preserve context across chunks. The integer value sets the total number of characters to overlap. For example, 10 or 25.
Define separator characters to recursively split on. Common characters are \n\n, \n, a space, and a period (.).
Order matters in this list. If you wish to preserve the structure of a text document as much as possible, order your separators from the largest unit to the smallest, i.e. with \n\n above the period. You can reorder your rows with click-and-drag.
The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).
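To make the interaction between chunk size, overlap, and separator order more concrete, here is a minimal sketch of recursive character splitting. It is not the component's implementation: the function names (`split_keep`, `chunk_text`, `chunk_with_overlap`) are hypothetical, and the overlap strategy (prefixing each chunk with the tail of the previous one) is an assumption chosen for simplicity.

```python
from typing import List


def split_keep(text: str, sep: str) -> List[str]:
    """Split on a non-empty separator, keeping it attached to the end of each piece."""
    parts = text.split(sep)
    pieces = [part + sep for part in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


def chunk_text(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Split on the first separator; recurse with the remaining ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in split_keep(text, sep):
        if len(current) + len(piece) <= chunk_size:
            current += piece  # keep neighbouring pieces together while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too large: try the next, finer-grained separator.
            chunks.extend(chunk_text(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


def chunk_with_overlap(text: str, chunk_size: int, overlap: int,
                       separators: List[str]) -> List[str]:
    """Assumed overlap strategy: prefix each chunk with the tail of the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    # Reserve room for the overlap so no final chunk exceeds chunk_size.
    base = chunk_text(text, chunk_size - overlap, separators)
    result = []
    for i, chunk in enumerate(base):
        prev = base[i - 1] if i > 0 else ""
        result.append(prev[max(0, len(prev) - overlap):] + chunk)
    return result


if __name__ == "__main__":
    sample = "First paragraph here.\n\nSecond paragraph. It has two sentences.\n\nEnd."
    for chunk in chunk_with_overlap(sample, 30, 5, ["\n\n", "\n", " ", "."]):
        print(repr(chunk))
```

Because the separators are tried in list order, placing \n\n first means paragraphs are only broken into lines, words, or sentences when a whole paragraph does not fit in one chunk.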
The target Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
The target Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
The name of your Snowflake target table. If the table already exists, it will be overwritten when you run this pipeline.
Set a contextual name for your chunked text output column.

