# Chunk Text

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<Info>
  Production use of this feature is available for specific editions only. [Contact our sales team](https://www.matillion.com/contact) for more information.
</Info>

<ComponentMetadata warehouses={["Snowflake"]} unsupportedWarehouses={["Databricks", "Amazon Redshift", "BigQuery"]} componentType="Orchestration" connectionInputs="One" connectionOutputs="Unlimited" />

Chunk Text is an orchestration component that performs pushdown text chunking by running a Python user-defined function (UDF) in Snowflake, using the compute of your Snowflake warehouse. Specify an existing Snowflake source table and a target table. If the target table already exists, it is overwritten.

You can choose Text, Markdown, or HTML as your data format.

***

## Properties

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

{/* <!-- param-start:[source.database] | warehouses: [snowflake] --> */}

<ResponseField name="Database" type="drop-down" required>
  The Snowflake *source* database. The special value `[Environment Default]` uses the database defined in the environment. Read [Databases, Tables and Views - Overview](https://docs.snowflake.com/en/guides-overview-db) to learn more.
</ResponseField>

{/* <!-- param-start:[source.schema] | warehouses: [snowflake] --> */}

<ResponseField name="Schema" type="drop-down" required>
  The Snowflake *source* schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.
</ResponseField>

{/* <!-- param-start:[source.table] | warehouses: [snowflake] --> */}

<ResponseField name="Table" type="drop-down" required>
  An existing Snowflake table to use as the input. The tables available will depend on the schema you select.
</ResponseField>

{/* <!-- param-start:[source.textColumn] | warehouses: [snowflake] --> */}

<ResponseField name="Text Column" type="drop-down" required>
  The column in your table that holds the text data you wish to chunk.
</ResponseField>

{/* <!-- param-start:[source.includeInputColumns] | warehouses: [snowflake] --> */}

<ResponseField name="Include Input Columns" type="dual listbox">
  Select any other input columns that you wish to include in the output table.
</ResponseField>

{/* <!-- param-start:[configuration.dataFormat] | warehouses: [snowflake] --> */}

<ResponseField name="Data Format" type="drop-down" required>
  The format of the text to chunk.

  * **Text:** Your text data is chunked using a recursive character splitting method.
  * **Markdown:** Your text data is chunked using a header splitting method.
  * **HTML:** Your text data is chunked using a header splitting method.

  <Note>
    If your text data is Markdown, you can still set this parameter to **Text**.
  </Note>
</ResponseField>
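
As a rough illustration of the recursive character splitting used for the **Text** format, the following pure-Python sketch tries each separator in order and only falls back to a finer separator when a piece is still too long. It is not the component's actual implementation, nor LangChain's exact algorithm: the function names `recursive_split` and `chunk_text` are hypothetical, and the overlap handling (copying the tail of the previous chunk) is a simplifying assumption. The arguments mirror the **Chunk Size**, **Chunk Overlap**, and **Separators** properties in the **Text** tab.

```python
def recursive_split(text, chunk_size, separators):
    """Split text into pieces of at most chunk_size characters.

    Separators are tried in the given order; a finer separator is only
    used for a piece that is still too long. Hypothetical helper.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            # Keep neighbouring parts together while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long: retry with finer separators.
            chunks.extend(recursive_split(part, chunk_size, remaining))
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text, chunk_size, chunk_overlap, separators):
    """Chunk text so that each chunk has at most chunk_size characters.

    Assumption: overlap is modelled by prefixing each chunk with the last
    chunk_overlap characters of the previous piece.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    # Reserve room for the overlap so the final chunks never exceed chunk_size.
    cores = recursive_split(text, chunk_size - chunk_overlap, separators)
    chunks = []
    for i, core in enumerate(cores):
        tail = cores[i - 1][-chunk_overlap:] if i > 0 and chunk_overlap > 0 else ""
        chunks.append(tail + core)
    return chunks


example = chunk_text("aaa bbb\n\nccc ddd eee", 9, 2, ["\n\n", " "])
print(example)  # ['aaa bbb', 'bbccc ddd', 'ddeee']
```

In this sketch the paragraph break `\n\n` is tried first, so `aaa bbb` stays whole, and the space separator is only used inside the second paragraph because that paragraph exceeds the size budget. Listing the separators in the opposite order would break paragraphs apart earlier.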

<Tabs>
  <Tab title="Text">
    {/* <!-- param-start:[configuration.text.chunkingMethod] | warehouses: [snowflake] --> */}

    <ResponseField name="Chunking Method" type="drop-down" required>
      Currently supports [recursive character splitting](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/).

      Recursive character splitting splits your text on a list of separator characters, trying each in turn, until every chunk is small enough. Common separators are `\n\n`, `\n`, ` `, and `.`. This method attempts to keep paragraphs, then sentences, then words together for as long as possible.
    </ResponseField>

    {/* <!-- param-end:[configuration.text.chunkingMethod] --> */}

    {/* <!-- param-start:[configuration.text.recursiveCharacterSplitting.chunkSize] | warehouses: [snowflake] --> */}

    <ResponseField name="Chunk Size" type="integer" required>
      The maximum size of each chunk, in characters. For example, `100` or `250`.
    </ResponseField>

    {/* <!-- param-end:[configuration.text.recursiveCharacterSplitting.chunkSize] --> */}

    {/* <!-- param-start:[configuration.text.recursiveCharacterSplitting.chunkOverlap] | warehouses: [snowflake] --> */}

    <ResponseField name="Chunk Overlap" type="integer" required>
      The number of characters shared between two consecutive chunks. Overlapping chunks can help to preserve context across chunk boundaries.

      For example, `10` or `25`.
    </ResponseField>

    {/* <!-- param-end:[configuration.text.recursiveCharacterSplitting.chunkOverlap] --> */}

    {/* <!-- param-start:[configuration.text.recursiveCharacterSplitting.separators] | warehouses: [snowflake] --> */}

    <ResponseField name="Separators" type="column editor" required>
      Define separator characters to recursively split on. Common characters are `\n\n`, `\n`, ` `, `.`.

      Order matters in this list. To preserve the structure of a text document as much as possible, order your separators from coarsest to finest, for example with `\n\n` above `.`. You can reorder rows by clicking and dragging.
    </ResponseField>

    {/* <!-- param-end:[configuration.text.recursiveCharacterSplitting.separators] --> */}
  </Tab>

  <Tab title="Markdown">
    {/* <!-- param-start:[configuration.markdown.chunkingMethod] | warehouses: [snowflake] --> */}

    <ResponseField name="Chunking Method" type="drop-down" required>
      Currently supports [header splitting](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/markdown_header_metadata/), wherein you can specify which Markdown headers to split your text on.
    </ResponseField>

    {/* <!-- param-end:[configuration.markdown.chunkingMethod] --> */}

    {/* <!-- param-start:[configuration.markdown.headerSplitting.headersToSplitOn] | warehouses: [snowflake] --> */}

    <ResponseField name="Headers To Split On" type="column editor" required>
      * **Header:** The Markdown header syntax to split on. For example, use "#" to add Header 1 to the list of headers to split on. The output table includes metadata for each row of chunked text, identifying which header the chunk falls under.
      * **Alias:** An alias for this header to contextualize the operation. For example, "Header 1", "H1", "Page title", and so on.
    </ResponseField>

    {/* <!-- param-end:[configuration.markdown.headerSplitting.headersToSplitOn] --> */}
  </Tab>

  <Tab title="HTML">
    {/* <!-- param-start:[configuration.html.chunkingMethod] | warehouses: [snowflake] --> */}

    <ResponseField name="Chunking Method" type="drop-down" required>
      Currently supports [header splitting](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/HTML_header_metadata/), wherein you can specify which HTML headers to split your text on. You don't need to include the HTML tag characters `<>`. `h1`, `h2`, and so on are sufficient.
    </ResponseField>

    {/* <!-- param-end:[configuration.html.chunkingMethod] --> */}

    {/* <!-- param-start:[configuration.html.headerSplitting.headersToSplitOn] | warehouses: [snowflake] --> */}

    <ResponseField name="Headers To Split On" type="column editor" required>
      * **Header:** The HTML header tag to split on. For example, use "h1" to add Header 1 to the list of headers to split on. The output table includes metadata for each row of chunked text, identifying which header the chunk falls under.
      * **Alias:** An alias for this header to contextualize the operation. For example, "Header 1", "H1", "Page title", and so on.
    </ResponseField>

    {/* <!-- param-end:[configuration.html.headerSplitting.headersToSplitOn] --> */}
  </Tab>
</Tabs>
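
To make header splitting more concrete, here is a minimal pure-Python sketch for the Markdown case. It is an illustration under stated assumptions, not the component's implementation: the function name `split_markdown_by_headers`, the `text`/`metadata` output shape, and the requirement that aliases are unique are all hypothetical. The `(header, alias)` pairs correspond to the **Header** and **Alias** columns of **Headers To Split On**.

```python
def split_markdown_by_headers(markdown, headers_to_split_on):
    """Split Markdown into chunks at the given header levels.

    headers_to_split_on is a list of (header, alias) pairs such as
    [("#", "Header 1"), ("##", "Header 2")]. Each returned chunk records
    which headers it sits under. Hypothetical helper; aliases are assumed
    to be unique.
    """
    # Check longer markers first so "##" is never mistaken for "#".
    ordered = sorted(headers_to_split_on, key=lambda pair: len(pair[0]), reverse=True)
    chunks = []
    active = {}  # alias -> header text for the headers currently in scope
    levels = {}  # alias -> header depth (number of "#" characters)
    buffer = []  # body lines collected since the last header

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"text": body, "metadata": dict(active)})
        buffer.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        match = next(
            ((marker, alias) for marker, alias in ordered
             if stripped.startswith(marker + " ")),
            None,
        )
        if match is None:
            buffer.append(line)
            continue
        flush()
        marker, alias = match
        depth = len(marker)
        # A new header closes any header at the same or a deeper level.
        for open_alias in [a for a, d in levels.items() if d >= depth]:
            del active[open_alias]
            del levels[open_alias]
        active[alias] = stripped[len(marker):].strip()
        levels[alias] = depth
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Setup\nInstall it.\n# Appendix\nExtra."
for chunk in split_markdown_by_headers(doc, [("#", "Header 1"), ("##", "Header 2")]):
    print(chunk)
```

Each chunk carries the aliases of the headers above it, which is the kind of per-row header metadata the output table provides. This sketch does not handle lines that start with `#` inside fenced code blocks; a production splitter would need to.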

<ResponseField name="Timeout" type="integer" required>
  The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).
</ResponseField>

{/* <!-- param-start:[destination.database] | warehouses: [snowflake] --> */}

<ResponseField name="Database" type="drop-down" required>
  The target Snowflake database. The special value `[Environment Default]` uses the database defined in the environment. Read [Databases, Tables and Views - Overview](https://docs.snowflake.com/en/guides-overview-db) to learn more.
</ResponseField>

{/* <!-- param-start:[destination.schema] | warehouses: [snowflake] --> */}

<ResponseField name="Schema" type="drop-down" required>
  The target Snowflake schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.
</ResponseField>

{/* <!-- param-start:[destination.table] | warehouses: [snowflake] --> */}

<ResponseField name="Table" type="string" required>
  The name of your Snowflake target table. If the table already exists, it will be overwritten when you run this pipeline.
</ResponseField>

{/* <!-- param-start:[destination.outputColumnName] | warehouses: [snowflake] --> */}

<ResponseField name="Output Column Name" type="string" required>
  A descriptive name for the output column that holds your chunked text.
</ResponseField>
