# Document AI Predict

export const maia = "Maia";

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<ComponentMetadata warehouses={["Snowflake"]} unsupportedWarehouses={["Databricks", "Amazon Redshift", "BigQuery"]} componentType="Transformation" connectionInputs="One" connectionOutputs="Unlimited" />

<Badge color="green" shape="pill" stroke size="lg">Public preview</Badge>

<Info>
  Production use of this feature is available for specific editions only. [Contact our sales team](https://www.matillion.com/contact) for more information.
</Info>

The **Document AI Predict** transformation component extracts data from documents such as PDFs and images. It invokes the [Snowflake document predict function](https://docs.snowflake.com/en/sql-reference/classes/document-intelligence/methods/predict), which lets you call [Document AI](https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-ai/overview) models from within a {maia} pipeline.

### Use case

Use this component to extract data from documents that arrive in different formats, including PDFs and scanned images. For example, it is useful in industries such as finance, logistics, and healthcare, which regularly handle invoices, contracts, and patient records.

***

## Video example

<iframe width="560" height="315" src="https://www.youtube.com/embed/Aw4xn6mHxsY?si=45cBpcQxd6sIXkUf&enablejsapi=1" title="YouTube video player" frameBorder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" referrerPolicy="strict-origin-when-cross-origin" allowFullScreen />

***

## Prerequisites

* The files you want to process must be in a Snowflake stage.
* You must already have created and configured a Document AI model build in Snowflake. For more information, read [Set up the required objects and privileges](https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-ai/tutorials/create-processing-pipelines#set-up-the-required-objects-and-privileges).
* The relative path and presigned URL of each file must be available in a Snowflake table. The following example query retrieves both values from the stage's directory table:

```sql theme={null}
-- Replace <stage_name> with the stage that holds your files.
SELECT
    relative_path,
    GET_PRESIGNED_URL(@<stage_name>, relative_path) AS presigned_url
FROM DIRECTORY(@<stage_name>);
```
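To store the query results in a table that the component can read, one option is a `CREATE TABLE ... AS SELECT` statement. The following is a minimal sketch, not a required setup: the names `doc_stage` and `doc_urls` are hypothetical, and it assumes the stage already has a directory table enabled.

```sql
-- Hypothetical names: doc_stage (stage), doc_urls (table). Replace with your own.
-- Refresh the directory table so recently uploaded files are listed.
ALTER STAGE doc_stage REFRESH;

CREATE OR REPLACE TABLE doc_urls AS
SELECT
    relative_path,
    -- The optional third argument is the URL lifetime in seconds (default 3600).
    GET_PRESIGNED_URL(@doc_stage, relative_path, 3600) AS presigned_url
FROM DIRECTORY(@doc_stage);
```

Because presigned URLs expire, a table built this way probably needs to be rebuilt shortly before the pipeline runs, rather than created once and reused.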

<Note>
  The final step in the Snowflake tutorial, **Create a document processing pipeline**, isn't required in {maia}.
</Note>

***

## Properties

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

{/* <!-- param-start:[database] | warehouses: [snowflake] --> */}

<ResponseField name="Database" type="drop-down" required>
  The Snowflake *source* database. The special value `[Environment Default]` uses the database defined in the environment. Read [Databases, Tables and Views - Overview](https://docs.snowflake.com/en/guides-overview-db) to learn more.
</ResponseField>

{/* <!-- param-start:[schema] | warehouses: [snowflake] --> */}

<ResponseField name="Schema" type="drop-down" required>
  The Snowflake *source* schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.
</ResponseField>

{/* <!-- param-start:[modelBuildName] | warehouses: [snowflake] --> */}

<ResponseField name="Model Build Name" type="string" required>
  The build name of the Document AI model. Read [Prepare a Document AI model build](https://docs.snowflake.com/en/user-guide/snowflake-cortex/document-ai/prepare-model-build) to learn more.
</ResponseField>

{/* <!-- param-start:[modelBuildVersion] | warehouses: [snowflake] --> */}

<ResponseField name="Model Build Version" type="string">
  The version of the model build to use. If not set, the latest version is used.
</ResponseField>

{/* <!-- param-start:[urlColumn] | warehouses: [snowflake] --> */}

<ResponseField name="URL Column" type="drop-down" required>
  The source column containing the presigned URLs of the staged files that the model should act on.

  A presigned URL gives access to a staged file without a separate authentication or sign-in step. Document AI uses the presigned URL to fetch the document that the model runs against.
</ResponseField>
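For orientation, the **Model Build Name**, **Model Build Version**, and **URL Column** properties correspond to the parts of a Snowflake `PREDICT` call. The following is a minimal sketch of an equivalent hand-written query, not the SQL that the component generates. The names `invoice_model` and `doc_urls`, the `prediction` alias, and the version number `1` are all hypothetical.

```sql
-- Hypothetical names: invoice_model (model build), doc_urls (source table).
SELECT
    relative_path,
    -- First argument: the presigned URL column.
    -- Second argument (optional): the model build version.
    invoice_model!PREDICT(presigned_url, 1) AS prediction
FROM doc_urls;
```

If the model build is in a different database or schema from the current session context, the build name would need to be qualified, for example `my_db.my_schema.invoice_model!PREDICT(...)`.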

{/* <!-- param-start:[includeInputColumns] | warehouses: [snowflake] --> */}

<ResponseField name="Include Input Columns" type="boolean" required>
  * **Yes:** Outputs both your source input columns *and* the new prediction columns. This is the default setting.
  * **No:** Outputs only the new prediction columns.
</ResponseField>
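The two **Include Input Columns** settings can be pictured as two query shapes. This sketch only illustrates the difference in output columns. The names `invoice_model` and `doc_urls` and the `prediction` alias are hypothetical, and the actual output column names produced by the component may differ.

```sql
-- Yes: keep every input column and add the prediction.
SELECT
    d.*,
    invoice_model!PREDICT(d.presigned_url) AS prediction
FROM doc_urls AS d;

-- No: return the prediction only.
SELECT
    invoice_model!PREDICT(presigned_url) AS prediction
FROM doc_urls;
```

Keeping the input columns is usually helpful when a later step needs to match each prediction to the file it came from, for example through `relative_path`.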
