# Amazon OpenSearch Upsert

export const maia = "Maia";

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<Info>
  Production use of this feature is available for specific editions only. [Contact our sales team](https://www.matillion.com/contact) for more information.
</Info>

<ComponentMetadata warehouses={["Snowflake", "Databricks", "Amazon Redshift"]} unsupportedWarehouses={[]} componentType="Orchestration" connectionInputs="One" connectionOutputs="Unlimited" />

Amazon OpenSearch Upsert is an orchestration component that converts data stored in your cloud data warehouse into vector embeddings and stores them in an Amazon OpenSearch index. This lets you use alternative embedding models (for example, from OpenAI or Amazon Bedrock) instead of Amazon OpenSearch's built-in embedding.

<Note>
  Currently, this component only supports provisioned Amazon OpenSearch Service domains, not Amazon OpenSearch Serverless.
</Note>

***

## Prerequisites

Before you use the Amazon OpenSearch Upsert component, you'll need to add [AWS cloud credentials](/docs/guides/cloud-credentials) to {maia}.

## Permissions

You'll need to ensure you have permissions for Amazon OpenSearch Service. If you're using Amazon Bedrock for your embeddings, you'll also need permission to invoke the model.

<Accordion title="Amazon OpenSearch Service">
  To use the Amazon OpenSearch Upsert component, ensure that your IAM role or user has the necessary permissions to interact with Amazon OpenSearch. Below is an example of an IAM policy that grants the required permissions:

  ```json theme={null}
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "es:ESHttpPost",
          "es:ESHttpPut",
          "es:ESHttpGet",
          "es:ESHttpDelete"
        ],
        "Resource": "arn:aws:es:<region>:<account-id>:domain/<your-domain-name>/*"
      }
    ]
  }
  ```

  If you are using fine-grained access control in Amazon OpenSearch Service, you'll also need OpenSearch user/role permissions (for example, a role mapped to the appropriate OpenSearch index with write access).
</Accordion>
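
As a minimal sketch, the placeholders in the Amazon OpenSearch Service policy above could be filled in programmatically. The helper name `opensearch_policy` and the sample region, account ID, and domain name are hypothetical; only the actions and the ARN pattern come from the example policy.

```python
import json


def opensearch_policy(region: str, account_id: str, domain_name: str) -> dict:
    """Build the example IAM policy for one OpenSearch domain (hypothetical helper)."""
    resource = f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "es:ESHttpPost",
                    "es:ESHttpPut",
                    "es:ESHttpGet",
                    "es:ESHttpDelete",
                ],
                "Resource": resource,
            }
        ],
    }


# Sample values are placeholders, not real identifiers.
policy = opensearch_policy("us-east-1", "123456789012", "my-domain")
print(json.dumps(policy, indent=2))
```

Generating the document from parameters keeps the `Resource` ARN scoped to a single domain rather than relying on hand-edited placeholders.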

<Accordion title="Amazon Bedrock">
  If you are using Amazon Bedrock as your embedding provider, ensure that your IAM role or user has the necessary permissions to invoke the model. Below is an example of an IAM policy that grants the required permissions:

  ```json theme={null}
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "bedrock:InvokeModel"
        ],
        "Resource": "arn:aws:bedrock:<region>::foundation-model/<model-name>"
      }
    ]
  }
  ```
</Accordion>

***

## Properties

Reference material is provided below for the Source, Configure, and Destination properties.

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

### Source

<Tabs>
  <Tab title="Snowflake">
    <ResponseField name="Database" type="drop-down" required>
      The Snowflake database. The special value `[Environment Default]` uses the database defined in the environment. Read [Databases, Tables and Views - Overview](https://docs.snowflake.com/en/guides-overview-db) to learn more.
    </ResponseField>

    {/* <!-- param-start:[source.snowflake.schema] | warehouses: [snowflake] --> */}

    <ResponseField name="Schema" type="drop-down" required>
      The Snowflake *source* schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Database, Schema, and Share DDL](https://docs.snowflake.com/en/sql-reference/ddl-database.html) to learn more.
    </ResponseField>
  </Tab>

  <Tab title="Databricks">
    <ResponseField name="Catalog" type="drop-down" required>
      Select a [Databricks Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/index.html). The special value `[Environment Default]` uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.
    </ResponseField>

    {/* <!-- param-start:[source.databricks.schema] | warehouses: [databricks] --> */}

    <ResponseField name="Schema (Database)" type="drop-down" required>
      The Databricks schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Create and manage schemas](https://docs.databricks.com/en/data-governance/unity-catalog/create-schemas.html) to learn more.
    </ResponseField>
  </Tab>

  <Tab title="Amazon Redshift">
    <ResponseField name="Schema" type="drop-down" required>
      The Amazon Redshift *source* schema. The special value `[Environment Default]` uses the schema defined in the environment. Read [Schemas](https://docs.aws.amazon.com/redshift/latest/dg/r_Schemas_and_tables.html) to learn more.
    </ResponseField>
  </Tab>
</Tabs>

<ResponseField name="Table" type="drop-down" required>
  Select the table that contains the data you want to upsert into Amazon OpenSearch.
</ResponseField>

{/* <!-- param-start:[source.snowflake.keyColumn, source.databricks.keyColumn, source.redshift.keyColumn] | warehouses: [snowflake, databricks, redshift] --> */}

<ResponseField name="Key Column" type="drop-down" required>
  The column that uniquely identifies each row in the table. It ensures that data isn't duplicated when it's loaded into the destination. For example, you could use a column of product IDs to identify each product in the table.
</ResponseField>
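
To illustrate why a unique key prevents duplicates, here is a minimal in-memory sketch of upsert semantics. It is not the component's implementation: the `upsert` helper and the sample `product_id` and `review` columns are hypothetical, and a plain dictionary stands in for the Amazon OpenSearch index.

```python
def upsert(index: dict, rows: list[dict], key_column: str) -> dict:
    """Insert each row under its key, overwriting any row that shares the key."""
    for row in rows:
        index[row[key_column]] = row
    return index


index = {}
first_load = [
    {"product_id": "P1", "review": "Great battery life"},
    {"product_id": "P2", "review": "Screen scratches easily"},
]
# Re-running with an updated P1 replaces the stored row instead of adding a second one.
second_load = [{"product_id": "P1", "review": "Battery died after a month"}]

upsert(index, first_load, key_column="product_id")
upsert(index, second_load, key_column="product_id")
print(len(index))  # 2
```

If the key column contained repeated values, later rows would silently replace earlier ones in this sketch, which is why the column should be unique per row.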

{/* <!-- param-start:[source.snowflake.textColumn, source.databricks.textColumn, source.redshift.textColumn] | warehouses: [snowflake, databricks, redshift] --> */}

<ResponseField name="Text Column" type="drop-down" required>
  The column that contains the text to convert into vectors, which are then upserted as embeddings to Amazon OpenSearch. For example, you could use a column of product reviews that you want to convert into vectors for semantic search or sentiment analysis.
</ResponseField>

{/* <!-- param-start:[source.snowflake.limit, source.databricks.limit, source.redshift.limit] | warehouses: [snowflake, databricks, redshift] --> */}

<ResponseField name="Limit" type="integer">
  Set the `Limit` to control the maximum number of records (rows) to load from the table. The default is 1000.
</ResponseField>

### Configure

<ResponseField name="Embedding Provider" type="drop-down" required>
  The embedding provider is the API service used to convert the text in your Text Column into vectors. Choose either OpenAI or Amazon Bedrock. The embedding provider receives a piece of text (e.g. "How do I log in?") and returns a vector.

  <Tabs>
    <Tab title="OpenAI">
      {/* <!-- param-start:[embeddingGenerator.openAI.apiKey] | warehouses: [snowflake, databricks, redshift] --> */}

      <ResponseField name="OpenAI API Key" type="drop-down" required>
        Use the drop-down menu to select the corresponding secret definition that denotes the value of your OpenAI API key.

        Read [Secrets and secret definitions](/docs/guides/secrets-and-secret-definitions) to learn how to create a new secret definition.

        To create a new OpenAI API key:

        1. Log in to [OpenAI](https://platform.openai.com/).
        2. Click your avatar in the top-right of the UI.
        3. Click **View API keys**.
        4. Click **+ Create new secret key**.
        5. Enter a name for your new secret key and click **Create secret key**.
        6. Copy your new secret key and save it. Then click **Done**.
      </ResponseField>

      {/* <!-- param-end:[embeddingGenerator.openAI.apiKey] --> */}

      {/* <!-- param-start:[embeddingGenerator.openAI.model] | warehouses: [snowflake, databricks, redshift] --> */}

      <ResponseField name="Model" type="drop-down" required>
        Select an OpenAI [embedding model](https://platform.openai.com/docs/guides/embeddings).

        Currently supports:

        * text-embedding-ada-002
        * text-embedding-3-small
        * text-embedding-3-large
      </ResponseField>

      {/* <!-- param-end:[embeddingGenerator.openAI.model] --> */}

      {/* <!-- param-start:[embeddingGenerator.embeddingBatchSize] | warehouses: [snowflake, databricks, redshift] --> */}

      <ResponseField name="API Batch Size" type="integer" required>
        Set the [size of the array of data sent per API call](https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-input). The default size is 10, so 1000 rows would require 100 API calls.

        You may wish to reduce this number if a row contains a high volume of data, and conversely, increase this number for rows with low data volume.
      </ResponseField>

      {/* <!-- param-end:[embeddingGenerator.embeddingBatchSize] --> */}
    </Tab>

    <Tab title="Amazon Bedrock">
      {/* <!-- param-start:[embeddingGenerator.aws.region] | warehouses: [snowflake, databricks, redshift] --> */}

      <ResponseField name="Region" type="drop-down" required>
        Select the [AWS region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Regions) for your embedding model.
      </ResponseField>

      {/* <!-- param-end:[embeddingGenerator.aws.region] --> */}

      {/* <!-- param-start:[embeddingGenerator.aws.model] | warehouses: [snowflake, databricks, redshift] --> */}

      <ResponseField name="Model" type="drop-down" required>
        Select an embedding model.

        Currently supports:

        * [Titan Embeddings G1 - Text](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html)
      </ResponseField>

      {/* <!-- param-end:[embeddingGenerator.aws.model] --> */}
    </Tab>
  </Tabs>
</ResponseField>
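
The API Batch Size arithmetic described for the OpenAI provider can be sketched as follows. The helper names `api_calls_needed` and `batches` are hypothetical and only illustrate how rows might be grouped per request; they are not part of the component.

```python
import math
from typing import Iterator


def api_calls_needed(row_count: int, batch_size: int) -> int:
    """Number of embedding requests needed when each request carries batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(row_count / batch_size)


def batches(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size texts, one slice per request."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


print(api_calls_needed(1000, 10))  # 100, matching the default settings
print(api_calls_needed(1000, 25))  # 40
```

A larger batch size means fewer requests but more text per request, which is the trade-off to weigh when rows contain a lot of text.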

### Destination

<ResponseField name="Endpoint URL" type="string" required>
  The URL of the Amazon OpenSearch domain endpoint to upsert your vector embeddings to. To find your endpoint URL:

  1. Log in to the [Amazon OpenSearch Service console](https://console.aws.amazon.com/aos/home).
  2. Navigate to the **Domains** page.
  3. Click on the domain you want to use.
  4. Copy the **Domain Endpoint** URL from the domain details page.
</ResponseField>
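
If you prefer to look up the endpoint programmatically, the sketch below shows one possible approach using the `describe_domain` call in the AWS SDK for Python (boto3). The helper names are hypothetical, the sample hostname is made up, and the `https://` prefix is an assumption about the expected URL form. The call also needs the `es:DescribeDomain` permission, which the example policy on this page doesn't include.

```python
def endpoint_url(domain_status: dict) -> str:
    """Extract an https:// endpoint URL from a DescribeDomain DomainStatus dict."""
    host = domain_status.get("Endpoint")
    if host is None:
        # VPC domains report their endpoint under Endpoints["vpc"] instead.
        host = domain_status.get("Endpoints", {}).get("vpc")
    if not host:
        raise ValueError("domain has no endpoint yet")
    return f"https://{host}"


def lookup_endpoint(domain_name: str, region: str) -> str:
    """Call DescribeDomain via boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    client = boto3.client("opensearch", region_name=region)
    response = client.describe_domain(DomainName=domain_name)
    return endpoint_url(response["DomainStatus"])


# Offline demonstration with a made-up response fragment.
print(endpoint_url({"Endpoint": "search-my-domain-abc123.us-east-1.es.amazonaws.com"}))
```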

{/* <!-- param-start:[destination.index] | warehouses: [snowflake, databricks, redshift] --> */}

<ResponseField name="Index" type="string" required>
  The name of an existing Amazon OpenSearch index where the vector embeddings will be upserted. An index in Amazon OpenSearch is similar to a table in a relational database.

  Below is an example code snippet you could use to create an index. You can run the following command in the OpenSearch Dev Tools console or a REST API client like curl or Postman:

  ```
  PUT /test-index
  {
    "settings": {
      "index.knn": true
    },
    "mappings": {
      "properties": {
        "vector": {
          "type": "knn_vector",
          "dimension": 1536
        },
        "rawData": {
          "type": "text",
          "index": false
        },
        "key": {
          "type": "object"
        }
      }
    }
  }
  ```

  <Note>
    The `dimension` value must match the output dimension of the embedding model you have chosen:

    * `text-embedding-ada-002` and `text-embedding-3-small` output vectors of dimension 1536.
    * `text-embedding-3-large` outputs vectors of dimension 3072.

    If you're using the `text-embedding-3-large` model, change `"dimension": 1536` to `"dimension": 3072` in the mapping above.
  </Note>
</ResponseField>
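
As a sketch of how the `dimension` value could be kept in step with the chosen model, the snippet below builds the same request body as the index example above. The `MODEL_DIMENSIONS` table and the `build_index_body` helper are hypothetical names; the dimensions are the ones stated for the OpenAI models on this page.

```python
import json

# Output dimensions for the OpenAI models listed on this page.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def build_index_body(model: str) -> dict:
    """Return the index settings and mappings with dimension set for the model."""
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "vector": {"type": "knn_vector", "dimension": MODEL_DIMENSIONS[model]},
                "rawData": {"type": "text", "index": False},
                "key": {"type": "object"},
            }
        },
    }


# Paste the printed JSON after `PUT /<index-name>` in Dev Tools.
print(json.dumps(build_index_body("text-embedding-3-large"), indent=2))
```

Looking the dimension up from the model name avoids a mismatch between the index mapping and the vectors the embedding provider returns.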

{/* <!-- param-start:[destination.region] | warehouses: [snowflake, databricks, redshift] --> */}

<ResponseField name="Region" type="drop-down" required>
  Select the [AWS region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Regions) of your Amazon OpenSearch Service domain.
</ResponseField>
