# AI Similarity

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<ComponentMetadata warehouses={["Databricks"]} unsupportedWarehouses={["Snowflake", "Amazon Redshift", "BigQuery"]} componentType="Transformation" connectionInputs="One" connectionOutputs="Unlimited" />

<Info>
  Production use of this feature is available for specific editions only. [Contact our sales team](https://www.matillion.com/contact) for more information.
</Info>

The **AI Similarity** transformation component uses the Databricks [ai\_similarity()](https://docs.databricks.com/en/sql/language-manual/functions/ai_similarity.html) function to compare two strings with generative AI and compute a semantic similarity score. The function calls a Databricks chat model serving endpoint made available by [Databricks Foundation Model APIs](https://docs.databricks.com/en/machine-learning/foundation-models/index.html). Because the chat model accounts for meaning, context, and phrasing, the comparison goes beyond simple string matching.

The input is two columns of text data to compare. Both columns must be in the same input table. To compare data from different tables, first combine it into a single table with an additional transformation, such as a [Join](/docs/components/join).
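As a rough sketch of what this comparison looks like in Databricks SQL, the Python snippet below builds a query that calls `ai_similarity()` on two columns of one table. The table name `products`, the column names `title_a` and `title_b`, and the `similarity_score` alias are hypothetical, and the SQL that the component actually generates may differ.

```python
def quote_identifier(name: str) -> str:
    """Wrap a Databricks SQL identifier in backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_similarity_query(
    table: str,
    base_column: str,
    comparison_column: str,
    score_alias: str = "similarity_score",  # hypothetical alias, not the component's
) -> str:
    """Build a SELECT that scores two text columns from the same table.

    Illustrative only: the query shape is an assumption, not the SQL
    emitted by the AI Similarity component.
    """
    base = quote_identifier(base_column)
    comparison = quote_identifier(comparison_column)
    return (
        f"SELECT ai_similarity({base}, {comparison}) "
        f"AS {quote_identifier(score_alias)} "
        f"FROM {quote_identifier(table)}"
    )


# Hypothetical table and column names.
query = build_similarity_query("products", "title_a", "title_b")
print(query)
```

Both columns are read from a single `FROM` table here, which mirrors the requirement that the two inputs live in the same input table.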

The output is a float value representing the semantic similarity between the two input strings. The score is relative and should only be used for ranking. A score of 1 indicates that the two texts are equal.
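Because the score is intended for ranking, one way to use it is to sort candidate pairs by score and keep the top few. The sketch below assumes the scores have already been computed. The rows and score values are made-up placeholders for illustration, not real `ai_similarity()` output, and `ScoredPair` and `top_matches` are hypothetical names.

```python
from typing import NamedTuple


class ScoredPair(NamedTuple):
    base: str
    comparison: str
    score: float


def top_matches(pairs: list[ScoredPair], k: int) -> list[ScoredPair]:
    """Return the k highest-scoring pairs, best first."""
    return sorted(pairs, key=lambda p: p.score, reverse=True)[:k]


# Placeholder scores for illustration only.
pairs = [
    ScoredPair("iPhone 14 Pro Max 256GB", "Apple iPhone 14 Pro Max, 256 GB", 0.93),
    ScoredPair("iPhone 14 Pro Max 256GB", "Samsung Galaxy S23 Ultra", 0.41),
    ScoredPair("iPhone 14 Pro Max 256GB", "iPhone 14 Pro Max case", 0.67),
]

for pair in top_matches(pairs, k=2):
    print(f"{pair.score:.2f}  {pair.comparison}")
```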

<Note>
  Make sure you have read and understood the [Requirements](https://docs.databricks.com/en/sql/language-manual/functions/ai_similarity.html#requirements) set out by Databricks before using this component.
</Note>

### Use case

Some typical use cases for this component include:

* Deduplication of text data, by identifying and grouping duplicate or near-duplicate entries in datasets such as product descriptions, survey responses, or user comments. For example, "iPhone 14 Pro Max 256GB" and "Apple iPhone 14 Pro Max, 256 GB" do not match as strings but have a high similarity score, so they can be treated as duplicates.
* Record linking through semantic joins on datasets where the matching field contains slightly different wording.
* Detecting content overlaps to check whether content is reworded or copied from other sources.

***

## Properties

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

{/* <!-- param-start:[columns] | warehouses: [databricks] --> */}

<ResponseField name="Columns" type="column editor" required>
  * **Base Column:** The column that serves as the basis for the comparison.
  * **Comparison Column:** The column to compare against your base column.
</ResponseField>

{/* <!-- param-start:[includeInputColumns] | warehouses: [databricks] --> */}

<ResponseField name="Include Input Columns" type="boolean" required>
  * **Yes:** Includes all input columns, including those *not* selected in **Columns**, *and* the new semantic similarity score column.
  * **No:** Only includes the new semantic similarity score column.
</ResponseField>
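The effect of **Include Input Columns** on the output schema can be sketched as follows. The input column names, the `similarity_score` output name, and the position of the score column are assumptions made for illustration; the component's actual output may differ.

```python
def output_columns(
    input_columns: list[str],
    include_input_columns: bool,
    score_column: str = "similarity_score",  # hypothetical output column name
) -> list[str]:
    """Model which columns are output for a given setting.

    Yes -> every input column (selected in Columns or not) plus the score.
    No  -> only the score column.
    Placing the score last is an assumption of this sketch.
    """
    if include_input_columns:
        return [*input_columns, score_column]
    return [score_column]


# Hypothetical input table; only title_a and title_b are selected in Columns.
columns = ["id", "title_a", "title_b", "price"]

print(output_columns(columns, include_input_columns=True))
print(output_columns(columns, include_input_columns=False))
```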
