
# Cortex Finetune

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<Info>
  Production use of this feature is available for specific editions only. [Contact our sales team](https://www.matillion.com/contact) for more information.
</Info>

<ComponentMetadata warehouses={["Snowflake"]} unsupportedWarehouses={["Databricks", "Amazon Redshift", "BigQuery"]} componentType="Orchestration" connectionInputs="One" connectionOutputs="Unlimited" />

The [Cortex Finetune](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-finetuning) orchestration component lets you fine-tune large language models (LLMs) using Snowflake Cortex. With this component, you can adapt pre-trained LLMs to your organization's specific use case, such as customer support, summarization, content generation, or domain-specific reasoning, without training a model from scratch.

Fine-tuning a model on your own labeled dataset produces responses that are more accurate and better aligned with your data and requirements. You can call the resulting model by the name you assign to it, using the `CORTEX.COMPLETE` function in Snowflake.

To use this component, you must use a Snowflake role that has been granted the [SNOWFLAKE.CORTEX\_USER database role](https://docs.snowflake.com/en/sql-reference/snowflake-db-roles#label-snowflake-db-roles-cortex-schema). Read [Required Privileges](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#label-cortex-llm-privileges) to learn more about granting this privilege.
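
As a rough illustration of the privilege requirement above, the sketch below builds the `GRANT DATABASE ROLE` statement that an administrator would typically run. The role name `PIPELINE_ROLE` and the helper `grant_cortex_user_sql` are hypothetical; the script only prints the SQL and does not connect to Snowflake.

```python
import re

# Unquoted Snowflake identifiers: a letter or underscore, followed by
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def grant_cortex_user_sql(role_name: str) -> str:
    """Build the statement that grants SNOWFLAKE.CORTEX_USER to a role.

    `role_name` is interpolated into SQL, so this sketch only accepts a
    simple unquoted identifier.
    """
    if not _IDENTIFIER.fullmatch(role_name):
        raise ValueError(f"not a simple unquoted identifier: {role_name!r}")
    return f"GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE {role_name};"


# PIPELINE_ROLE is a hypothetical role name used for illustration only.
print(grant_cortex_user_sql("PIPELINE_ROLE"))
```

The statement is normally run by a role that is allowed to grant Snowflake database roles, such as `ACCOUNTADMIN`.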

To learn more about Snowflake Cortex, such as availability, usage quotas, managing costs, and more, read [Large Language Model (LLM) Functions (Snowflake Cortex)](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions).

### Use case

For example, you can use this component to:

* Fine-tune a model using technical support transcripts to generate responses to new support tickets.
* Fine-tune a model using internal financial data and report templates to automatically draft summaries and analysis documents.

***

## Properties

Reference material is provided below for the Model, Training Data, and Validation Data properties.

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

### Model

<ResponseField name="Name" type="string" required>
  A name for the fine-tuned model. This name is used to reference the model in downstream Cortex functions.
</ResponseField>

{/* <!-- param-start:[model.baseModel] | warehouses: [snowflake] --> */}

<ResponseField name="Base Model" type="drop-down" required>
  Select the base LLM that will be fine-tuned using your training data. Available models include:

  * **llama3-8b:** Optimized for text classification, summarization, and sentiment analysis.
  * **llama3-70b:** High-performance model ideal for chat, content creation, and enterprise use.
  * **llama3.1-8b:** Lightweight, fast model with a 24K context window for moderate tasks.
  * **llama3.1-70b:** Cost-effective, open-source model for advanced enterprise applications.
  * **mistral-7b:** Fast, efficient model for summarization and simple question answering.
  * **mixtral-8x7b:** Versatile model for generation, classification, and QA with low latency.

  For small-scale or experimental use cases, choose a smaller model (such as `llama3-8b`) for faster training and lower cost.
</ResponseField>

{/* <!-- param-start:[model.database] | warehouses: [snowflake] --> */}

<ResponseField name="Database" type="drop-down" required>
  Select the Snowflake database where your training and (optionally) validation tables are stored.
</ResponseField>

{/* <!-- param-start:[model.schema] | warehouses: [snowflake] --> */}

<ResponseField name="Schema" type="drop-down" required>
  Select the schema within the chosen database that contains your input tables.
</ResponseField>

{/* <!-- param-start:[model.creationMode] | warehouses: [snowflake] --> */}

<ResponseField name="Creation Mode" type="drop-down" required>
  Defines how the component executes the model creation process.

  * **Synchronous:** The component blocks and waits for the training job to complete before the pipeline continues.
  * **Asynchronous:** The component starts the training job and lets the pipeline continue to other components without waiting for the job to complete.

  <Note>
    - For asynchronous mode, you can find the job ID in the task history table to track job progress.
    - For synchronous mode, the job ID, job status, and model name are recorded in the pipeline logs.
  </Note>
</ResponseField>
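
When a job is started asynchronously, one way to follow its progress is to pass the job ID to Snowflake's `SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', ...)` function. The sketch below builds that statement and interprets a result. The job ID `ft_example`, the sample payload, and the set of terminal status values are assumptions for illustration; check the Snowflake documentation for the exact fields returned in your account.

```python
import json

# Assumption: statuses after which the job no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "ERROR", "CANCELLED"}


def describe_job_sql(job_id: str) -> str:
    """Build a statement that asks Snowflake to describe a fine-tuning job."""
    escaped = job_id.replace("'", "''")  # escape single quotes for a SQL literal
    return f"SELECT SNOWFLAKE.CORTEX.FINETUNE('DESCRIBE', '{escaped}');"


def is_finished(describe_result: str) -> bool:
    """Return True when the JSON result reports a terminal status."""
    status = json.loads(describe_result).get("status", "")
    return status in TERMINAL_STATUSES


# Illustrative payload, not captured from a real job.
sample = '{"id": "ft_example", "status": "IN_PROGRESS", "progress": 0.4}'

print(describe_job_sql("ft_example"))
print("finished" if is_finished(sample) else "still running")
```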

### Training Data

You must supply a labeled dataset that contains pairs of prompts and expected completions.

<ResponseField name="Table" type="drop-down" required>
  Select the table that contains your training data.
</ResponseField>

{/* <!-- param-start:[source.promptColumn] | warehouses: [snowflake] --> */}

<ResponseField name="Prompt Column" type="drop-down" required>
  Choose the column containing the user-provided prompts.
</ResponseField>

{/* <!-- param-start:[source.completionColumn] | warehouses: [snowflake] --> */}

<ResponseField name="Completions Column" type="drop-down" required>
  Choose the column containing the expected or target responses for those prompts.
</ResponseField>
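
Snowflake's fine-tuning function expects its training query to return columns named `prompt` and `completion`, so the columns chosen above are presumably aliased to those names. The sketch below shows one way such a query could be assembled. It is not the component's actual implementation, and the database, schema, table, and column names are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Double-quote a Snowflake identifier, escaping embedded double quotes.

    Quoted identifiers are case-sensitive, so pass names exactly as stored.
    """
    return '"' + name.replace('"', '""') + '"'


def training_query(database: str, schema: str, table: str,
                   prompt_column: str, completion_column: str) -> str:
    """Build a SELECT that exposes the chosen columns as prompt/completion."""
    source = ".".join(quote_ident(part) for part in (database, schema, table))
    return (
        f"SELECT {quote_ident(prompt_column)} AS prompt, "
        f"{quote_ident(completion_column)} AS completion "
        f"FROM {source}"
    )


# Hypothetical names, for illustration only.
print(training_query("ANALYTICS", "SUPPORT", "TICKETS", "QUESTION", "AGENT_REPLY"))
```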

### Validation Data

To assess model performance, you can either auto-split the training data or provide a separate validation dataset.

<ResponseField name="Automatically Split Training Data" type="boolean" required>
  * **Yes:** A portion of the training table is automatically used as validation data. You do not need to provide a separate validation table.
  * **No:** You must supply a separate validation table and define corresponding columns.
</ResponseField>

{/* <!-- param-start:[validationSource.table] | warehouses: [snowflake] --> */}

<ResponseField name="Table" type="drop-down" required>
  Select the table that contains your validation data. Only available if `Automatically Split Training Data` is set to `No`.
</ResponseField>

{/* <!-- param-start:[validationSource.promptColumn] | warehouses: [snowflake] --> */}

<ResponseField name="Prompt Column" type="drop-down" required>
  Select the prompt column in the validation dataset. Only available if `Automatically Split Training Data` is set to `No`.
</ResponseField>

{/* <!-- param-start:[validationSource.completionColumn] | warehouses: [snowflake] --> */}

<ResponseField name="Completions Column" type="drop-down" required>
  Select the column with expected completions in the validation dataset. Only available if `Automatically Split Training Data` is set to `No`.
</ResponseField>

{/* <!-- param-start:[validationSource.epochs] | warehouses: [snowflake] --> */}

<ResponseField name="Epochs" type="integer" required>
  Specify the number of epochs, that is, the number of times the model passes through the full training dataset during fine-tuning. A higher number of epochs may improve accuracy but also increases training time and cost.
</ResponseField>
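
Taken together, the properties above correspond roughly to a call to Snowflake's `SNOWFLAKE.CORTEX.FINETUNE('CREATE', ...)` function. The sketch below assembles such a call under stated assumptions: the helper names and example values are hypothetical, the `max_epochs` option key follows the Snowflake documentation as best understood here, and the exact SQL the component generates may differ. Because this sketch does not assume how options are passed without a validation query, it only supports `max_epochs` together with one.

```python
import json
from typing import Optional


def sql_literal(value: str) -> str:
    """Wrap a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def finetune_create_sql(model_name: str, base_model: str, training_query: str,
                        validation_query: Optional[str] = None,
                        max_epochs: Optional[int] = None) -> str:
    """Build a FINETUNE('CREATE', ...) statement from component-style inputs."""
    if max_epochs is not None and validation_query is None:
        # Limitation of this sketch, not necessarily of Snowflake.
        raise ValueError("this sketch needs a validation query to pass max_epochs")
    args = [sql_literal("CREATE"), sql_literal(model_name),
            sql_literal(base_model), sql_literal(training_query)]
    if validation_query is not None:
        args.append(sql_literal(validation_query))
    if max_epochs is not None:
        # Options are passed as a JSON object serialized to a string.
        args.append(sql_literal(json.dumps({"max_epochs": max_epochs})))
    return "SELECT SNOWFLAKE.CORTEX.FINETUNE(" + ", ".join(args) + ");"


# Hypothetical model, table, and column names.
print(finetune_create_sql(
    "ANALYTICS.SUPPORT.TICKET_MODEL",
    "llama3-8b",
    "SELECT q AS prompt, a AS completion FROM train",
    "SELECT q AS prompt, a AS completion FROM val",
    max_epochs=3,
))
```

Omitting the validation query mirrors setting `Automatically Split Training Data` to `Yes`, in which case the validation set is taken from the training data.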

Once fine-tuning completes successfully, the model is registered in your Snowflake account under the selected database and schema. You can then call it by its assigned name through the `CORTEX.COMPLETE` function in Snowflake SQL, or through {maia}'s [Cortex Completions](/docs/components/cortex-completions) component.
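
To illustrate calling the finished model from SQL, the sketch below builds a `SNOWFLAKE.CORTEX.COMPLETE` statement. The model name, the prompt, and the `complete_sql` helper are hypothetical, and the example assumes the model can be referenced by its fully qualified `database.schema.name`.

```python
def sql_literal(value: str) -> str:
    """Wrap a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def complete_sql(model_name: str, prompt: str) -> str:
    """Build a statement that sends one prompt to a fine-tuned model."""
    return (
        "SELECT SNOWFLAKE.CORTEX.COMPLETE("
        f"{sql_literal(model_name)}, {sql_literal(prompt)}) AS response;"
    )


# Hypothetical model name and prompt.
print(complete_sql(
    "ANALYTICS.SUPPORT.TICKET_MODEL",
    "Customer can't log in after a password reset.",
))
```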
