OpenAI Prompt - Maia Documentation

Production use of this feature is available for specific editions only. Contact our sales team for more information.

The OpenAI Prompt component uses a large language model (LLM) to provide responses to user-composed prompts. The component takes one or more inputs from your source table, combines the inputs with user prompts, and sends this data to the LLM for processing. You can configure the component to output the results as either a text value or a JSON object. The results will be stored—along with other metadata—in a destination table on your cloud data warehouse. We recommend that you read Best practices for prompt engineering with OpenAI API if you are new to prompt engineering.

Properties

Reference material is provided below for the Connect, Source, Configure, Destination, and RAG properties.

Name

string

required

A human-readable name for the component.

Connect

Model

drop-down

required

Specify the OpenAI model. Select [Custom] if you wish to specify a model not available in the drop-down list.

Custom Model Name

string

required

The name of your custom model. Only required if you selected [Custom] in Model. Use the full name—for example, gpt-3.5-turbo-instruct.

API Key

drop-down

required

Use the drop-down menu to select the corresponding secret definition that denotes the value of your OpenAI API key. Read Secrets and secret definitions to learn how to create a new secret definition. To create a new OpenAI API key:

Log in to OpenAI.
Click your avatar in the top-right of the UI.
Click View API keys.
Click + Create new secret key.
Give a name for your new secret key and click Create secret key.
Copy your new secret key and save it. Then click Done.

Temperature

floating point number

Set the input temperature. Accepts decimal values between 0 and 2. To quote OpenAI:

Lower values for temperature result in more consistent outputs, while higher values generate more diverse and creative results. Select a temperature value based on the desired trade-off between coherence and creativity for your specific application.
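
As a rough illustration of the trade-off described above, the following sketch applies temperature scaling to a toy set of token scores. The function name, the scores, and the handling of a temperature of 0 are assumptions made for this example; it does not show how the component or the OpenAI API is implemented.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption for this sketch: treat 0 as "always pick the top token".
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 2.0)
print("temperature 0.2:", [round(p, 3) for p in low])
print("temperature 2.0:", [round(p, 3) for p in high])
```

At the lower temperature almost all probability lands on the top-scoring token, which is why outputs become more consistent. At the higher temperature the probabilities flatten and less likely tokens are chosen more often.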

Top P

floating point number

An alternative to sampling with temperature, called nucleus sampling. The model considers only the tokens that make up the top_p probability mass. For example, 0.1 means that only the tokens comprising the top 10% of probability mass are considered. OpenAI recommends altering Top P or Temperature, but not both.
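
The following sketch shows the idea behind nucleus sampling on a toy probability table. The function name and the token probabilities are hypothetical, and the real model applies this step internally to its full vocabulary.

```python
def nucleus_filter(token_probs, top_p):
    """Keep the most probable tokens until their combined mass reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        mass += prob
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: prob / mass for token, prob in kept.items()}

# Hypothetical next-token probabilities.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
narrow = nucleus_filter(probs, 0.1)
wide = nucleus_filter(probs, 0.9)
print("top_p=0.1 keeps:", list(narrow))
print("top_p=0.9 keeps:", list(wide))
```

A small Top P leaves only the most likely token or tokens in play, while a value close to 1 keeps nearly the whole distribution.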

N

integer

You can choose to generate more than one response for your input data when the output is JSON. This parameter can quickly consume your token quota—use carefully. Default is 1.

Effort Level

drop-down

Select the effort level used for reasoning. This parameter is only visible when you select a reasoning model. Supported effort levels are low, medium, and high. The default setting is Medium. Setting this parameter to Low may lead to faster responses and fewer tokens used on reasoning in the response.

Max Tokens

integer

The maximum number of completion tokens per prompt request. Each model has its own maximum token allowance, and this must be considered. Visit the OpenAI tokenizer tool to learn more about language model tokenization.

Source

Select your cloud data warehouse.

Snowflake
Databricks
Amazon Redshift

Database

drop-down

required

The Snowflake source database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema

drop-down

required

The Snowflake source schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Table

drop-down

required

An existing Snowflake table to use as the input.

Catalog

drop-down

required

Select a source Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Schema (Database)

drop-down

required

The Databricks source schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table

drop-down

required

An existing Databricks table to use as the input.

Schema

drop-down

required

The Amazon Redshift source schema. The special value [Environment Default] uses the schema defined in the environment. Read Schemas to learn more.

Table

drop-down

required

An existing Redshift table to use as the input.

Key Column

drop-down

required

Set a column as the primary key. This key is used to join the results back to the input table.
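
To show why a key column is needed, here is a minimal sketch of joining results back to input rows. The column name review_id, the sample rows, and the join helper are all hypothetical; in practice you would usually perform this join in your cloud data warehouse.

```python
# Hypothetical input rows and component results, both keyed by "review_id".
input_rows = [
    {"review_id": 1, "review_text": "Great product"},
    {"review_id": 2, "review_text": "Arrived broken"},
]
results = [
    {"review_id": 2, "DATA": '{"review_score": 1}'},
    {"review_id": 1, "DATA": '{"review_score": 9}'},
]

def join_on_key(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]

joined = join_on_key(input_rows, results, "review_id")
for row in joined:
    print(row["review_id"], row["review_text"], row["DATA"])
```

The join only works reliably when the key column holds a unique value for every input row.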

Limit

integer

required

Set a limit for the number of rows to load from the table. The default is 1000.

Configure

User Context

text editor

Provide your prompt. When Output Format is TEXT, this property is where you must specify all of the questions that you wish the LLM to provide answers to. Prompts should define the following information:

A persona. Who or what should the model impersonate when contextualizing its generative responses?
A context. Contextualize the situation for the model to enhance its responses.
A tone. What kind of language do you want the model to use?

Providing an example output may improve performance.
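
One way to keep the persona, context, and tone explicit is to assemble the prompt from named parts. The helper below is a hypothetical sketch; the component simply accepts the finished text in User Context, and the wording of each part is yours to choose.

```python
def build_user_context(persona, context, tone, example_output=None):
    """Assemble a prompt from a persona, a context, a tone, and an optional example."""
    parts = [
        f"You are {persona}.",
        f"Context: {context}",
        f"Tone: {tone}",
    ]
    if example_output:
        parts.append(f"Example output: {example_output}")
    return "\n".join(parts)

prompt = build_user_context(
    persona="a customer support analyst",
    context="You are reviewing product reviews from an online store.",
    tone="Neutral and concise.",
    example_output="The customer is satisfied with delivery but not with packaging.",
)
print(prompt)
```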

Inputs

column editor

required

Select the source columns to feed as input to the prompt component.

Column Name: A column from the input table.
Descriptive Name: An alternate descriptive name to better contextualize the column. Recommended if your column names are low-context.
Type: Choose Text or Image. Image is currently only supported for model GPT-4o.

Images should be in either Base64 format or a direct, public URL of the image. If you’re feeding images in Base64 format, you will need to add data:image/{format_of_your_image};base64, before the encoded image. For example, if your image is .jpeg, the encoded image would look like this:

data:image/jpeg;base64, iVBORw0KGgoAAAANSUhEUgAAAyAAAAPoCAIAAACpqQ3mAAAy7klEQVR4nO3deZgj...
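
If your images are stored as raw bytes, a short script can produce the prefixed Base64 string before you load it into the source table. This is a minimal sketch: the function name is hypothetical, and the 8-byte PNG file signature stands in for real image content.

```python
import base64

def to_data_uri(image_bytes, image_format):
    """Encode raw image bytes as a Base64 string with the data:image prefix."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"

# The 8-byte PNG file signature stands in for a full image here.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "png")
print(uri)
```

For a real image, read the file in binary mode and pass its bytes and format (for example, jpeg or png) to the function.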

Output Format

drop-down

required

Choose TEXT or JSON. Choosing JSON will activate an additional property, Outputs.

Outputs

column editor

required

JSON only. Define the output columns the prompt component will generate.

Output: Key of a key:value JSON pair. For example, an output might be “review_score”.
Context: Text that defines the output you expect the model to provide—that is, some task for the model to perform. For example, “Give a score between 0 and 10 on the level of satisfaction you feel in the user’s review where 0 is completely dissatisfied and 10 is extremely satisfied.” You may wish to use this parameter to configure the tone of the model (where applicable).
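
The sketch below shows how a set of Outputs definitions relates to the JSON object you would expect back, and how a response could be checked against them. The output names, the sample responses, and the parse_response helper are hypothetical; the component performs its own parsing and reports problems in ERROR_METADATA.

```python
import json

# Hypothetical Outputs definitions: output key -> context.
outputs = {
    "review_score": "Give a score between 0 and 10 for user satisfaction.",
    "summary": "Summarize the review in one sentence.",
}

def parse_response(raw_response, expected_keys):
    """Parse a model response and report invalid JSON or missing output keys."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return None, "No valid JSON found"
    if not isinstance(data, dict):
        return None, "Response is not a JSON object"
    missing = [key for key in expected_keys if key not in data]
    if missing:
        return data, "Missing keys: " + ", ".join(missing)
    return data, None

good, good_error = parse_response('{"review_score": 8, "summary": "Happy."}', outputs)
bad, bad_error = parse_response("Score: 8", outputs)
print(good, good_error)
print(bad, bad_error)
```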

Destination

Select your cloud data warehouse.

Snowflake
Databricks
Amazon Redshift

Database

drop-down

required

The Snowflake destination database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema

drop-down

required

The Snowflake destination schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Table

string

required

The new Snowflake table to load your prompt output into. Will create a new table if one does not exist. Otherwise, will replace any existing table of the same name.

Catalog

drop-down

required

Select a destination Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Schema (Database)

drop-down

required

The Databricks destination schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Table

string

required

The new Databricks table to load your prompt output into. Will create a new table if one does not exist. Otherwise, will replace any existing table of the same name.

Schema

drop-down

required

The Amazon Redshift destination schema. The special value [Environment Default] uses the schema defined in the environment. Read Schemas to learn more.

Table

string

required

The new Redshift table to load your prompt output into. Will create a new table if one does not exist. Otherwise, will replace any existing table of the same name.

Create Table Options

required

Replace if Table Exists: The pipeline will run despite the table already existing. The table will be recreated.
Fail if Table Exists: If the table already exists, the pipeline will fail to run. This is the default setting.
Append: If the table already exists, any new rows (records) will be added to the table without modifying or deleting existing rows. If the table does not exist, it is created and any new rows are inserted.
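
The three behaviors can be compared side by side with a small simulation. This sketch uses an in-memory SQLite database purely for illustration; the write_table helper, the option names, and the table layout are assumptions, and the component issues warehouse-specific SQL instead.

```python
import sqlite3

def write_table(conn, table, rows, option):
    """Load rows into a table using one of three create-table behaviors."""
    if option not in {"replace", "fail", "append"}:
        raise ValueError(f"Unknown option: {option}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists and option == "fail":
        raise RuntimeError(f"Table {table} already exists")
    if exists and option == "replace":
        conn.execute(f'DROP TABLE "{table}"')  # table name is trusted in this sketch
        exists = False
    if not exists:
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER, data TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)

def row_count(conn, table):
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

conn = sqlite3.connect(":memory:")
write_table(conn, "prompt_output", [(1, "a")], "fail")     # table is new, so it is created
write_table(conn, "prompt_output", [(2, "b")], "append")   # existing rows are kept
count_after_append = row_count(conn, "prompt_output")
write_table(conn, "prompt_output", [(3, "c")], "replace")  # table is recreated
count_after_replace = row_count(conn, "prompt_output")
print(count_after_append, count_after_replace)
```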

Snowflake only: This component uses the CREATE OR REPLACE clause. When using the REPLACE clause, it also applies the COPY GRANTS clause. When you clone or create a new object (such as a table, view, schema, or database) from an existing one, the new object doesn’t automatically inherit the original’s grants (privileges). However, with the COPY GRANTS clause, you can seamlessly transfer object-level privileges from the source object to the new one. This helps maintain consistent access control and simplifies permission management when cloning or recreating objects. For more information, read Snowflake COPY GRANTS.

Column Prefix

string

Optionally set a prefix for newly created columns.

RAG

Enable RAG

boolean

required

Click Yes to enable Retrieval-Augmented Generation (RAG). Using RAG optimizes an LLM output by invoking an authoritative knowledge base outside of the LLM’s initial training data sources. By using RAG, you can extend an LLM’s capabilities to specific domains, such as your organization’s documentation, without needing to retrain the model. Defaults to No.

Pretext

text editor

required

Add text to your LLM prompt before the RAG data is listed, thus instructing the LLM what to do with the RAG data. For example, you might wish to use RAG to search relevant documentation snippets to answer a question. Example: “Use any of the following documentation snippets in your response, if relevant:”

Search Column

drop-down

required

Choose a column in the source table that contains a search term for the vector database. The value is taken from that column and used to perform a vector search. For example, a column value might be a user question such as “How do I log in?”. A search is then performed on the vector database using that string, which returns the N most relevant results. N is defined by the Top K parameter, further down. If your vector database contains vectors created from chunks of text documentation, the RAG data returned in this scenario may include the chunk “to log in, click on the key button in the top right and enter your username and password”. This data is then inserted into the LLM prompt to help provide relevant context.
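
The retrieval flow described above can be sketched with a toy in-memory vector store. Everything here is illustrative: the three-dimensional vectors are made up rather than produced by an embedding model, cosine similarity is just one possible similarity function, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vector, store, k):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return ranked[:k]

# Toy vector store: each entry pairs a text chunk with a made-up vector.
store = [
    {"text": "To log in, click on the key button in the top right.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Invoices are issued on the first day of each month.", "vector": [0.0, 0.2, 0.9]},
    {"text": "Reset your password from the account settings page.", "vector": [0.7, 0.6, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]  # stand-in for the embedding of "How do I log in?"
pretext = "Use any of the following documentation snippets in your response, if relevant:"

snippets = top_k(query_vector, store, k=2)
rag_prompt = pretext + "\n" + "\n".join(f"- {s['text']}" for s in snippets)
print(rag_prompt)
```

The Pretext is placed first and the retrieved chunks follow it, so the LLM reads the instruction before it reads the data.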

Embedding Provider

drop-down

required

The embedding provider is the API service used to convert the search term into a vector. Choose either OpenAI or Amazon Bedrock. The embedding provider receives a search term (e.g. “How do I log in?”) and returns a vector. Choose your provider:

OpenAI
Amazon Bedrock

OpenAI API Key

drop-down

required

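Use the drop-down menu to select the corresponding secret definition that denotes the value of your OpenAI API key. To create a new OpenAI API key:

Log in to OpenAI.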
Click your avatar in the top-right of the UI.
Click View API keys.
Click + Create new secret key.
Give a name for your new secret key and click Create secret key.
Copy your new secret key and save it. Then click Done.

Embedding Model

drop-down

required

Select an embedding model. Currently supports:

text-embedding-ada-002
text-embedding-3-small
text-embedding-3-large

Vector Database

drop-down

required

Select a vector database to use. Currently supports Pinecone and Postgres.

Pinecone
Postgres

Pinecone API Key

drop-down

required

Use the drop-down menu to select the corresponding secret definition that denotes the value of your Pinecone API key. Read Secrets and secret definitions to learn how to create a new secret definition.

Pinecone Index

string

required

The name of the Pinecone vector search index to connect to. To retrieve an index name:

Log in to Pinecone.
Click PROJECTS in the left sidebar.
Click a project tile. This action will open the list of vector search indexes in your project.

Pinecone Namespace

string

required

The name of the Pinecone namespace. Pinecone lets you partition records in an index into namespaces. To retrieve a namespace name:

Log in to Pinecone.
Click PROJECTS in the left sidebar.
Click a project tile. This action will open the list of vector search indexes in your project.
Click on your vector search index tile.
Click the NAMESPACES tab. Your namespaces will be listed.

Top K

integer

required

The number of results to return from the vector database query. Accepts values between 1 and 100. The default is 3.

Data Lookup Strategy

drop-down

required

Select the data lookup strategy. Pinecone only stores the vector associated with text data, and a JSON metadata blob. While the text data can be stored in the metadata blob, size limitations can affect coverage—for example when a user has a larger blob of text to be converted to a vector.

Raw data in metadata: Choosing this option adds an additional property, Data Path, to provide the path to text data within the metadata JSON blob.
Table details in metadata: Database, schema, and table information is used in the metadata to look up the text data in your warehouse table.

Data Path

string

required

Set the path to the data in the metadata JSON blob. Default is data.
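
As an illustration of what a data path refers to, the sketch below pulls the text out of a metadata blob. The metadata contents and the helper are hypothetical, and support for dot-separated nested paths is an assumption of this example, not documented behavior of the component.

```python
def resolve_data_path(metadata, data_path="data"):
    """Follow a dot-separated path through a nested metadata dict."""
    value = metadata
    for part in data_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # path does not exist in this blob
        value = value[part]
    return value

# Hypothetical metadata stored alongside a vector.
metadata = {
    "data": "To log in, click on the key button in the top right.",
    "source": {"document": "login-guide", "chunk": 4},
}
text = resolve_data_path(metadata)  # uses the default path, "data"
document = resolve_data_path(metadata, "source.document")
print(text)
print(document)
```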

Host

string

required

Your Postgres hostname.

Port

string

required

The TCP port number the Postgres server listens on. The default is 5432.

Database

string

required

The name of your Postgres database.

Username

string

required

Your Postgres username.

Password

drop-down

required

Use the drop-down menu to select the corresponding secret definition that denotes the value of your Postgres password. Read Secrets and secret definitions to learn how to create a new secret definition.

Schema

drop-down

required

The Postgres schema. The available schemas are determined by the Postgres database you have provided.

Table

drop-down

required

The table to load data from. The available tables are determined by the Postgres schema you have selected.

Key Column Name

drop-down

required

The column in your table to use as the key column.

Text Column Name

drop-down

required

The column in your table with your original text data.

Embedding Column Name

drop-down

required

The column in your table used to store your embeddings.

Similarity Function

drop-down

required

Select which similarity function (distance metric) to use.
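
The choice of metric can change which rows rank highest. The sketch below compares three metrics that are commonly offered by vector databases; whether each one appears in the drop-down is an assumption, and the vectors are made up for illustration.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
candidates = {
    "short_same_direction": [0.5, 0.0],
    "long_other_direction": [3.0, 3.0],
}

# Smaller is better for distances; larger is better for inner product.
best_cosine = min(candidates, key=lambda k: cosine_distance(query, candidates[k]))
best_euclidean = min(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
best_inner = max(candidates, key=lambda k: inner_product(query, candidates[k]))
print(best_cosine, best_euclidean, best_inner)
```

Cosine distance ignores vector length and looks only at direction, while inner product rewards longer vectors. The metric you select should generally match the one your embeddings were designed for.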

Top K

integer

required

The number of results to return from the vector database query. Accepts values between 1 and 100. The default is 3.

Connection Options

column editor

required

Parameter: A JDBC Postgres parameter supported by the database driver.
Value: A value for the given parameter.
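
Connection options are typically passed to the Postgres JDBC driver as URL query parameters. The sketch below shows what that looks like; the host, database name, and helper function are hypothetical, and whether the component assembles a URL in exactly this way is an assumption. The parameters sslmode and connectTimeout are examples of options that the PostgreSQL JDBC driver accepts.

```python
from urllib.parse import urlencode

def build_jdbc_url(host, port, database, options=None):
    """Build a PostgreSQL JDBC URL with optional connection parameters."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if options:
        url += "?" + urlencode(options)
    return url

jdbc_url = build_jdbc_url(
    "db.example.com",
    5432,
    "vectors",
    {"sslmode": "require", "connectTimeout": "10"},
)
print(jdbc_url)
```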

Explanation of output

Output column	Description
DATA	A JSON set of key:value pairs when using the JSON output format.
PROMPT_TOKENS	Number of tokens in the prompt.
COMPLETION_TOKENS	Number of tokens in the generated completion.
ALL_TOKENS	Total number of tokens used in the request (prompt + completion).
RAW_DATA	All responses received from the LLM. If you have set the parameter N to a number greater than 1, then you will see multiple responses for each record.
ERROR_METADATA	Contains information about errors that occurred when processing that row, such as if no valid JSON was found, or the request to the LLM timed out. Useful for debugging results where no selected result, and/or no raw data is available in the output.
SYSTEM_METADATA	This is a fundamental instruction from the component to the LLM, advising how it should process and respond to the information it receives. For example: “You are a data analysis and exploration tool. You receive a context prompt from the user, a data object, and an output format. Generate the output in a valid JSON format, including only the output variables without any headers or explanations. Escape any values that would not be valid in a JSON string, such as newlines and double quotes.”
USER_PROMPT	Prompt defined by the user in the User Context property of the OpenAI Prompt component.
MODEL_NAME	The specific name of the model. For example, `GPT-Turbo`.
PROVIDER_NAME	Name of the LLM provider.
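
When you inspect the destination table, it can help to parse DATA and sanity-check the token columns for each row. The sketch below works on a single hypothetical row represented as a Python dictionary; the values and the summarize_row helper are made up for illustration.

```python
import json

def summarize_row(row):
    """Return parsed DATA plus a token-count consistency check for one output row."""
    parsed = json.loads(row["DATA"]) if row.get("DATA") else None
    tokens_consistent = (
        row["PROMPT_TOKENS"] + row["COMPLETION_TOKENS"] == row["ALL_TOKENS"]
    )
    return {
        "parsed": parsed,
        "tokens_consistent": tokens_consistent,
        "has_error": bool(row.get("ERROR_METADATA")),
    }

# Hypothetical row from the destination table.
row = {
    "DATA": '{"review_score": 8}',
    "PROMPT_TOKENS": 120,
    "COMPLETION_TOKENS": 15,
    "ALL_TOKENS": 135,
    "ERROR_METADATA": None,
}
summary = summarize_row(row)
print(summary)
```

Rows where has_error is true are the ones to investigate first, using ERROR_METADATA and RAW_DATA.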
