You can use to build, orchestrate, and execute data pipelines in using natural language. are built for a range of tasks, including preparing data via transformation pipelines, coordinating multiple pipelines with orchestration capabilities, and even executing pipelines for you, all from the chat interface. Read below to learn how fit into your data engineering workflow.Documentation Index
Fetch the complete documentation index at: https://docs.maia.ai/llms.txt
Use this file to discover all available pages before exploring further.
Transformation pipelines
Transformation pipelines are used to shape, clean, and prepare your data before it’s loaded into a target destination, such as a cloud data platform (Snowflake, Databricks, Amazon Redshift) or cloud storage location (Amazon S3, Azure Blob Storage, Google Cloud Storage). These pipelines can involve tasks like filtering records, joining datasets, adding calculated fields, and writing results to tables. Instead of manually placing each component onto the canvas and configuring them yourself, you can describe your data transformation objective and will get to work.Transformation example
The below prompt assumes that the objects (the table and various columns) exist in the cloud data platform that your environment is connected to.
regional_revenue.”
In response to a prompt like this, will:
- Create or update a transformation pipeline.
- Add components such as Table Input, Calculator, and Table Output.
- Connect the components together and arrange them on the canvas in the correct order.
- Configure each component based on your instructions.
Common use cases
You can use transformation pipelines to:- Extract data from source tables or views.
- Filter, join, and enrich datasets.
- Add calculated columns, derived metrics, or business logic.
- Perform basic data validation or cleansing.
- Output results to tables, files, or downstream pipelines.
You can iterate on your pipeline by giving follow-up prompts, such as “Add a margin column” or “Filter for US customers only.” As long as you’re in the same session, understand the current state and will update your pipeline.
Example prompts
- “Join orders with customers, and calculate average spend per customer.”
- “Filter out rows where region is null, then sort by revenue.”
- “Take the cleaned_sales table and write it to Snowflake.”
- “Create a margin column using (revenue - cost) / revenue.”
Orchestration pipelines
support the ability to design orchestration workflows using natural language instructions. Instead of configuring each component yourself, you can describe the workflow at a high level, and will build it by linking orchestration components together in the correct sequence on the canvas.Orchestration example
The below prompt assumes that the objects (the table and various columns) exist in the cloud data platform that your environment is connected to.
- Connectors (data sources, destinations)
- Actions (job triggers, API calls)
- DDL operations (table creation or updates)
- Control logic (conditional branches, loops)
- Iterators
- Scalar variables
Example orchestration prompts
- “Extract sales data from Salesforce, transform it using the clean_sales pipeline, then load it into Snowflake.”
- “Run the ingest_customers pipeline. If it succeeds, trigger transform_customers. If it fails, send a Slack alert.”
- “Create an orchestration that first creates the destination table, then runs extract_orders and transform_orders.”
- “Trigger the marketing_reporting pipeline, wait for it to finish, then run update_dashboards.”
What this workflow includes
For a request such as: “Create an orchestration pipeline that first creates the destination table, then runs extract_orders and transform_orders.” The orchestration pipeline might look like this:- Create Table: Creates a table named
SALES_ORDERSin your cloud data platform if it doesn’t already exist, with columns forORDER_ID,CUSTOMER_ID,ORDER_DATE,AMOUNT, andSTATUS. - Run Orchestration: Executes the
extract_orders.orch.yamlpipeline. - Run Transformation: Executes the
transform_orders.tran.yamlpipeline.
Multi-pipeline workflows
Many data projects span multiple pipelines. For example, you might have a pipeline that extracts raw data, transforms that data into meaningful business insights, and then delivers the transformed data to a downstream system such as a dashboard. With , you can plot these steps one by one using natural language instructions and watch as set up your pipelines to run in the correct order. Once your project has a working environment connection to your cloud data platform, can get to work understanding cross-pipeline dependencies and help you update or even expand your pipelines over time. use extensive knowledge of all components to add the correct components to the canvas for each step of the described task. You could use a multi-pipeline workflow to:- Automate daily reporting: Extract customer data, clean it, and then refresh downstream dashboards.
- Break up complex jobs: Run staging and transformation in separate pipelines, then chain them together.
- Standardize workflows: Build modular pipelines (such as for ingestion, transformation, and loading) and combine them into end-to-end flows.
After create a multi-pipeline orchestration, you can continue the conversation to modify or add logic, such as inserting alerts, branching conditions, or post-processing steps.
Commit and push changes
can now help you commit pipeline changes and push them to your branch in a single, streamlined flow. However, don’t yet support publishing versioned artifacts. When you request a commit (e.g., by typing “commit and push the branch”), will:- Detect the changes made to your pipeline.
- Generate a default commit message (which you can edit).
- Offer to commit and push the changes.
- Ask you to confirm the action before proceeding.
-
You’ll see options such as:
- Decline: Cancel the operation.
- Accept for session: Apply the action for all similar future tasks in this session.
- Accept: Confirm this commit and push action.
Artifact versions are automatically generated based on the commit ID.
Sampling data
can sample your data to improve the accuracy and reliability of the transformation pipelines they create. Sampling allows to see the structure and content of a table such as columns, data types, and sample values so they can make better decisions about how to build your pipeline. Unlike a full pipeline run, sampling is lightweight and happens only when needed. Here’s how sampling works:- will only attempt to sample data from tables that already exist in your cloud data warehouse.
-
If the table does not exist yet, because it’s created in an earlier part of your pipeline, will:
- Prompt you for permission to run the pipeline to materialize the table.
- Then, request permission again to sample the newly created data.
- If the table does exist (for example, it was loaded in a prior orchestration step), may automatically attempt to sample it as part of configuring your transformation components.
- When sampling is required, use the data sampling tool, which is visible to you in the transformation pipeline. You’ll see this in the interface when initiate a sample request.
- Detect column names and data types.
- Choose the right components and configure them correctly.
- Apply filters, joins, and transformations more accurately.
Sampling permissions
Sampling is always permission-based. may prompt you to allow sampling before proceeding, especially if they need to access newly created tables. can sample up to 200 rows of data when reviewing your data, identifying patterns, and troubleshooting transformation components. Sampled data is stored for 30 days. If you want to prevent from sampling data across your entire account, in your account details, disable the Enable sampling for Maia toggle.Running pipelines
can run your orchestration or transformation pipelines directly from the chat interface on your instruction, letting you test and validate your workflows without leaving the conversation. For example: “Run the daily_ingest pipeline.” Or as part of a larger task: “Create a new pipeline to clean customer data, then run it.”Permissions and control
To run a pipeline, request explicit permission via the Tool permissions dialog:- Accept once: Run once.
- Accept for Session: Allow runs throughout the session.
- Decline: Cancel the run.
Monitoring and feedback
Once a pipeline is running, monitor its progress and provide live updates in the chat:- Current run status (running, succeeded, failed).
- Output summaries (such as the number of rows processed).
- Highlighting of any component errors.
- “What was the result of the last run?”
- “Show me any failed components.”
- “Run it again with new parameters.”
