> ## Documentation Index
> Fetch the complete documentation index at: https://docs.maia.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Aggregate

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<ComponentMetadata warehouses={["Snowflake", "Databricks", "Amazon Redshift", "Google BigQuery"]} componentType="Transformation" connectionInputs="One" connectionOutputs="Unlimited" />

The **Aggregate** transformation component lets you summarize multiple rows of data in one output row by grouping or aggregating the input data. You can aggregate one or more columns at the same time.

By default, the aggregation is output in a new column with a name that combines the source column name and the aggregation type—for example, `avg_Cost`. You can use a [Rename](/docs/components/rename) component immediately after the **Aggregate** component to change these names if necessary.

### Use case

This component can be used to create useful metrics and KPIs. For example, you can use it to:

* Calculate a total. For example, use the `Sum` aggregation to calculate the total sales per product type from a dataset containing information about all purchases.
* Calculate averages. For example, use the `Average` aggregation to find the average transaction amount per customer account from a dataset containing all customer transactions.
* Determine minimum and maximum values. For example, use the `Max` aggregation type to find users' most recent logins from a dataset containing all logins in the past month.

***

## Properties

<Tabs>
  <Tab title="Snowflake">
    <ResponseField name="Name" type="string" required>
      A human-readable name for the component.
    </ResponseField>

    {/* <!-- param-start:[groupings] | warehouses: [snowflake] --> */}

    <ResponseField name="Groupings" type="dual listbox">
      Select one or more columns from the source table that will form the groupings. The output flow will contain one row for each distinct combination of values within the grouping columns.
    </ResponseField>

    {/* <!-- param-start:[aggregations] | warehouses: [snowflake] --> */}

    <ResponseField name="Aggregations" type="column editor" required>
      Select the input (source) columns and the aggregation types that should be applied to each. Note that certain aggregation types expect certain data types. For example, you cannot average non-numeric data.

      * **Column Name:** Select the input (source) column for the summary function.
      * **Aggregate Type:** The summary function to use for the column. Available functions include:
        * Any Value
        * Approximate Count
        * Approximate Count Distinct (HLL)
        * Array Aggregate
        * Array Aggregate Distinct
        * Average
        * Bit AND Aggregate
        * Bit OR Aggregate
        * Bit XOR Aggregate
        * Bool AND Aggregate
        * Bool OR Aggregate
        * Bool XOR Aggregate
        * Count
        * Count Distinct
        * Hash Aggregate
        * List Aggregate
        * List Aggregate Distinct
        * Max
        * Median
        * Min
        * Mode
        * Skew
        * Standard Deviation - Population
        * Standard Deviation - Sample
        * Sum
        * Variance - Population
        * Variance - Sample

      For more information on the **Aggregation Types**, read the [Snowflake Aggregate Functions documentation](https://docs.snowflake.com/en/sql-reference/functions-aggregation).
    </ResponseField>

    {/* <!-- param-start:[groupingType] | warehouses: [snowflake] --> */}

    <ResponseField name="Grouping type" type="drop-down" required>
      Select one of the following methods for grouping rows:

      * **GROUP\_BY:** Groups rows with the same group-by-item expressions and computes aggregate functions for the resulting group. A [Snowflake GROUP\_BY](https://docs.snowflake.com/en/sql-reference/constructs/group-by.html) expression can be a column name, a number referencing a position in the SELECT list, or a general expression.
      * **GROUP\_BY\_CUBE:** The [Snowflake GROUP\_BY\_CUBE](https://docs.snowflake.com/en/sql-reference/constructs/group-by-cube.html) clause is an extension of the GROUP\_BY clause similar to GROUP\_BY\_ROLLUP. In addition to producing all the rows of a GROUP\_BY\_ROLLUP, GROUP\_BY\_CUBE adds all the "cross-tabulations" rows. Subtotal rows are rows that further aggregate, whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.
      * **GROUP\_BY\_GROUPING\_SETS:** A [Snowflake GROUP\_BY\_GROUPING\_SETS](https://docs.snowflake.com/en/sql-reference/constructs/group-by-grouping-sets.html) clause is a powerful extension of the GROUP\_BY clause that allows computing multiple group-by clauses in a single statement. The group set is a set of dimension columns.
      * **GROUP\_BY\_ROLLUP:** A [Snowflake GROUP\_BY\_ROLLUP](https://docs.snowflake.com/en/sql-reference/constructs/group-by-rollup.html) clause is an extension of the GROUP\_BY clause that produces subtotal rows (in addition to the grouped rows). Subtotal rows are rows that further aggregate, whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.
    </ResponseField>
  </Tab>

  <Tab title="Databricks">
    <ResponseField name="Name" type="string" required>
      A human-readable name for the component.
    </ResponseField>

    {/* <!-- param-start:[groupings] | warehouses: [databricks] --> */}

    <ResponseField name="Groupings" type="dual listbox">
      Select one or more columns from the source table that will form the groupings. The output flow will contain one row for each distinct combination of values within the grouping columns.
    </ResponseField>

    {/* <!-- param-start:[aggregations] | warehouses: [databricks] --> */}

    <ResponseField name="Aggregations" type="column editor" required>
      Select the input (source) columns and the aggregation types that should be applied to each. Note that certain aggregation types expect certain data types. For example, you cannot average non-numeric data.

      * **Column Name:** Select the input (source) column for the summary function.
      * **Aggregate Type:** The summary function to use for the column. Available functions include:
        * Approximate Count
        * Average
        * Count
        * Count Distinct
        * Max
        * Min
        * Standard Deviation - Sample
        * Standard Deviation - Population
        * Sum
        * Variance - Sample
        * Variance - Population
        * Sum Distinct
        * Skewness
        * Kurtosis

      You can add the same source column multiple times, each with a different aggregate type. The summary is calculated for each combination of the source column values.

      For more information on each available function, read [this guide to SQL functions](https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html).
    </ResponseField>
  </Tab>

  <Tab title="Amazon Redshift">
    <ResponseField name="Name" type="string" required>
      A human-readable name for the component.
    </ResponseField>

    {/* <!-- param-start:[groupings] | warehouses: [redshift] --> */}

    <ResponseField name="Groupings" type="dual listbox">
      Select one or more items from the Groupings column to confirm the Selected Groupings.
    </ResponseField>

    {/* <!-- param-start:[aggregations] | warehouses: [redshift] --> */}

    <ResponseField name="Aggregations" type="column editor" required>
      Select the input (source) columns and the aggregation types that should be applied to each. Note that certain aggregation types expect certain data types. For example, you cannot average non-numeric data.

      * **Column Name:** Select the input (source) column for the summary function.
      * **Aggregate Type:** The summary function to use for the column. Available SQL aggregate functions include:
        * Approximate Count: This is a variant of Count Distinct that may have better performance in cases where an exact count is not required. For more information, read the [Redshift documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html).
        * Average: Average the values in the column name. This is only sensible for numeric column names.
        * Count: Count the total number of records whose column name is not null.
        * Count Distinct: Count the total number of distinct (unique) values of the column name.
        * Max: Find the Maximum value of the column name.
        * Median: Calculates the median value for the range of values. For more information, read the [Redshift Median documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_MEDIAN.html).
        * Min: Find the Minimum value of the column name.
        * Standard Deviation - Population: Calculates the population standard deviation of the column name. This is only sensible for numeric columns names.
        * Standard Deviation - Sample: Calculates the sample standard deviation of the column name. This is only sensible for numeric columns names.
        * Sum: Sum the values in the column name. This is only sensible for numeric columns names.
        * Variance - Population: Calculates the population variance of the column name. This is only sensible for numeric columns names.
        * Variance - Sample: Calculates the sample variance of the column name. This is only sensible for numeric columns names.

      For more information on the **Aggregation Types**, read the [Redshift Aggregate Functions documentation](https://docs.aws.amazon.com/redshift/latest/dg/c_Aggregate_Functions.html).
    </ResponseField>
  </Tab>

  <Tab title="Google BigQuery">
    <ResponseField name="Name" type="string" required>
      A human-readable name for the component.
    </ResponseField>

    {/* <!-- param-start:[groupings] | warehouses: [bigquery] --> */}

    <ResponseField name="Groupings" type="dual listbox">
      Select one or more columns from the source table that will form the groupings. The output flow will contain one row for each distinct combination of values within the grouping columns.
    </ResponseField>

    {/* <!-- param-start:[aggregations] | warehouses: [bigquery] --> */}

    <ResponseField name="Aggregations" type="column editor" required>
      Select the input (source) columns and the aggregation types that should be applied to each. Note that certain aggregation types expect certain data types. For example, you cannot average non-numeric data.

      * **Column Name:** Select the input (source) column for the summary function.
      * **Aggregate Type:** The summary function to use for the column. Available functions include:
        * Approximate Count
        * Array Aggregate
        * Average
        * Count
        * Count Distinct
        * Max
        * Min
        * Standard Deviation - Population
        * Standard Deviation - Sample
        * Sum
        * Variance - Population
        * Variance - Sample

      For more information on the **Aggregation Types**, read the [BigQuery Aggregate Functions documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions).

      Click the **Text mode** toggle at the bottom of the dialog to open a multi-line editor that lets you add items in a single block. For more information, read [Text mode](/docs/guides/components-overview#text-mode).

      To use grid variables, toggle **Use Grid Variable** on at the bottom of the dialog. For more information, read [Grid variables](/docs/guides/grid-variables).
    </ResponseField>
  </Tab>
</Tabs>

## Examples

Alice and Zoe are managing a merchandise booth on different days. We have some data for their sales on each day.

```
Input data

+------------+----------+----------+---------+
|    DATE    | PRODUCT  | QUANTITY | MANAGER |
+------------+----------+----------+---------+
| 2023-04-01 | Mugs     |       10 | Alice   |
| 2023-04-01 | Stickers |       20 | Alice   |
| 2023-04-02 | Mugs     |       15 | Alice   |
| 2023-04-02 | Stickers |        5 | Alice   |
| 2023-04-03 | Mugs     |       20 | Zoe     |
| 2023-04-03 | Stickers |       25 | Zoe     |
| 2023-04-04 | Mugs     |       15 | Zoe     |
| 2023-04-04 | Stickers |        5 | Zoe     |
+------------+----------+----------+---------+

```

<Tabs>
  <Tab title="Example 1">
    **Find the total number of each product sold.**

    One of the simplest and most useful things to do would be to aggregate the sales numbers, grouped by product.

    **Aggregate component properties:**

    * **Groupings:** PRODUCT
    * **Aggregations:**
      * Column Name: QUANTITY
      * Aggregation Type: Sum
    * **Grouping Type:** Group By

    ```
    Output data

    +--------------+----------+
    | sum_Quantity | Product  |
    +--------------+----------+
    |           60 | Mugs     |
    |           55 | Stickers |
    +--------------+----------+
    ```
  </Tab>

  <Tab title="Example 2">
    **Find the average sales by each manager, for each product.**

    We can add multiple groupings and each becomes an output column from the Aggregate component. We'll be using an Average aggregation type, this time.

    **Aggregate component properties:**

    * **Groupings:**
      * PRODUCT
      * MANAGER
    * **Aggregations:**
      * Column Name: QUANTITY
      * Aggregation Type: Average
    * **Grouping Type:** Group By

    ```
    Output data

    +--------------+---------+----------+
    | avg_Quantity | Manager | Product  |
    +--------------+---------+----------+
    |    12.500000 | Alice   | Mugs     |
    |    12.500000 | Alice   | Stickers |
    |    17.500000 | Zoe     | Mugs     |
    |    15.000000 | Zoe     | Stickers |
    +--------------+---------+----------+
    ```

    Note that because we've calculated an average, the avg\_Quantity data now shows as a real type rather than an integer.
  </Tab>
</Tabs>
