Window Calculation - Maia Documentation

The Window Calculation transformation component lets you use SQL window functions to analyze a subset of the input datastream. Window functions operate across a “window” of rows relative to the current row, without collapsing those rows into one (unlike aggregation).

A window function operates on a group of related rows known as a partition, which is usually a logical grouping of rows, such as all the orders for one customer. Within each partition, you specify the lower and upper bounds of the window, expressed as offsets from the current row. The function result is calculated for each row within each partition, taking into account all the rows that fall within the window around that row. This is equivalent to an SQL window function using the OVER clause with PARTITION BY (plus ORDER BY and a frame clause when ordering and bounds are set). The available window functions depend on your cloud data warehouse. For more information, read the window function documentation for your cloud data warehouse.

The full list of supported functions for each cloud data warehouse is given under the Functions property, below.

Window functions share some similarities with aggregate functions, and for some use cases you may find that the Aggregate component will serve your needs better. Bear the following in mind when evaluating which component best suits your use case:

For an aggregate function, the input is a group of rows from the dataset, and the output is one row (so the aggregated group is collapsed into a single row).
For a window function, the input is every row within the dataset, and the output is one row per input row. Each row's result is calculated from the rows in that row's window.

For example, the SUM aggregate function returns a single total for each group of input rows, whereas the SUM window function returns one total for each input row, calculated from the rows in that row's window (the whole partition, if no ordering is set).
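The difference between the two kinds of SUM can be sketched in plain Python. This is an illustration of the SQL semantics only, not how the component runs; the column names `customer` and `amount` and the helper functions are hypothetical. No ordering is applied, so the window for each row is its whole partition.

```python
from collections import defaultdict

# Hypothetical input rows.
rows = [
    {"customer": "A", "amount": 10},
    {"customer": "A", "amount": 5},
    {"customer": "B", "amount": 7},
]

def aggregate_sum(rows, key, col):
    """Like SUM(col) ... GROUP BY key: one output row per group."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[col]
    return [{key: k, "total": v} for k, v in totals.items()]

def window_sum(rows, key, col):
    """Like SUM(col) OVER (PARTITION BY key): one output row per input row."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[col]
    # Every input row is kept and gets its partition's total attached.
    return [{**r, "total": totals[r[key]]} for r in rows]

print(aggregate_sum(rows, "customer", "amount"))
# [{'customer': 'A', 'total': 15}, {'customer': 'B', 'total': 7}]
print(window_sum(rows, "customer", "amount"))
```

The aggregate returns two rows for three inputs, while the window version returns three rows, each carrying the total for its customer.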

Use case

Window functions enable a range of advanced analytics such as ranking, time-series processing, and data comparison. Some typical uses for window functions include:

Calculating cumulative totals across a sequence, such as cumulative sales per customer. This uses the Sum window function.
Calculating a moving average across a set of data, for example to smooth out fluctuations in time-series data. This uses the Average window function.
Identifying the first or last value in a set of ordered rows. This uses the First Value or Last Value window function.
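A minimal sketch of these three use cases for a single partition whose rows are already ordered (for example by date). The sample data and the choice of a 3-row moving average (2 preceding rows plus the current row) are assumptions made for illustration.

```python
# Hypothetical sales for one customer, already ordered by date.
sales = [100, 200, 50, 150, 300]

def running_total(values):
    """Sum over a window from unbounded preceding to the current row."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

def moving_average(values, preceding):
    """Average over a window from `preceding` rows before to the current row."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]  # clipped at the partition start
        out.append(sum(frame) / len(frame))
    return out

def first_and_last(values):
    """First Value / Last Value over the whole partition (unbounded in both directions)."""
    return [(values[0], values[-1]) for _ in values]

print(running_total(sales))      # [100, 300, 350, 500, 800]
print(moving_average(sales, 2))
print(first_and_last(sales)[0])  # (100, 300)
```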

Properties

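The available properties depend on your cloud data warehouse. They are listed below for Snowflake, Databricks, Amazon Redshift, and Google BigQuery, in that order.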

Snowflake properties

Name

string

required

A human-readable name for the component.

Include input columns

boolean

required

Defines whether the component passes all input columns into the output. The default is Yes.

Partition data

dual listbox

Select the columns that will define how the input data is partitioned. The window calculation will be performed on each partition.

Ordering within partitions

column editor

Select the columns that will be used to sort the partitioned data. For each column, select the sort order:

Ascending
Descending
Nulls First (sort null values first)
Nulls Last (sort null values last)

You can select multiple columns to create a complex sort. You can drag the selected columns to reorder the sort level if required.
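How a multi-column sort with per-column direction and null placement behaves can be sketched in Python. The function name, the `(column, descending, nulls_first)` spec format, and the sample columns are hypothetical; the sketch sorts a single partition.

```python
def sort_rows(rows, spec):
    """Sort rows by several columns.

    spec is a list of (column, descending, nulls_first) tuples, highest priority first.
    """
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # lowest-priority column first and the highest-priority column last
    # produces the combined multi-column order.
    for column, descending, nulls_first in reversed(spec):
        non_null = [r for r in result if r[column] is not None]
        nulls = [r for r in result if r[column] is None]
        non_null.sort(key=lambda r: r[column], reverse=descending)
        result = nulls + non_null if nulls_first else non_null + nulls
    return result

rows = [
    {"region": "EU", "amount": 5},
    {"region": None, "amount": 1},
    {"region": "EU", "amount": None},
    {"region": "US", "amount": 2},
]
# region Ascending + Nulls Last, then amount Descending + Nulls First.
spec = [("region", False, False), ("amount", True, True)]
for r in sort_rows(rows, spec):
    print(r)
```

The first entry in `spec` is the primary sort level, matching the top row of the column editor.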

Functions

column editor

required

Select a function to be performed on the rows contained in the window. Refer to the Snowflake Window Function documentation for full details. Supported window functions:

Any Value: Returns some value of the expression from the group. For full details, read the Snowflake documentation.
Approximate Count Distinct: Uses HyperLogLog to return an approximation of the distinct cardinality of the input. For full details, read the Snowflake documentation.
Array Aggregate Distinct: Returns the distinct input values, pivoted into an array. For full details, read the Snowflake documentation.
Average: Returns the average (arithmetic mean) of the input column values in the window. For full details, read the Snowflake documentation.
Bit AND Aggregate: Returns the bitwise AND value of all non-Null numeric records in a group. For full details, read the Snowflake documentation.
Bit OR Aggregate: Returns the bitwise OR value of all non-Null numeric records in a group. For full details, read the Snowflake documentation.
Bit XOR Aggregate: Returns the bitwise XOR value of all non-Null numeric records in a group. For full details, read the Snowflake documentation.
Conditional Change Event: Returns a window event number for each row where the value of an argument is different from the value of the argument in the previous row. For full details, read the Snowflake documentation.
Conditional True Event: Returns a window event number for each row within a window partition based on the result of a boolean argument. For full details, read the Snowflake documentation.
Count: Returns a count of the non-Null values for the specified field. For full details, read the Snowflake documentation.
First Value: Given an ordered set of rows, returns the specified column value with respect to the first row in the window frame. For full details, read the Snowflake documentation.
Hash Aggregate: Returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. For full details, read the Snowflake documentation.
Kurtosis: Returns the population excess kurtosis of non-Null records. For full details, read the Snowflake documentation.
Last Value: Given an ordered set of rows, returns the specified column value with respect to the last row in the window frame. For full details, read the Snowflake documentation.
List Aggregate: Returns the concatenated input values, separated by a delimiter string. For full details, read the Snowflake documentation.
List Aggregate Distinct: Returns the concatenated input values, separated by a delimiter string. Duplicate values are eliminated before concatenating. For full details, read the Snowflake documentation.
Maximum: Returns the maximum of the input expression values. The MAX function works with numeric values and ignores Null values. For full details, read the Snowflake documentation.
Median: Calculates the median value for the range of values in a window or partition. Null values in the range are ignored. For full details, read the Snowflake documentation.
Minimum: Returns the minimum of the input expression values. The MIN function works with numeric values and ignores Null values. For full details, read the Snowflake documentation.
Population Variance: Returns the population variance of a set of numeric values. For full details, read the Snowflake documentation.
Sample Variance: Returns the sample variance of a set of numeric values. For full details, read the Snowflake documentation.
Standard Deviation: Returns the sample standard deviation of a set of numeric values. For full details, read the Snowflake documentation.
Standard Deviation Population: Returns the population standard deviation of a set of numeric values. For full details, read the Snowflake documentation.
Sum: Returns the sum of the input column in the window. For full details, read the Snowflake documentation.
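Conditional Change Event and Conditional True Event are the least familiar functions in this list, so here is a minimal Python sketch of how we understand their numbering over one ordered partition. NULL handling is deliberately left out (the Snowflake documentation covers it), and the sample values are hypothetical.

```python
def conditional_change_event(values):
    """Starts at 0 and adds 1 whenever a value differs from the previous row's value."""
    out, event = [], 0
    for i, v in enumerate(values):
        if i > 0 and v != values[i - 1]:
            event += 1
        out.append(event)
    return out

def conditional_true_event(flags):
    """Starts at 0 and adds 1 on every row where the boolean argument is true."""
    out, event = [], 0
    for f in flags:
        if f:
            event += 1
        out.append(event)
    return out

print(conditional_change_event(["on", "on", "off", "off", "on"]))  # [0, 0, 1, 1, 2]
print(conditional_true_event([False, True, False, True, True]))    # [0, 1, 1, 2, 3]
```

Both results depend on row order, so these functions are only meaningful when Ordering within partitions is set.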

Multiple functions can be selected. For each function, select the Input Column that the function will act on, and the Output Column that the result will be written to.
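A sketch of what several Functions entries produce together. Each tuple mirrors one row of the column editor (function, Input Column, Output Column); the mapping to Python built-ins, the column names, and the use of the whole partition as the window are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from component function names to Python equivalents.
FUNCTIONS = {"Sum": sum, "Average": mean, "Maximum": max, "Minimum": min, "Count": len}

def apply_functions(rows, partition_by, functions):
    """Apply each (function, input_column, output_column) entry over the whole partition.

    Assumes every partition has at least one non-NULL value in each input column.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r[partition_by]].append(r)

    out = []
    for r in rows:
        new_row = dict(r)  # keep the input columns, as with Include input columns = Yes
        for func, in_col, out_col in functions:
            # Drop NULLs (None), as the SQL functions described above do.
            values = [g[in_col] for g in groups[r[partition_by]] if g[in_col] is not None]
            new_row[out_col] = FUNCTIONS[func](values)
        out.append(new_row)
    return out

rows = [
    {"store": "X", "qty": 2},
    {"store": "X", "qty": 4},
    {"store": "Y", "qty": 3},
]
config = [("Sum", "qty", "qty_sum"), ("Average", "qty", "qty_avg"), ("Count", "qty", "qty_count")]
for row in apply_functions(rows, "store", config):
    print(row)
```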

Lower bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will start on. Options are:

unbounded preceding: The window starts at the first row of the partition.
current row: The window starts at the current row.
offset preceding: The window starts a number of rows (offset) before the current row. This requires you to set the Lower bound offset property.

Upper bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will end on. Options are:

unbounded following: The window ends at the last row of the partition.
current row: The window ends at the current row.
offset following: The window ends a number of rows (offset) after the current row. This requires you to set the Upper bound offset property.
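The Lower bound and Upper bound options can be read as a rule for picking the row positions in each row's window. A minimal sketch, assuming row-count offsets within one ordered partition; the function name and its parameters are hypothetical.

```python
def frame_indices(n_rows, current, lower, upper, lower_offset=0, upper_offset=0):
    """Return the 0-based positions of the rows in the window for row `current`."""
    if lower == "unbounded preceding":
        start = 0
    elif lower == "current row":
        start = current
    elif lower == "offset preceding":
        start = max(0, current - lower_offset)  # clipped at the first row
    else:
        raise ValueError(f"unknown lower bound: {lower}")

    if upper == "unbounded following":
        end = n_rows - 1
    elif upper == "current row":
        end = current
    elif upper == "offset following":
        end = min(n_rows - 1, current + upper_offset)  # clipped at the last row
    else:
        raise ValueError(f"unknown upper bound: {upper}")

    return list(range(start, end + 1))

# A 5-row partition: window for the third row (position 2), 1 row back to the current row.
print(frame_indices(5, 2, "offset preceding", "current row", lower_offset=1))  # [1, 2]
```

Near the edges of a partition the window simply shrinks, which is why a moving average over the first rows uses fewer values.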

Lower bound offset

integer

required

If the Lower bound property is set to offset preceding, enter the number of rows before the current row that the window will start on.

Upper bound offset

integer

required

If the Upper bound property is set to offset following, enter the number of rows after the current row that the window will end on.
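To relate these settings to SQL, the following hypothetical helper renders them as a generic OVER clause with a ROWS frame. It illustrates the mapping only; it is not the SQL the component actually generates, and it ignores dialect differences such as NULLS FIRST/LAST support.

```python
def bound_sql(setting, offset, direction):
    """Render one bound; direction is "preceding" or "following"."""
    if setting == f"unbounded {direction}":
        return f"UNBOUNDED {direction.upper()}"
    if setting == "current row":
        return "CURRENT ROW"
    if setting == f"offset {direction}":
        if offset is None or offset < 0:
            raise ValueError("a non-negative offset is required")
        return f"{offset} {direction.upper()}"
    raise ValueError(f"unsupported bound: {setting}")

def over_clause(partition_by, order_by, lower, upper, lower_offset=None, upper_offset=None):
    """Build an OVER (...) clause from the property settings."""
    parts = []
    if partition_by:
        parts.append("PARTITION BY " + ", ".join(partition_by))
    if order_by:
        # Bounds are only offered once ordering is set, so the frame follows ORDER BY.
        parts.append("ORDER BY " + ", ".join(f"{col} {direction}" for col, direction in order_by))
        parts.append("ROWS BETWEEN "
                     + bound_sql(lower, lower_offset, "preceding")
                     + " AND "
                     + bound_sql(upper, upper_offset, "following"))
    return "OVER (" + " ".join(parts) + ")"

print(over_clause(["customer_id"], [("order_date", "ASC")],
                  "offset preceding", "current row", lower_offset=2))
# OVER (PARTITION BY customer_id ORDER BY order_date ASC ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
```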

Databricks properties

Name

string

required

A human-readable name for the component.

Include input columns

boolean

required

Defines whether the component passes all input columns into the output. The default is Yes.

Partition data

dual listbox

Select the columns that will define how the input data is partitioned. The window calculation will be performed on each partition.

Ordering within partitions

column editor

Select the columns that will be used to sort the partitioned data. For each column, select the sort order:

Ascending
Descending
Nulls First (sort null values first)
Nulls Last (sort null values last)

Functions

column editor

required

Select a function to be performed on the rows contained in the window. Refer to the Databricks window functions documentation for full details. Supported window functions:

Average: Returns the average (arithmetic mean) of the input column values in the window. For full details, read the Databricks documentation.
Count: Returns a count of the non-Null values for the specified field. For full details, read the Databricks documentation.
First Value: Given an ordered set of rows, returns the specified column value with respect to the first row in the window frame. For full details, read the Databricks documentation.
Last Value: Given an ordered set of rows, returns the specified column value with respect to the last row in the window frame. For full details, read the Databricks documentation.
Maximum: Returns the maximum of the input expression values. The MAX function works with numeric values and ignores Null values. For full details, read the Databricks documentation.
Minimum: Returns the minimum of the input expression values. The MIN function works with numeric values and ignores Null values. For full details, read the Databricks documentation.
Population Variance: Returns the population variance of a set of numeric values. For full details, read the Databricks documentation.
Sample Variance: Returns the sample variance of a set of numeric values. For full details, read the Databricks documentation.
Standard Deviation: Returns the sample standard deviation of a set of numeric values. For full details, read the Databricks documentation.
Standard Deviation Population: Returns the population standard deviation of a set of numeric values. For full details, read the Databricks documentation.
Sum: Returns the sum of the input column in the window. For full details, read the Databricks documentation.
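The population and sample variants of variance and standard deviation differ only in the divisor: n for population, n - 1 for sample. A worked sketch on a hypothetical set of values, assuming Standard Deviation corresponds to the sample standard deviation; the standard library's `statistics` functions are printed alongside as a cross-check.

```python
from math import sqrt
from statistics import pvariance, variance, pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical window contents

def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

pop_var = sum_sq_dev(values) / len(values)         # Population Variance: divide by n
samp_var = sum_sq_dev(values) / (len(values) - 1)  # Sample Variance: divide by n - 1

print(pop_var, sqrt(pop_var))    # 4.0 2.0
print(samp_var, sqrt(samp_var))  # sample variance and sample standard deviation
print(pvariance(values), variance(values), pstdev(values), stdev(values))
```

With few rows in a window the two variants can differ noticeably, so pick the one that matches how your data was collected.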

Multiple functions can be selected. For each function, select the Input Column that the function will act on, and the Output Column that the result will be written to.

Lower bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will start on. Options are:

unbounded preceding: The window starts at the first row of the partition.
current row: The window starts at the current row.
offset preceding: The window starts a number of rows (offset) before the current row. This requires you to set the Lower bound offset property.

Upper bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will end on. Options are:

unbounded following: The window ends at the last row of the partition.
current row: The window ends at the current row.
offset following: The window ends a number of rows (offset) after the current row. This requires you to set the Upper bound offset property.

Lower bound offset

integer

required

If the Lower bound property is set to offset preceding, enter the number of rows before the current row that the window will start on.

Upper bound offset

integer

required

If the Upper bound property is set to offset following, enter the number of rows after the current row that the window will end on.

Amazon Redshift properties

Name

string

required

A human-readable name for the component.

Include input columns

boolean

required

Defines whether the component passes all input columns into the output. The default is Yes.

Partition data

dual listbox

Select the columns that will define how the input data is partitioned. The window calculation will be performed on each partition.

Ordering within partitions

column editor

Select the columns that will be used to sort the partitioned data. For each column, select the sort order:

Ascending
Descending
Nulls First (sort null values first)
Nulls Last (sort null values last)

Functions

column editor

required

Select a function to be performed on the rows contained in the window. Refer to the Amazon Redshift Window Function documentation for full details. Supported window functions:

Average: Returns the average (arithmetic mean) of the input column values in the window. For full details, read the Amazon Redshift documentation.
Count: Returns a count of the non-Null values for the specified field. For full details, read the Amazon Redshift documentation.
First Value: Given an ordered set of rows, returns the specified column value with respect to the first row in the window frame. For full details, read the Amazon Redshift documentation.
Last Value: Given an ordered set of rows, returns the specified column value with respect to the last row in the window frame. For full details, read the Amazon Redshift documentation.
List Aggregate: Returns the concatenated input values, separated by a delimiter string. Redshift doesn’t support ordering within partitions for this function, so that option won’t be applied to the results if selected. For full details, read the Amazon Redshift documentation.
Maximum: Returns the maximum of the input expression values. The MAX function works with numeric values and ignores Null values. For full details, read the Amazon Redshift documentation.
Median: Calculates the median value for the range of values in a window or partition. Null values in the range are ignored. Redshift doesn’t support ordering within partitions for this function, so that option won’t be applied to the results if selected. For full details, read the Amazon Redshift documentation.
Minimum: Returns the minimum of the input expression values. The MIN function works with numeric values and ignores Null values. For full details, read the Amazon Redshift documentation.
Population Variance: Returns the population variance of a set of numeric values. For full details, read the Amazon Redshift documentation.
Sample Variance: Returns the sample variance of a set of numeric values. For full details, read the Amazon Redshift documentation.
Standard Deviation: Returns the sample standard deviation of a set of numeric values. For full details, read the Amazon Redshift documentation.
Standard Deviation Population: Returns the population standard deviation of a set of numeric values. For full details, read the Amazon Redshift documentation.
Sum: Returns the sum of the input column in the window. For full details, read the Amazon Redshift documentation.
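A median does not depend on the order of the rows it is computed from, which helps explain why ignoring Ordering within partitions for Median leaves its results unchanged. A minimal sketch, assuming the median is taken over each row's whole partition; the function and column names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def window_median(rows, partition_by, column):
    """One median per input row, computed over that row's whole partition."""
    groups = defaultdict(list)
    for r in rows:
        if r[column] is not None:  # NULL values are ignored
            groups[r[partition_by]].append(r[column])
    # A partition with no non-NULL values yields None (NULL).
    return [median(groups[r[partition_by]]) if groups[r[partition_by]] else None for r in rows]

rows = [
    {"region": "A", "sales": 3},
    {"region": "A", "sales": 1},
    {"region": "A", "sales": None},
    {"region": "B", "sales": 10},
    {"region": "A", "sales": 2},
    {"region": "B", "sales": 20},
]
print(window_median(rows, "region", "sales"))  # [2, 2, 2, 15.0, 2, 15.0]
```

Reversing or shuffling `rows` changes only which output position each median appears in, not the median values themselves.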

Multiple functions can be selected. For each function, select the Input Column that the function will act on, and the Output Column that the result will be written to.

Lower bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will start on. Options are:

unbounded preceding: The window starts at the first row of the partition.
current row: The window starts at the current row.
offset preceding: The window starts a number of rows (offset) before the current row. This requires you to set the Lower bound offset property.

Upper bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will end on. Options are:

unbounded following: The window ends at the last row of the partition.
current row: The window ends at the current row.
offset following: The window ends a number of rows (offset) after the current row. This requires you to set the Upper bound offset property.

Lower bound offset

integer

required

If the Lower bound property is set to offset preceding, enter the number of rows before the current row that the window will start on.

Upper bound offset

integer

required

If the Upper bound property is set to offset following, enter the number of rows after the current row that the window will end on.

Google BigQuery properties

Name

string

required

A human-readable name for the component.

Include input columns

boolean

required

Defines whether the component passes all input columns into the output. The default is Yes.

Partition data

dual listbox

Select the columns that will define how the input data is partitioned. The window calculation will be performed on each partition.

Ordering within partitions

column editor

Select the columns that will be used to sort the partitioned data. For each column, select the sort order:

Ascending
Descending

Functions

column editor

required

Select a function to be performed on the rows contained in the window. Refer to the Google BigQuery analytic functions documentation for full details. Supported window functions:

Average: Returns the average (arithmetic mean) of the input column values in the window. For full details, read the Google BigQuery documentation.
Count: Returns a count of the non-Null values for the specified field. For full details, read the Google BigQuery documentation.
First Value: Given an ordered set of rows, returns the specified column value with respect to the first row in the window frame. For full details, read the Google BigQuery documentation.
Last Value: Given an ordered set of rows, returns the specified column value with respect to the last row in the window frame. For full details, read the Google BigQuery documentation.
Maximum: Returns the maximum of the input expression values. The MAX function works with numeric values and ignores Null values. For full details, read the Google BigQuery documentation.
Minimum: Returns the minimum of the input expression values. The MIN function works with numeric values and ignores Null values. For full details, read the Google BigQuery documentation.
Population Variance: Returns the population variance of a set of numeric values. For full details, read the Google BigQuery documentation.
Sample Variance: Returns the sample variance of a set of numeric values. For full details, read the Google BigQuery documentation.
Standard Deviation: Returns the sample standard deviation of a set of numeric values. For full details, read the Google BigQuery documentation.
Standard Deviation Population: Returns the population standard deviation of a set of numeric values. For full details, read the Google BigQuery documentation.
Sum: Returns the sum of the input column in the window. For full details, read the Google BigQuery documentation.

Multiple functions can be selected. For each function, select the Input Column that the function will act on, and the Output Column that the result will be written to.

Lower bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will start on. Options are:

unbounded preceding: The window starts at the first row of the partition.
current row: The window starts at the current row.
offset preceding: The window starts a number of rows (offset) before the current row. This requires you to set the Lower bound offset property.

Upper bound

drop-down

required

This property is visible after Ordering within partitions is set. Select which row of the partition the window calculation will end on. Options are:

unbounded following: The window ends at the last row of the partition.
current row: The window ends at the current row.
offset following: The window ends a number of rows (offset) after the current row. This requires you to set the Upper bound offset property.

Lower bound offset

integer

required

If the Lower bound property is set to offset preceding, enter the number of rows before the current row that the window will start on.

Upper bound offset

integer

required

If the Upper bound property is set to offset following, enter the number of rows after the current row that the window will end on.
