> ## Documentation Index
> Fetch the complete documentation index at: https://docs.maia.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Pipelines tutorial

export const designer = "Designer";

In this tutorial, we'll look at a simple use case where we have some data sets from our social media platform, which we want to load into a data warehouse and transform. The first data set is data about social media posts, stored in an Excel sheet. The second is data about social media users, stored in a Google Sheet.

First, we'll load our data into the data warehouse. Then, we'll transform the data by converting the data types of some columns in our data. Next, we'll join the two datasets so we can see all our data together: the post title, who wrote it, the country they are from, and how many likes that post received.

We'll then perform some further transformations—first, an aggregation to calculate the sum total of likes for each unique country. Then, we'll rank these countries by their number of likes from most to fewest.

***

## Get started

There are two pipeline types:

* **Orchestration** pipelines are used to extract and load data.
* **Transformation** pipelines are used to transform data within the warehouse.

***

## Create an orchestration pipeline

We will first use an orchestration pipeline to load our data, starting on the {designer} canvas. For more information about the {designer} canvas, read [{designer} UI basics](/docs/guides/designer-ui-basics).

We are going to create a folder called `SocialMediaData`, containing an orchestration pipeline called `LoadSocialMediaData`.

1. Click **Files** to open the files panel.
2. Click the **+** icon, then click **Folder** to add a new folder. Name your new folder `SocialMediaData`.
3. Click the three dots **...** next to this folder, then **Add**, then select **Orchestration pipeline**.
4. In the **Pipeline name** field, name this pipeline `LoadSocialMediaData`.
5. In the **Description** field, you can enter a description for the pipeline, or leave this field empty.

The pipeline will open on the canvas.

***

## Load posts data using Excel Query

In this step, we will use the [Excel Query](/docs/components/excel-query) component to load data from an Excel sheet that is stored in an S3 bucket.

### Configure the component

1. The [Start](/docs/components/start) component will appear on the canvas. This is the starting point for your pipeline flow.
2. Click the **+** icon to the right of the **Start** component to open the **Add component** dialog.
3. Search for and click the **Excel Query** component, which we will use to load social media data from an Excel sheet in an S3 bucket. This component is automatically connected to the **Start** component on the canvas, and the **Configuration** tab of the component properties panel opens on the right of the canvas.
4. In the **Connect** section of the component properties panel, adjust the following settings:
   * In **Storage Type**, keep the default selection of **Amazon S3 Storage**.
   * Click the **Storage URL** property, then in **S3 Buckets path** enter `matillion-docs/Attachments/academy/socialmediadata.xlsx` and click **Save**.
5. In the **Configure** section of the component properties panel, adjust the following settings:
   * In **Data Source**, select the **posts** worksheet.
   * In **Data Selection**, click the **>>** icon to load all columns from this worksheet, then click **Save**.
6. In the **Destination** section of the component properties panel, adjust the following settings:
   * In **Target table**, name the table `posts`.
   * In **Stage Platform**, keep the default selection of **Snowflake Managed**.

### Validate the pipeline and sample your data

1. Make sure the pipeline configuration is valid by clicking **Validate** on the canvas.
2. Sample your data by clicking the **Sample data** tab at the bottom of the canvas, then click **Sample data**. You will see the first 25 rows of data loaded using this component.

***

## Load user data using Google Sheets Query

In this step, we will use the [Google Sheets Query](/docs/components/google-sheets-query) component to load some test social media data that is stored in a Google Sheet. To do this, copy the Google Sheet below to your own Google account. You may want to bookmark or save this Google Sheet as a favorite, so you can access it easily.

[https://docs.google.com/spreadsheets/d/1VYV8-MbMko12pWKeGJGhSu5zpkDDb51Hx3CIrqirTzk/edit?usp=sharing](https://docs.google.com/spreadsheets/d/1VYV8-MbMko12pWKeGJGhSu5zpkDDb51Hx3CIrqirTzk/edit?usp=sharing)

1. In your orchestration pipeline, click the **+** icon to the right of the **Start** component to add another component.
2. In the **Add component** dialog, search for and click the **Google Sheets Query** component.

### Authenticate the component

1. Click the **Authentication** property, then click **Manage**. This will take you to a new tab containing a list of all the OAuth connections in your project.
2. Click **Add OAuth connection**.
3. In **OAuth name**, enter `google-sheets`.
4. In **Provider**, select **Google**.
5. In **Authentication Type**, select **OAuth 2.0 Authorization Code Grant**.
6. Click **Sign in with Google**.
7. Select the Google account associated with the sheet you need to access, then click **Allow**.

You will be taken back to the OAuth connections list, where your new `google-sheets` OAuth connection will appear.

### Configure the component

1. Return to {designer} and reopen your `LoadSocialMediaData` orchestration pipeline.
2. Click the **Google Sheets** component.
3. Click the **Authentication** property, then in **Choose your profile**, select your new **google-sheets** OAuth.
4. Click the **\<** icon to return to the **Configuration** tab of the component properties panel.
5. In the **Configure** section of the component properties panel, adjust the following settings:
   * In **Spreadsheet Name**, enter `SocialMediaData`.
   * In **Data Source**, select **SocialMediaData\_users**.
   * In **Data Selection**, click the **>>** icon to load all columns from this sheet, then click **Save**.
   * In **Row limit**, increase the limit to 1000 rows of data.
6. In the **Destination** section of the component properties panel, adjust the following settings:
   * In **Warehouse**, **Database** and **Schema**, select **\[Environment default]**.
   * In **Target Table**, name the table `users`.
   * In **Stage Platform**, select **Snowflake Managed**.
7. Validate the pipeline and sample your data as described for the **Excel Query** component.

***

## Run the orchestration pipeline

Click **Run** on the canvas.

This orchestration pipeline will now run. It will load the data from the Excel file and Google Sheet into the target tables in your cloud data warehouse.

At the bottom of the screen, in the **Task history** tab, you can see the progress of this pipeline and observe whether it runs successfully. You can click a task to see more details.

***

## Create a transformation pipeline

Now that we have our data, it's time to transform it. To do this, we need to create a transformation pipeline in our `SocialMediaData` folder, which we're going to call `JoinDataSets`.

1. Click **Files** to open the files panel.
2. Click the three dots **...** next to the `SocialMediaData` folder, then **Add**, then select **Transformation pipeline**.
3. In the **Pipeline name** field, name this pipeline `JoinDataSets`.
4. In the **Description** field, you can enter a description for the pipeline, or leave this field empty.

The new transformation pipeline will open on the canvas.

### Bring your data into the transformation pipeline using Table Input

Transformation pipelines don't begin with a **Start** component, like orchestration pipelines do. Instead, transformation pipelines need to start with a component that provides data for transformation.

We need to connect the data that we loaded in our `LoadSocialMediaData` orchestration pipeline to this new transformation pipeline. To do this, we'll use two [Table Input](/docs/components/table-input) components, one for each table containing our social media data.

1. In your transformation pipeline, click the **+** icon on the canvas to add the first component to this pipeline.
2. In the **Add component** dialog, search for and click the **Table Input** component.
3. In the component properties panel, adjust the following settings:
   * In **Database** and **Schema**, select **\[Environment default]**.
   * In **Target Table**, select the **posts** table.
   * In **Column Names**, click the **>>** icon to load all columns from this table, then click **Save**.
4. Click the **+** icon on the left of the canvas to add another component.
5. In the **Add component** dialog, search for the **Table Input** component, and drag a new **Table Input** component onto the canvas.
6. In the component properties panel, adjust the following settings:
   * In **Database** and **Schema**, select **\[Environment default]**.
   * In **Target Table**, select the **users** table.
   * In **Column Names**, click the **>>** icon to load all columns from this table, then click **Save**.
7. Validate the pipeline and sample your data. Make sure to sample the data for both **Table Input** components, to make sure both data sets have been brought into the transformation pipeline correctly.

***

## Transform your data

We are now going to use our `JoinDataSets` transformation pipeline to convert some data into numbers, join the two data sets, and write our transformed data to a new table in our data warehouse.

### Convert the 'Likes' and 'UserID' data to numbers using Convert Type

After sampling the data in our `posts` **Table Input** component, we've noticed that the data in our `Likes` and `User ID` columns appears as numbers with a decimal point. Let's convert the data in these columns to whole numbers. This is a more accurate way to represent this data—we can't have half a "like". To do this, we will use the [Convert Type](/docs/components/convert-type) component.

1. Click the **+** icon to the right of the `posts` **Table Input** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Convert Type** component.
3. In the component properties panel, click the **Conversions** property, and configure two conversions:
   1. In the first row, in **Column** select **UserID**, and in **Type** select **NUMBER**. Leave the **Size** and **Precision** fields empty.
   2. Click the **+** icon in the bottom left to add another row.
   3. In the second row, in **Column** select **Likes**, and in **Type** select **NUMBER**. Leave the **Size** and **Precision** fields empty.
4. Click **Save**.
5. Validate the pipeline and sample your data.

### Join the data from Excel and Google Sheets using Join

We're now going to join our `posts` and `users` tables together using the [Join](/docs/components/join) component.

1. Click the **+** icon to the right of the **Convert Type** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Join** component.
3. Click and drag the connection ring to the right of the `users` **Table Input** component to the **Join** component.
4. Click the **Join** component to open its component properties panel.
5. Click the **Main Table** property, and select **Convert Type**.
6. In the **Main Table Alias** property, enter `posts`.
7. Click the **Joins** property and configure how to join the tables:
   * In **Join Table**, select **users**.
   * In **Join Alias**, enter `users`.
   * In **Join Type**, select **Inner**.
   * Click **Save**.
8. Click the **Join Expressions** property, enter the following expression: `"posts"."UserID" = "users"."UserID"`, then click Save.
9. Click the **Column Mappings** property and configure the mappings:
   1. In the bottom left, click **Add all**.
   2. Select the checkboxes for the following columns:
      * **posts.PostID**
      * **posts.RowId**
      * **posts.UserID**
      * **users.id**
      * **users.UserID**
   3. Click the **-** icon to remove these columns from the selection, because we don't want them included in our data.
   4. Click **Save**.
10. Validate the pipeline and sample your data.

### Write the combined data to a new table using Rewrite Table

We're now going to create a new table containing our combined data. We'll use the [Rewrite Table](/docs/components/rewrite-table) component to create a new table, and name it `SocialMediaOutput`.

1. Click the **+** icon to the right of the **Join** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Rewrite Table** component.
3. In the component properties panel, in **Target Table**, enter `SocialMediaOutput`.
4. Validate the pipeline.

<Note>
  Sampling is not available for the **Rewrite Table** component, because it uses the data in the component immediately before it.
</Note>

***

## Run the transformation pipeline

Click **Run** on the canvas.

This transformation pipeline will now run. It will convert our likes and user ID data to number format, and join our two data sets together. It will then create a table called `SocialMediaOutput` and write our data into this table in your data warehouse. You can see the progress of this task in the **Task history** tab at the bottom of the canvas.

***

## Perform more transformations

Now that we have written our joined data into our new `SocialMediaOutput` table, let's try aggregating and ranking our joined data using some more transformation pipeline components. We'll then create another new table containing this aggregated and ranked data.

### Find the total likes per country using Aggregate

Let's use the [Aggregate](/docs/components/aggregate) component to find the total number of likes for each country in our data set.

1. Click the **+** icon to the right of the **Join** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Aggregate** component.
3. In the properties panel, click the **Groupings** property. In the dialog, select **Country** from the list on the left, and click the **→** icon to move it into the **Selected Groupings** list. Click **Save**.
4. Click the **Aggregations** property. In **Column Name**, select **Likes**. In **Aggregate Type**, select **Sum**. Click **Save**.
5. Validate the pipeline and sample your data.

You'll see that almost all our data has disappeared, leaving a data set containing the total number of likes for each country.

### Rank countries by total number of likes using Rank

Now let's sort our data. We'll use the [Rank](/docs/components/rank) component to sort our data by country, in descending order according to the total number of likes.

1. Click the **+** icon to the right of the **Aggregate** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Rank** component.
3. In the component properties panel, click the **Ordering Within Partitions** property and configure the ordering settings:
   * In **Input Column**, select **sum\_Likes**.
   * In **Ordering**, select **Descending**.
   * Click **Save**.
4. Click the **Functions** property and configure a function:
   * In **Window Function**, select **Rank**.
   * In **Output Column**, enter `CountryRank`.
   * Click **Save**.
5. Validate the pipeline and sample your data.

You'll see that our data is now ranked in descending order by the number of total likes per country. Additionally, we have added a new column containing each country's rank.

### Write the transformed data to a new table using Rewrite Table

We're now going to create a new table called `RankedSocialMediaLikes` containing this aggregated, ranked data. This table will be separate from our `SocialMediaOutput` table, which contains all our `posts` and `users` data.

1. Click the **+** icon to the right of the **Rank** component to add a connected component.
2. In the **Add component** dialog, search for and click the **Rewrite Table** component.
3. In the component properties panel, in **Target Table**, enter `RankedSocialMediaLikes`.
4. Validate the pipeline.

Our transformation pipeline is now finished. It's a good idea to edit the names of your **Rewrite Table** components so that you know which tables they each write to. To rename a component, click the component, then click the edit icon next to the component name.

***

## Automate these pipelines using And and Run Transformation

We can now link our `LoadSocialMediaData` orchestration pipeline to our `JoinDataSets` transformation pipeline, so that when we run the orchestration pipeline, it automatically runs our transformation pipeline. We'll use the [And](/docs/components/and) and [Run Transformation](/docs/components/run-transformation) components to do this.

1. Open your `LoadSocialMediaData` orchestration pipeline.
2. Click the **+** icon to the right of the **Google Sheets** component to add a connected component.
3. In the **Add component** dialog, search for and click the **And** component.
4. Click and drag the connection ring to the right of the **Excel Query** component to the **And** component.
5. Click the **+** icon to the right of the **And** component to add a connected component.
6. In the **Add component** dialog, search for and click the **Run Transformation** component.
7. In the component properties panel, in **Transformation Pipeline**, select **SocialMediaData/JoinDataSets**.
8. Validate the pipeline.
9. Run the pipeline.

When you run this orchestration pipeline, it will now run both the `LoadSocialMediaData` and `JoinDataSets` pipelines. You will only see the orchestration pipeline in the **Task history** tab at the bottom of the canvas, but if you click this task, you will see that the transformation pipeline is run as the final step of this pipeline.
