> ## Documentation Index
> Fetch the complete documentation index at: https://docs.maia.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Python Pushdown

export const designer = "Designer";

export const maia = "Maia";

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<ComponentMetadata warehouses={["Snowflake"]} unsupportedWarehouses={["Databricks", "Amazon Redshift", "BigQuery"]} componentType="Orchestration, Test" connectionInputs="One" connectionOutputs="Unlimited" />

Python Pushdown is an orchestration component that lets you execute a Python script using the [Snowpark](https://www.snowflake.com/en/data-cloud/snowpark/) service in your Snowflake account.

A Python Pushdown script is executed using the Snowpark service in your Snowflake account, using the Python version selected in the **Python Version** parameter. All standard libraries associated with that Python version will be available to use, as well as libraries provided by [Snowpark Anaconda](https://anaconda.org/anaconda/snowflake-snowpark-python), with the exception of those that require AWS or other credentials.

Validation of the Python syntax is handled by Snowpark at runtime. If validation fails, the script won't execute and the component will terminate with a failure condition.

Any output written via `print` statements will appear as the task completion message, and so output should be brief.

[Pipeline and project variables](/docs/guides/variables) can be used in the script. Note the following:

* You can set variables within the script using regular expressions, date and numeric calculations, and interpolated values.
* Date and time manipulation in variables (for example, `datetime`, `timedelta`, or similar) is supported, but note that `datetime` isn't supported as a data type in {maia} variables.
* To learn how to manipulate variables through a Python script, read [Using variables in scripts](/docs/guides/variables#using-variables-in-scripts).

Your Snowflake account will need to be configured to allow it to execute Python Pushdown scripts. Configuration requirements are described in [Snowflake configuration](#snowflake-configuration), below.

<Note>
  The component can't currently be used to access external databases, but can execute SQL on the Snowflake database and access Snowflake DataFrames. This restriction is expected to be removed in a future version of the component.
</Note>

***

## Properties

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

{/* <!-- param-start:[warehouse] | warehouses: [snowflake] --> */}

<ResponseField name="Warehouse" type="drop-down">
  The Snowflake warehouse used to execute your chosen Python Script. The special value `[Environment Default]` uses the warehouse defined in the environment. Read [Overview of Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-overview) to learn more.
</ResponseField>

{/* <!-- param-start:[externalAccessIntegrations] | warehouses: [snowflake] --> */}

<ResponseField name="External Access Integrations" type="dual listbox">
  To access external websites, the script will use your Snowflake-configured External Access Integrations. Use the dialog to select one or more valid integrations from your Snowflake account if you require this functionality.
</ResponseField>

{/* <!-- param-start:[snowflakeSecrets] | warehouses: [snowflake] --> */}

<ResponseField name="Snowflake Secret" type="dual listbox">
  Select any secrets associated with your selected **External Access Integrations** that you want to make available for use in your script. This allows your script to access external data sources that require a secret. Read [Create and retrieve Snowflake secrets using Python Pushdown](#create-and-retrieve-snowflake-secrets-using-python-pushdown), below.

  In the **Snowflake secret** dialog, select a secret from the drop-down in the **Snowflake secret** column, and in the **Snowflake variable** column, enter the name of the variable that you want to assign the secret to. Repeat for as many secrets as you want to select, assigning each to a different variable.

  These variables exist in the Python script only, and only for the duration of the script execution. They aren't associated with any project or pipeline variables.

  If the component is configured to use a Snowflake secret and the secret is subsequently deleted from Snowflake, the component will fail validation until this property is corrected.

  This property is only available if you have selected one or more **External Access Integrations**. If there are no secrets associated with the selected integrations, this property will be displayed, but the drop-down in the **Snowflake secret** dialog will be empty.
</ResponseField>

{/* <!-- param-start:[pythonVersion] | warehouses: [snowflake] --> */}

<ResponseField name="Python Version" type="drop-down">
  Select the Python version you want to use for your script. The default setting is currently `3.10`.

  The Python version you select will determine which packages and their supported versions are available in the **Packages** parameter, and the options will automatically update when you switch the Python version.

  <Note>
    Available Python versions may change as Snowflake Snowpark adds or removes support for specific versions.
  </Note>
</ResponseField>

{/* <!-- param-start:[packagesGrid] | warehouses: [snowflake] --> */}

<ResponseField name="Packages" type="grid editor">
  Use this property to specify which libraries will be imported into the Snowpark Anaconda environment when the script executes.

  * **Package Name:** Select a package. The packages available depend on the selected Python version in the **Python Version** parameter.
  * **Package Version:** Choose the version of the package to import.

  The dialog lists every package that Anaconda supports, and selecting one is the equivalent of running `pip install <package>` in the Anaconda environment.

  You must select every library you have referenced with an `import` statement in your script. For example, if you include `import snowflake.connector` in the script, then select the `snowflake-connector-python` package.

  Selected packages are installed into the environment at script runtime. We strongly recommend that you only select packages that your script needs, as each additional package will increase the script execution time.
</ResponseField>

{/* <!-- param-start:[scriptLocation] | warehouses: [snowflake] --> */}

<ResponseField name="Script Location" type="drop-down" required>
  * **Component:** Use the **Python Script** property to add your Python script to this component. This is the default setting.
  * **File:** Use the **Select File** property to choose a `.py` file in your project. The drop-down will list all `.py` files from the repository that is connected to your project.
</ResponseField>

{/* <!-- param-start:[fileName] | warehouses: [snowflake] --> */}

<ResponseField name="Select File" type="drop-down" required>
  Choose a `.py` file in your project to run via this component. The drop-down will list all `.py` files from the repository that is connected to your project.

  Double-click a `.py` file in the **Files** panel to open that `.py` file in the code editor.
</ResponseField>

{/* <!-- param-start:[enableVariableResolution] | warehouses: [snowflake] --> */}

<ResponseField name="Allow Inline Variable Replacement" type="boolean">
  Available when **Script Location** is set to **Component**.

  Choose whether to enable variable resolution in the script. When set to **Yes**, {designer} variables can be referenced in the script. Default is **No**.
</ResponseField>

{/* <!-- param-start:[pythonScript] | warehouses: [snowflake] --> */}

<ResponseField name="Python Script" type="code editor" required>
  Add your Python script into the code editor. This script is stored in the component.
</ResponseField>

{/* <!-- param-start:[scriptTimeout] | warehouses: [snowflake] --> */}

<ResponseField name="Script Timeout" type="integer" required>
  The number of seconds to wait for script termination. After the set number of seconds has elapsed, the script is forcibly terminated. The default is 360 seconds (6 minutes).

  <Note>
    `Script Timeout` can't exceed the limit defined in the Snowflake internal query timeouts. For more information, read [Query Timeouts in Snowflake](https://select.dev/posts/snowflake-query-timeouts).
  </Note>
</ResponseField>

## Snowflake configuration

To execute Python Pushdown scripts, certain security privileges will be required within your Snowflake account. A Snowflake administrator will need to configure the following.

### Grant stored procedure privileges

The user that runs the Python Pushdown component will need Snowflake privileges that allow the creation of [stored procedures](https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-overview). For more details, read [Understanding Caller's Rights and Owner's Rights Stored Procedures](https://docs.snowflake.com/en/developer-guide/stored-procedure/stored-procedures-rights).

### Grant external website access

To allow access to external websites, use `NETWORK RULE` commands within Snowflake. These commands have the following form:

```sql theme={null}
CREATE OR REPLACE NETWORK RULE gkc_w3schools_access_rule
    MODE = EGRESS
    TYPE = HOST_PORT
    VALUE_LIST = ('www.w3schools.com');

CREATE OR REPLACE NETWORK RULE gkc_bbc_access_rule
    MODE = EGRESS
    TYPE = HOST_PORT
    VALUE_LIST = ('www.bbc.co.uk');
```

Where `VALUE_LIST` is a website you need to grant access to.

These commands create network rules to specify which websites the users can access. By default there are no sites accessible, so one of these commands must be issued for each site. For more details read [Network rules](https://docs.snowflake.com/en/user-guide/network-rules).

Next, create an external access integration for the set of network rules. This can also include secrets which specify OAuth or basic authentication credentials if the sites require that. Use the following command:

```sql theme={null}
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION gkc_w3schools_access_integration
    ALLOWED_NETWORK_RULES = (gkc_w3schools_access_rule, gkc_bbc_access_rule)
    ENABLED = TRUE;
```

Where `ALLOWED_NETWORK_RULES` is a list containing the names of each `NETWORK RULE` you specified in the first step.

Integrations link network rules to users/roles. For more information, read:

* [CREATE EXTERNAL ACCESS INTEGRATION](https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration)
* [Creating and using an external access integration](https://docs.snowflake.com/en/developer-guide/external-network-access/creating-using-external-network-access)
* [External API authentication and secrets](https://docs.snowflake.com/en/user-guide/api-authentication)
* [External network access examples](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-examples)

Finally, grant usage on the integration to a role, and grant that role to a user. This requires the following commands:

```sql theme={null}
CREATE ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT USAGE ON INTEGRATION gkc_w3schools_access_integration TO ROLE PUSHDOWN_POC_EXTERNAL_ACCESS;
GRANT ROLE PUSHDOWN_POC_EXTERNAL_ACCESS TO USER saas_etl_agent_user;
```

Replace `gkc_w3schools_access_integration` in this example with the name of the integration you previously created.

The warehouse connection will need to specify this role when running the Python Pushdown component, so best practice would be to adopt Snowflake's recommendations on role hierarchy. Read [Role hierarchy and privilege inheritance](https://docs.snowflake.com/en/user-guide/security-access-control-overview#label-role-hierarchy-and-privilege-inheritance) for details.

***

## Create and retrieve Snowflake secrets using Python Pushdown

You can use the Python Pushdown component to access secrets you create in Snowflake. This lets you securely store API credentials as Snowflake `SECRET` objects, and use the Python Pushdown component to access these secrets programmatically inside your {maia} pipelines. This way, you can securely manage sensitive information, such as API keys or database credentials.

The process for this is:

1. [Set up Snowflake to hold secrets](#set-up-snowflake).
2. [Create secrets in Snowflake](#create-secrets-in-snowflake).
3. [Set up the Python Pushdown component](#set-up-the-python-pushdown-component).
4. [Create a Python script to use the secrets](#create-a-python-script-to-use-the-secrets).

These steps are described below. The first two steps will typically have to be performed by your Snowflake administrator.

### Set up Snowflake

Before creating secrets in Snowflake for use in {maia}, ensure the following Snowflake configuration is in place.

1. Create a [network rule](https://docs.snowflake.com/en/sql-reference/sql/create-network-rule) that allows outbound network traffic from Snowflake to the target API's hostname. The following SQL will create this rule:

   ```sql theme={null}
   CREATE OR REPLACE NETWORK RULE <my_api_network_rule>
   MODE = EGRESS
   TYPE = HOST_PORT
   VALUE_LIST = ('<api_hostname>');
   ```

   Where:

   * `<my_api_network_rule>` is the name of the network rule you want to create.
   * `<api_hostname>` is the hostname of the API you want to access. For example, to access `api.example.com`, use `VALUE_LIST = ('api.example.com')`.

   <Note>
     In this and other example commands shown, the symbols `< >` indicate a placeholder that you should replace with real values. You should *not* include the `< >` symbols in the commands.
   </Note>

2. Create an [external access integration](https://docs.snowflake.com/en/sql-reference/sql/create-external-access-integration) that allows Snowpark (and therefore the Python Pushdown component) to use the defined network rule. The following SQL will create this integration:

   ```sql theme={null}
   CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION <my_api_access_integration>
   ALLOWED_NETWORK_RULES = (<my_api_network_rule>)
   ENABLED = TRUE;
   ```

   Where:

   * `<my_api_access_integration>` is the name of the access integration you want to create.
   * `<my_api_network_rule>` is the name of the network rule you created previously.

   You will need to specify this access integration when you set up the Python Pushdown component in {maia}.

3. Grant privileges to the execution role.

   The Snowflake role that {maia} will use to execute the Python Pushdown script requires `USAGE` permission on the external access integration. The following SQL will grant this permission:

   ```sql theme={null}
   GRANT USAGE ON INTEGRATION <my_api_access_integration> TO ROLE <your_matillion_snowflake_role>;
   ```

   Where:

   * `<my_api_access_integration>` is the name of the access integration you created previously
   * `<your_matillion_snowflake_role>` is the Snowflake role {maia} uses.

Granting `USAGE` on secrets will be handled during secret creation.

### Create secrets in Snowflake

You can [create secrets](https://docs.snowflake.com/en/sql-reference/sql/create-secret) in Snowflake using the `CREATE SECRET` SQL command. The following example SQL shows how to create a secret and grant `OWNERSHIP` to {maia}:

```sql theme={null}
CREATE OR REPLACE SECRET <secret_name>
  TYPE = GENERIC_STRING
  SECRET_STRING = '<secret_value>';

GRANT OWNERSHIP ON SECRET <secret_name> TO ROLE <your_matillion_snowflake_role>;
```

Where:

* `<secret_name>` is the name of the secret you want to create.
* `<secret_value>` is the value of the secret, such as an API key or password.
* `<your_matillion_snowflake_role>` is the Snowflake role {maia} uses.

For more details about how to create and manage secrets, refer to Snowflake's documentation [Creating and managing secrets](https://docs.snowflake.com/en/sql-reference/sql/create-secret) and [API reference for access to secrets](https://docs.snowflake.com/en/developer-guide/external-network-access/secret-api-reference#python-api-for-secret-access).

### Set up the Python Pushdown component

Add a Python Pushdown component to your {maia} pipeline in {designer}, and configure its properties as follows:

* **External Access Integrations:** Add the name of the external access integration previously created in Snowflake. For example, `<my_api_access_integration>`. This enables network calls from the Python script.
* **Snowflake Secret:** Select the Snowflake secrets that you [previously created in Snowflake](#create-secrets-in-snowflake), and assign them to Python variables. For example, you might map the secret `aws_access_key` to a variable `snow_aws_access_key`. Once you've mapped the Snowflake secrets to variables, you can access them in your Python script.
* **Packages:** You need the `boto3` package for the specific example below. Your script may address a different integration, requiring different Python package dependencies.

### Create a Python script to use the secrets

With the Snowflake secrets mapped to variables, you can access them in your Python Pushdown script. Here's an example of how to reference Snowflake secrets inside a Python script. This script is intended as an example only, and you should adapt it to your specific use case.

```python theme={null}
import boto3
import _snowflake

# Referencing Snowflake secrets stored as variables
aws_id = _snowflake.get_generic_secret_string('snow_aws_access_key')
aws_secret = _snowflake.get_generic_secret_string('snow_aws_access_secret')
secret_name = pipe_secret_name
secret_region = pipe_secret_region

# Creating an AWS session using the referenced secrets
session = boto3.Session(
    aws_access_key_id=aws_id,
    aws_secret_access_key=aws_secret
)

# Fetching secret value from Secrets Manager
client = session.client(service_name='secretsmanager', region_name=secret_region)
secret_response = client.get_secret_value(SecretId=secret_name)

# Storing the secret in a Snowflake variable
context.updateVariable('pipe_svc_acct', secret_response["SecretString"])
print(pipe_svc_acct)
```

In this example:

* The secrets are securely retrieved using the `get_generic_secret_string()` method from the `_snowflake` module, allowing for secure access to external systems like AWS.
* `aws_id` and `aws_secret` are populated by referencing the Snowflake secrets `snow_aws_access_key` and `snow_aws_access_secret`, and are variables that are private to this script, so their populated values are never visible outside the script.

<Note>
  If you encounter errors such as `NameError: name 'snow_aws_access_key' is not defined`, double-check that the variable names mapped in the **Snowflake secret** property match exactly with the ones used in the Python script.
</Note>

***

## Accessing the Snowflake session

You can use a Python script to access the Snowflake `session` object that allows you to read from and write to the Snowflake database. Read [the Snowflake documentation](https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/api/snowflake.snowpark.Session) for details.

By default, the script will access the default schema, warehouse, and database for your [environment](/docs/guides/environments#select-data-warehouse-defaults), but you can override those defaults by including override commands in the script, for example:

```
session.use_warehouse("NEW_WAREHOUSE")
session.use_database("NEW_DATABASE")
session.use_schema("NEW_SCHEMA")
```

These overrides will apply within the current script execution, and will not reset the properties for any other component in the pipeline or persist after the script completes.

The following example script shows a database query that uses the `session` object to retrieve a single (aggregate) row of data, and store the result into a variable for use elsewhere in the pipeline:

```
queryresult = session.sql('select "Texter" from "19820_all_data_types" where "Integer"=123456').collect()
print("queryresult[0][0] is: ",queryresult[0][0])
context.updateVariable('scalarVar',queryresult[0][0])
```

***

## The main() function

Scripts in Python Pushdown don't make use of a default handler to call the `main()` function, as is the case in [Snowflake Python Worksheets](https://docs.snowflake.com/en/developer-guide/snowpark/python/python-worksheets). This is a key difference between running scripts in Python Pushdown and in Snowflake Python Worksheets. In Python Pushdown, if you wish to use a `main()` function you must explicitly call it in your script to execute it:

```python theme={null}
main()
```
