# File Iterator

export const m_runner_0 = undefined

export const m_runner = "Maia runner";

export const maia = "Maia";

export const ComponentMetadata = ({warehouses, unsupportedWarehouses = [], componentType, connectionInputs, connectionOutputs}) => {
  const allWarehouses = [...warehouses.map(w => ({
    name: w,
    supported: true
  })), ...unsupportedWarehouses.map(w => ({
    name: w,
    supported: false
  }))];
  return <div style={{
    background: 'var(--colors-background-light, #f9fafb)',
    border: '1px solid var(--colors-border-default, #e5e7eb)',
    borderRadius: '12px',
    padding: '20px 28px',
    marginBottom: '28px',
    boxShadow: '0 1px 4px rgba(0,0,0,0.10)'
  }}>
      <table style={{
    width: '100%',
    borderCollapse: 'collapse'
  }}>
        <tbody>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle',
    width: '180px'
  }}>Project Availability</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>
              <div style={{
    display: 'flex',
    flexWrap: 'wrap',
    gap: '8px'
  }}>
                {allWarehouses.map((w, i) => <span key={i} style={{
    background: w.supported ? '#dcfce7' : '#fee2e2',
    color: w.supported ? '#15803d' : '#b91c1c',
    border: `1px solid ${w.supported ? '#bbf7d0' : '#fca5a5'}`,
    borderRadius: '9999px',
    padding: '3px 12px',
    fontSize: '0.85rem',
    fontWeight: '500',
    whiteSpace: 'nowrap'
  }}>
                    {w.name} {w.supported ? '✅' : '❌'}
                  </span>)}
              </div>
            </td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Component Type</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{componentType}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    paddingBottom: '14px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Inputs</td>
            <td style={{
    paddingBottom: '14px',
    verticalAlign: 'middle'
  }}>{connectionInputs}</td>
          </tr>
          <tr>
            <td style={{
    fontWeight: '600',
    paddingRight: '32px',
    whiteSpace: 'nowrap',
    verticalAlign: 'middle'
  }}>Connection Outputs</td>
            <td style={{
    verticalAlign: 'middle'
  }}>{connectionOutputs}</td>
          </tr>
        </tbody>
      </table>
    </div>;
};

<ComponentMetadata warehouses={["Snowflake", "Databricks", "Amazon Redshift", "Google BigQuery"]} componentType="Orchestration & Test" connectionInputs="One" connectionOutputs="Unlimited" />

The File Iterator orchestration component lets you loop over matching files in a remote file system.

The component searches one of several supported remote file systems for files, running its attached component once for each file found. Filenames and path names are mapped into project or pipeline variables, which can then be referenced from the attached component.

To attach the iterator to another component, use the connection ring *beneath* the iterator to connect to the input of the other component. The two components will automatically "snap" together, with the iterator component sitting on top of the other component, and can be dragged around the canvas as a single component. For more information about attaching, stacking, and detaching iterators, read [Attaching, stacking, and detaching iterators](/docs/guides/iterator-components#attaching-stacking-and-detaching-iterators).

If you need to iterate more than one component, put them into a separate orchestration pipeline or transformation pipeline and attach a [Run Orchestration](/docs/components/run-orchestration) or [Run Transformation](/docs/components/run-transformation) component to the iterator. In this way, you can run an entire pipeline flow multiple times, once for each matching file.

If the component requires access to a cloud provider (AWS, Azure, or Google Cloud), it will use credentials as follows:

* If using [Matillion Full SaaS](/docs/guides/runner-overview#matillion-full-saas): The component will use the [cloud credentials](/docs/guides/cloud-credentials) associated with your environment to access resources.
* If using [Hybrid SaaS](/docs/guides/runner-overview#hybrid-saas): By default, the component will inherit the agent's execution role (service account role). However, if there are [cloud credentials](/docs/guides/cloud-credentials) associated with your environment, these will override the role.

***

## Properties

<ResponseField name="Name" type="string" required>
  A human-readable name for the component.
</ResponseField>

{/* <!-- param-start:[inputDataType] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Input data type" type="drop-down" required>
  Select the remote file system to search. Available data types are:

  * Azure Blob Storage
  * Google Cloud Storage
  * FTP
  * SFTP
  * Amazon S3
  * Windows Fileshare
</ResponseField>

{/* <!-- param-start:[inputDataUrl, inputDataUrl1, inputDataUrl2, inputDataUrl3] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Input data URL" type="string" required>
  The URL of the source files, including the full path to the folder you wish to iterate over. You can further refine the filenames to be iterated over using the **Filter regex** property.

  Clicking this property will open the **Input data URL** dialog. This displays a list of all existing storage accounts. Select a storage account, then a container, and then a subfolder if required. This constructs a URL with the following format:

  ```
  DATATYPE://<account>/<container>/<path>
  ```

  You can also type the URL directly into the **Storage Accounts path** field, instead of selecting listed elements. This is particularly useful when using [project and pipeline variables](/docs/guides/variables) in the URL, for example:

  ```
  AZURE://${jv_blobStorageAccount}/${jv_containerName}
  ```

  Special characters used in this field *must* be URL-safe.
</ResponseField>
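
As a rough illustration of the **Input data URL** format, the following Python sketch substitutes `${...}` placeholders into a URL template and percent-encodes each value so that special characters are URL-safe. The `resolve_url` helper, the `jv_folder` placeholder in the template, and all variable values are hypothetical, and this is not how the component itself resolves variables.

```python
import re
from urllib.parse import quote


def resolve_url(template: str, variables: dict[str, str]) -> str:
    """Replace each ${name} placeholder with its percent-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # safe="" also encodes "/", so a value cannot add extra path segments.
        return quote(variables[name], safe="")

    return re.sub(r"\$\{(\w+)\}", substitute, template)


# Hypothetical variable values, for illustration only.
variables = {
    "jv_blobStorageAccount": "mystorageaccount",
    "jv_containerName": "mycontainer",
    "jv_folder": "incoming files",
}

url = resolve_url(
    "AZURE://${jv_blobStorageAccount}/${jv_containerName}/${jv_folder}",
    variables,
)
print(url)  # AZURE://mystorageaccount/mycontainer/incoming%20files
```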

{/* <!-- param-start:[domain] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Domain" type="string" required>
  Enter your connection domain.
</ResponseField>

{/* <!-- param-start:[username] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Username" type="string" required>
  Provide a valid username for the connection.
</ResponseField>

{/* <!-- param-start:[password] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Password" type="drop-down" required>
  The [secret definition](/docs/guides/secrets-and-secret-definitions) denoting the password for the connection. Your password should be saved as a secret definition before using this component.
</ResponseField>

{/* <!-- param-start:[key] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Key" type="drop-down">
  The [secret definition](/docs/guides/secrets-and-secret-definitions) denoting your SFTP key for the connection. Your SFTP key should be saved as a secret definition before using this component.

  This parameter is optional and will only be used if the data source requests it.

  This must be the complete private key, beginning with "-----BEGIN RSA PRIVATE KEY-----" and conforming to the same structure as an RSA private key.

  The following private key formats are currently supported:

  * DSA
  * RSA
  * ECDSA
  * Ed25519

  In a [Hybrid SaaS](/docs/guides/runner-overview#matillion-fully-managed-vs-hybrid-cloud) configuration, you need to manually convert the private key into a format that allows it to be stored in your AWS Secrets Manager or Azure Key Vault. You can do this with the following command:

  ```bash theme={null}
  ssh-keygen -p -f YOUR_PRIVATE_KEY -m pem
  ```

  The flags used in this command are:

  | Flag | Description                                                                                                                                    |
  | ---- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
  | `-p` | Changes the passphrase of a private key file. Use this to trigger a format conversion without setting a new passphrase.                        |
  | `-f` | Specifies the filename of the private key file to convert. Replace `YOUR_PRIVATE_KEY` with the path to your key file.                          |
  | `-m` | Specifies the output format. The `pem` value converts the key to PEM format, which is compatible with AWS Secrets Manager and Azure Key Vault. |

  For more information about supported flags, read the [ssh-keygen manual page](https://www.man7.org/linux/man-pages/man1/ssh-keygen.1.html).
</ResponseField>

{/* <!-- param-start:[setHomeDirectoryAsRoot] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Set home directory as root" type="boolean" required>
  * **No:** The URL path is from the server root.
  * **Yes:** The URL path is relative to the user's home directory. This is the default setting.

  This property is only available when the **Input data type** is set to FTP or SFTP.
</ResponseField>

{/* <!-- param-start:[recursive] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Recursive" type="boolean" required>
  * **No:** Only search for files within the folder identified by the Input data URL.
  * **Yes:** Consider files in subdirectories when searching for files.

  This property is only available when the **Input data type** is set to FTP, SFTP, or Windows Fileshare.
</ResponseField>

{/* <!-- param-start:[maxRecursiveDepth] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Max recursive depth" type="integer" required>
  Set the maximum recursion depth into subdirectories. This property is only available when **Recursive** is set to Yes.
</ResponseField>

{/* <!-- param-start:[ignoreHidden] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Ignore hidden" type="boolean" required>
  * **No:** Include hidden files.
  * **Yes:** Ignore hidden files, even if they otherwise match the filter. This is the default setting.

  This property is only available when the **Input data type** is set to FTP, SFTP, or Windows Fileshare.
</ResponseField>

{/* <!-- param-start:[maxIterations] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Max iterations" type="integer" required>
  The maximum number of times the attached component runs. The maximum value cannot exceed 5000.

  The value you set interacts with the number of matching files as follows:

  * If **Max iterations** is lower than the number of matching files, the component iterates over only the first N files, where N is the value you set. For example, setting this to 25 means only the first 25 matching files are iterated over, even if more files exist in the remote file system.
  * If **Max iterations** is higher than the number of matching files, the component cycles through the available files repeatedly until it reaches the set number of iterations. For example, setting this to 1000 when only 10 files match means each file is iterated 100 times.
</ResponseField>
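
The interaction between **Max iterations** and the number of matching files can be modeled with a short Python sketch. It only mirrors the two rules as they are described for this property; `plan_iterations` is a hypothetical helper, and the file names are made up.

```python
from itertools import cycle, islice


def plan_iterations(matching_files: list[str], max_iterations: int) -> list[str]:
    """Return the file used for each iteration, following the two rules above."""
    if not matching_files:
        return []  # Nothing matched, so there is nothing to iterate over.
    # islice truncates when there are more files than iterations;
    # cycle repeats the list when there are fewer.
    return list(islice(cycle(matching_files), max_iterations))


# More files than iterations: only the first 25 are used.
many_files = [f"file_{i}.csv" for i in range(40)]
print(len(plan_iterations(many_files, 25)))  # 25

# Fewer files than iterations: each of the 10 files is used 100 times.
few_files = [f"file_{i}.csv" for i in range(10)]
plan = plan_iterations(few_files, 1000)
print(len(plan), plan.count("file_0.csv"))  # 1000 100
```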

{/* <!-- param-start:[filterRegex] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Filter regex" type="string" required>
  The [Java-standard regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) used to test against each candidate file's full path. To match all files, specify `.*`.

  To match only the files in a particular folder, start the filter regex with a variable that holds the folder name, followed by `/.*`. The forward slash restricts the match to the contents of that folder, and `.*` is the wildcard that returns all files in it.

  Example: `${jv_folder}/.*`

  If the filter regex includes a folder structure such as `${jv_folder}/.*`, you must set **Recursive** to **Yes** so that the component can find folders beyond the Input data URL path `DataType://${jv_blobStorageAccount}/${jv_containerName}/`.
</ResponseField>
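
To preview which paths a **Filter regex** would select, you can test the pattern locally. The sketch below uses Python's `re` module, whose syntax is close to Java's for simple patterns like these but not identical. It assumes that candidate paths are relative to the Input data URL and that the pattern must match the whole path. Both are assumptions, and `filter_paths` and the sample paths are hypothetical.

```python
import re


def filter_paths(paths: list[str], filter_regex: str) -> list[str]:
    """Keep the paths that the pattern matches in full."""
    pattern = re.compile(filter_regex)
    return [p for p in paths if pattern.fullmatch(p)]


# Hypothetical candidate paths, relative to the Input data URL.
candidates = ["sales/2024-01.csv", "sales/archive/2023-12.csv", "hr/staff.csv"]

jv_folder = "sales"  # stands in for the ${jv_folder} variable

all_files = filter_paths(candidates, ".*")
in_folder = filter_paths(candidates, f"{jv_folder}/.*")
direct_csv = filter_paths(candidates, rf"{jv_folder}/[^/]*\.csv")

print(all_files)   # all three paths
print(in_folder)   # both paths under sales/, including the archive subfolder
print(direct_csv)  # only sales/2024-01.csv
```

The last pattern shows one way to exclude subfolders: `[^/]*` matches any characters except a forward slash.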

{/* <!-- param-start:[concurrency] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Concurrency" type="drop-down" required>
  * **Concurrent:** Iterations are run concurrently.
  * **Sequential:** Iterations are done in sequence, waiting for each to complete before starting the next. This is the default setting.

  [Full SaaS](/docs/guides/runner-overview#matillion-full-saas) deployments are limited to 20 concurrent tasks, with additional tasks being queued. [Hybrid SaaS](/docs/guides/runner-overview#hybrid-saas) deployments have 20 concurrent tasks per {m_runner} instance, with a maximum of 100 instances if configured accordingly.
</ResponseField>

{/* <!-- param-start:[variables] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Variables" type="column editor" required>
  Select project variables that will hold the values of file attributes. This will allow you to use the matching file's metadata (such as its filename) in the component attached to the File Iterator. The project variables must have been defined before using them in this component. Read [Project and pipeline variables](/docs/guides/variables) for more information.

  Use **+** to add a variable, and specify the following:

  * **Variable:** Select an existing project variable to hold a given file attribute.
  * **File attribute:** For each matched file, the project variable will be populated with the attribute selected here. The attributes which can be used are:
    * Base Folder.
    * Subfolder. Useful when recursing.
    * Filename.
    * Last modified. A date formatted as ISO8601, with a UTC indicator. For example: `2021-01-04T10:45:15.123Z`.

  You may notice a lag in how your data warehousing platform updates the last modified date, for example a difference between the time {maia} interacts with the file and the actual last modified date. This behavior is a limitation of the platform and depends on that platform's metadata.
</ResponseField>
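
If a downstream script needs to compare the **Last modified** attribute against another date, the value can be parsed as a UTC timestamp. This is a minimal Python sketch, assuming the value always carries fractional seconds and the `Z` suffix as in the example above. `parse_last_modified` and the cutoff date are hypothetical.

```python
from datetime import datetime, timezone


def parse_last_modified(value: str) -> datetime:
    """Parse a value such as 2021-01-04T10:45:15.123Z into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # strptime returns a naive datetime; the trailing Z means UTC.
    return parsed.replace(tzinfo=timezone.utc)


last_modified = parse_last_modified("2021-01-04T10:45:15.123Z")

# Hypothetical cutoff: only process files modified on or after this date.
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
is_recent = last_modified >= cutoff
print(last_modified.isoformat(), is_recent)
```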

{/* <!-- param-start:[breakOnFailure] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Break on failure" type="boolean" required>
  If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it's followed immediately or after all iterations have been attempted.

  * **No:** Attempt to run the attached component for each iteration, regardless of success or failure. This is the default setting.
  * **Yes:** If the attached component doesn't run successfully, fail immediately.

  This property is only available when **Concurrency** is set to Sequential. When set to Concurrent, all iterations will be attempted.
</ResponseField>

{/* <!-- param-start:[stopOnCondition] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Stop on condition" type="drop-down" required>
  Select **Yes** to stop the iteration based on a condition specified in the **Condition** property. The default setting is **No**.

  For this property to be available, set **Concurrency** to **Sequential**.
</ResponseField>

{/* <!-- param-start:[stopOnConditionMode] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Stop on condition mode" type="drop-down">
  Select the method for creating the stop condition.

  * **Simple:** A no-code condition editor opens, where you specify an **Input Variable**, **Qualifier**, **Comparator**, and **Value**. This is the default setting.
  * **Advanced:** A code editor opens, where you write the condition manually using SQL.

  This property is only available when **Stop on condition** is set to **Yes**.
</ResponseField>

{/* <!-- param-start:[condition] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Condition (simple mode)" type="column editor">
  Click the gear icon to open the **Condition** dialog. Use **+** and **-** to add or remove conditions. Each condition has the following columns:

  **Input variable:** An input variable to form a condition around.

  **Qualifier:** Select whether the condition should be applied (**Is**, the default) or reversed (**Not**). Selecting **Not** reverses the comparator, so **Equal to** becomes "not equal to", **Less than** becomes "greater than or equal to", and so on.

  **Comparator:** Select from:

  * **Less than:** Value of the input variable must be less than the specified value.
  * **Less than or equal to:** Value of the input variable must be less than or equal to the specified value.
  * **Equal to:** Value of the input variable must be equal to the specified value.
  * **Greater than or equal to:** Value of the input variable must be greater than or equal to the specified value.
  * **Greater than:** Value of the input variable must be greater than the specified value.
  * **Blank:** Checks whether the input variable is empty.

  **Value:** The value to compare against.

  Toggle **Text mode** to write the condition manually as a JavaScript expression instead.

  Toggle **Use Grid Variable** to use a grid variable to define the condition.

  This property is only available when **Stop on condition mode** is set to **Simple**.
</ResponseField>

{/* <!-- param-start:[advancedCondition] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Condition (advanced mode)" type="code editor">
  Enter the condition manually in the code editor using SQL.

  This property is only available when **Stop on condition mode** is set to **Advanced**.
</ResponseField>

{/* <!-- param-start:[combineConditions] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Combine conditions" type="drop-down">
  When multiple conditions are present, they can be separated by **And** or **Or**.

  * **And:** All the conditions must be true.
  * **Or:** Any of the conditions must be true.

  This property is only available when **Stop on condition** is set to **Yes** and **Stop on condition mode** is set to **Simple**.
</ResponseField>
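
The way **Qualifier**, **Comparator**, and **Combine conditions** fit together can be sketched in Python. This models only the logic described for the simple mode; the `evaluate` and `should_stop` helpers and the sample values are hypothetical and do not reflect the product's internal implementation.

```python
import operator
from typing import Callable

COMPARATORS: dict[str, Callable[[object, object], bool]] = {
    "Less than": operator.lt,
    "Less than or equal to": operator.le,
    "Equal to": operator.eq,
    "Greater than or equal to": operator.ge,
    "Greater than": operator.gt,
}


def evaluate(value, qualifier: str, comparator: str, target=None) -> bool:
    """Evaluate one condition row; the qualifier "Not" reverses the outcome."""
    if comparator == "Blank":
        result = value is None or value == ""
    else:
        result = COMPARATORS[comparator](value, target)
    return not result if qualifier == "Not" else result


def should_stop(results: list[bool], combine: str) -> bool:
    """Combine the condition results with And (all true) or Or (any true)."""
    return all(results) if combine == "And" else any(results)


# Two hypothetical conditions: a counter above 5, and a text variable that is not blank.
results = [
    evaluate(7, "Is", "Greater than", 5),  # True
    evaluate("", "Not", "Blank"),          # False, because the value is blank
]
print(should_stop(results, "And"))  # False
print(should_stop(results, "Or"))   # True
```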

{/* <!-- param-start:[maximumConcurrentIterations] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Maximum concurrent iterations" type="integer" optional>
  Use this to set a maximum limit on the number of concurrent iterations that will be attempted. This is important to ensure that the workload is orchestrated to accommodate any source and target constraints.

  If you leave this property blank, no upper limit will be placed on the number of concurrent tasks that can be attempted.

  If you [stack](/docs/guides/iterator-components#stacking-iterators) iterators, their concurrency limits multiply. For example, if you stack an iterator with Maximum concurrent iterations set to 10 on top of another iterator with Maximum concurrent iterations set to 10, up to 100 concurrent iterations could be attempted.

  This property is only available when **Concurrency** is set to **Concurrent**.
</ResponseField>
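
The stacking arithmetic for **Maximum concurrent iterations** is a simple product of the per-iterator limits. A short Python sketch, where `max_parallel_iterations` is a hypothetical helper and the result is an upper bound only:

```python
from math import prod


def max_parallel_iterations(limits: list[int]) -> int:
    """Upper bound on simultaneous iterations for a stack of iterators."""
    return prod(limits)


two_level = max_parallel_iterations([10, 10])
three_level = max_parallel_iterations([10, 10, 5])
print(two_level, three_level)  # 100 500
```

In practice, the number of iterations actually running at once is also subject to the concurrent task limits of your deployment.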

{/* <!-- param-start:[recordValuesInTaskHistory] | warehouses: [snowflake, databricks, redshift, bigquery] --> */}

<ResponseField name="Record values in task history" type="boolean" optional>
  If this is set to **Yes**, the [Task history](/docs/guides/designer-ui-basics#task-history) tab and [Observability dashboard](/docs/guides/pipeline-run-history) will show each iterator variable as a name/value pair in the component result message for each iteration. The default is **Yes**.
</ResponseField>

## Counting the number of iterations

The File Iterator determines the number of files to iterate at runtime, based on the matching files found. You can count the number of iterations using [System variables](/docs/guides/iterator-components#system-variables).

***

## Checking whether a file exists

{maia} does not provide a dedicated method, component, or function specifically for checking whether a file exists in a remote location.

However, you can use the File Iterator component as a practical alternative. If the File Iterator matches the file, it runs at least one iteration; if no files match, it runs zero iterations.

A common pattern is to initialize a Text project variable (for example, `file_found`) to `false` before the File Iterator, then set it to `true` inside the iterator. If the variable remains `false` after the iterator completes, no matching file was found.

This approach works for any remote file system supported by the File Iterator, including Azure Blob Storage, Google Cloud Storage, FTP, SFTP, Amazon S3, and Windows Fileshare.
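
The flag pattern can be modeled in a few lines of Python. This is only a simulation of the control flow: the loop stands in for the File Iterator, the assignment inside it stands in for the attached component, and `file_exists` is a hypothetical name.

```python
def file_exists(matching_files: list[str]) -> str:
    """Return "true" if the iterator would run at least once, otherwise "false"."""
    file_found = "false"      # initialized before the File Iterator
    for _ in matching_files:  # one pass per matching file
        file_found = "true"   # set by the component attached to the iterator
    return file_found


print(file_exists([]))              # false
print(file_exists(["orders.csv"]))  # true
```

The values are strings rather than booleans because the pattern uses a Text variable.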
