Production use of this feature is available for specific editions only. Contact our sales team for more information.
PARSE_DOCUMENT can process documents stored in an internal Snowflake stage or in an external stage. The stage must be created with server-side encryption. Otherwise, PARSE_DOCUMENT returns an error stating that the provided file isn’t in the expected format or is client-side encrypted.
Use case
You can use this component to obtain a range of information from PDFs and image files. For example, use it to:
- Automatically extract customer details from completed PDF forms.
- Extract details from scanned receipts when processing expenses.
Properties
A human-readable name for the component.
The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.
The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.
The internal stage (Snowflake managed) or external stage (such as AWS S3, Azure Blob Storage, or Google Cloud Storage) where the file to extract content from is stored.
A file pattern used to match files by name or extension. For example, file1.pdf would return any file with that name and extension. Regular expressions are supported. For example, .*\.pdf would match all PDF files in a stage.
- OCR: This mode is optimized for text extraction from documents. It is recommended when extracting content from documents that don’t have a strong semantic structure. This is the default setting.
- Layout: This mode is optimized for text and layout extraction, including elements such as tables. According to the Snowflake documentation, data in this mode is returned as Markdown and captures the layout and structure of content elements better than OCR does.
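To make the two modes more concrete, the following Python sketch builds the kind of SQL statement that calls Snowflake's PARSE_DOCUMENT function with a chosen mode. It is an illustration only and does not show how the component generates its query. The helper name `build_parse_document_sql` and the stage and file names are hypothetical, and the call shape (stage, file path, options object with a `mode` key) is an assumption based on the Snowflake Cortex documentation.

```python
# Hypothetical helper: builds a SQL string only, it does not connect to Snowflake.
VALID_MODES = {"OCR", "LAYOUT"}


def build_parse_document_sql(stage: str, file_path: str, mode: str = "OCR") -> str:
    """Return a SELECT statement that parses one staged file in the given mode."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported mode: {mode!r}; expected one of {sorted(VALID_MODES)}")
    # Escape single quotes so the file path is a valid SQL string literal.
    safe_path = file_path.replace("'", "''")
    return (
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        f"'@{stage}', '{safe_path}', {{'mode': '{mode}'}})"
    )


# OCR is the default; pass "LAYOUT" to request Markdown output with structure.
print(build_parse_document_sql("my_stage", "receipt.pdf"))
print(build_parse_document_sql("my_stage", "form.pdf", "layout"))
```

Validating the mode before building the statement catches typos early, instead of surfacing them as a query error at run time.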
The name of the new column that is output when the component is executed.
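The file pattern property described above can be tried out locally before running the component. The sketch below uses Python's `re` module to show which file names a pattern such as .*\.pdf selects. It assumes the pattern must match the whole file name and that matching is case-sensitive; Snowflake's own regex handling may differ in detail. The function name `match_staged_files` and the file list are hypothetical.

```python
import re


def match_staged_files(file_names, pattern):
    """Return the file names that the pattern matches in full."""
    compiled = re.compile(pattern)
    # fullmatch requires the entire name to match, not just a prefix.
    return [name for name in file_names if compiled.fullmatch(name)]


# Hypothetical stage contents, for illustration only.
files = ["file1.pdf", "invoice.pdf", "scan.png", "notes.pdf.bak", "REPORT.PDF"]

print(match_staged_files(files, r".*\.pdf"))   # names ending in lowercase ".pdf"
print(match_staged_files(files, r"file1\.pdf"))  # a single, exact file name
```

Under these assumptions, a file named REPORT.PDF is not selected by .*\.pdf, so a pattern such as .*\.(pdf|PDF) may be needed when extensions vary in case.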

