PDF Reader

The PDF Reader node extracts text from PDF files, with reading modes that cover simple text-based documents through complex scanned files.

Overview

Standard Reading

Extract text directly from PDFs at no additional cost

Advanced Reading

AI-powered structured extraction optimized for LLM processing

OCR Mode

Read scanned documents and image-based PDFs with AI vision

Reading Modes

Choose the right reading mode based on your PDF type and use case:
  • Standard
  • Advanced
  • OCR
Standard is best for: Text-based PDFs with selectable text
  • Uses direct text extraction
  • Fastest processing speed
  • Cost: 0 additional credits
  • Limitations: Cannot read scanned images or handwritten content
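
The choice between the three modes can be sketched as a small decision helper. This is only an illustration of the guidance on this page, not logic the node runs itself; the function name and its two flags are hypothetical.

```python
def choose_reading_mode(has_selectable_text: bool, needs_structure: bool = False) -> str:
    """Suggest a Reading Mode from two yes/no facts about the PDF.

    Hypothetical helper: the flags are assumptions made for illustration.
    """
    if not has_selectable_text:
        # Scanned or image-based PDFs cannot be read by direct extraction.
        return "OCR"
    if needs_structure:
        # Tables and complex formatting headed for an LLM.
        return "Advanced"
    # Plain text-based PDF: fastest option, 0 additional credits.
    return "Standard"
```

For example, `choose_reading_mode(has_selectable_text=False)` suggests OCR regardless of the second flag.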

Configuration

Required Inputs

PDF File Name
file
required
The PDF file to extract text from. This is a file picker that allows you to:
  • Upload a new file directly
  • Select an existing file from storage
  • Dynamically pass in a file from other nodes (like Google Drive)
Only shown when “Use Link?” is disabled. Accepts .pdf files only.

Dynamic File Input

To pass PDF files dynamically from other nodes (such as files retrieved from Google Drive):
1

Enable Dynamic Input

Hover over the PDF Reader node and click “Configure inputs”
2

Activate PDF File Name

In the configuration panel, enable “PDF File Name” as a dynamic input
3

Connect File Source

Connect the file output from another node (like Google Drive File Reader) to the PDF File Name input

Optional Settings

Use Link?
Enable to read the PDF from a URL instead of an uploaded file. When enabled, you provide a File URL instead of uploading a file.
File URL
string
Direct link to a publicly accessible PDF file. Example: https://example.com/document.pdf
URL must be publicly accessible without authentication.
Reading Mode
enum
default:"Standard"
Choose how the PDF should be processed:
  • Standard: Direct text extraction (0 credits)
  • Advanced: AI-powered structured reading (+5 credits)
  • OCR: Optical character recognition (cost varies by model)
Specify Pages?
boolean
default:"false"
Enable to read only specific pages instead of the entire document.
Page Numbers
string
Comma-separated page numbers and ranges. Format examples:
  • 1-5 (reads pages 1 through 5)
  • 1, 3, 5 (reads pages 1, 3, and 5)
  • 1-5, 8, 11-13 (reads pages 1-5, 8, and 11-13)
Page numbers are 1-indexed (first page is page 1).
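
To make the format concrete, here is a minimal sketch of how such a string expands into individual pages. It is an illustration only and not the node's own parser; the function name is hypothetical, and sorting and de-duplicating the result are assumptions.

```python
def parse_page_numbers(spec: str) -> list[int]:
    """Expand a spec such as "1-5, 8, 11-13" into a sorted list of 1-indexed pages."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Pages are 1-indexed, and a range must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Validating the string this way before a run can catch typos such as `5-1` or `0` early.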
Split PDF Content by Page
boolean
default:"false"
Controls how extracted text is returned:
  • Enabled: Returns a list where each item is one page
  • Disabled: Returns all content as a single combined text string
Enable this when you need to process pages individually in Loop Mode.
Is Protected by Password?
boolean
default:"false"
Enable if your PDF requires a password to open.
Password
string
The password needed to decrypt and read the PDF file. Works with both Standard and Advanced reading modes.

Output

PDF Contents
string | string[]
The extracted text content from the PDF. Output type depends on configuration:
  • If “Split PDF Content by Page” is enabled: Returns string[] (list of pages)
  • If “Split PDF Content by Page” is disabled: Returns string (combined text)
Each page’s content is preserved in order. When combined, pages are separated by newline characters.
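
Because the output can be either a string or a list, downstream code (for example in a custom code step) may want to normalize it first. The sketch below assumes the value arrives as a Python `str` or `list[str]`; the helper names are hypothetical, and joining with a single newline is an assumption about the separator.

```python
from typing import Union

PdfContents = Union[str, list[str]]

def as_pages(pdf_contents: PdfContents) -> list[str]:
    """Always return a list, so callers handle one shape."""
    if isinstance(pdf_contents, str):
        # Combined output: treat the whole document as one item.
        return [pdf_contents]
    return list(pdf_contents)

def as_text(pdf_contents: PdfContents) -> str:
    """Always return one string, joining pages in order."""
    if isinstance(pdf_contents, str):
        return pdf_contents
    # Assumption: a single newline between pages.
    return "\n".join(pdf_contents)
```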

Common Use Cases

1

Simple Document Extraction

Extract all text from a standard PDF document at no additional cost.
Configuration:
  • Reading Mode: Standard
  • Split PDF Content by Page: Disabled
Result: Complete document text as a single string
2

LLM-Optimized Processing

Process complex PDFs with tables and formatting for AI analysis.
Configuration:
  • Reading Mode: Advanced (+5 credits)
  • Connect output to Ask AI or Extract Data nodes
Result: Structured content optimized for AI processing
3

Scanned Document Digitization

Convert scanned PDFs or image-based documents to text.
Configuration:
  • Reading Mode: OCR
  • Choose appropriate AI model (Mini models for cost savings)
Result: Extracted text from non-selectable content
4

Page-Specific Processing

Extract and analyze specific pages from large documents.
Configuration:
  • Specify Pages: Enabled
  • Page Numbers: “1-3, 10”
  • Split PDF Content by Page: Enabled
Result: List containing only selected pages

Credit Costs

Reading Mode | Additional Cost | Best For
Standard | 0 credits | Text-based PDFs with selectable text
Advanced | +5 credits | Complex documents for AI processing
OCR | 2-20 credits | Scanned documents, depends on AI model
Cost optimization tips:
  • Use Standard mode whenever possible to save credits
  • Choose Mini models (GPT-4.1 Mini, Claude 3.5 Haiku) for OCR when quality permits
  • Test with single documents before batch processing
  • Use page selection to process only needed sections
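
A rough budget check before a batch run might look like the sketch below. It uses only the additional costs listed on this page and assumes they are charged once per node run, which this page does not state; any base cost of running the node is not included. The names are hypothetical.

```python
# Additional credits per run as (low, high), taken from the costs listed above.
ADDITIONAL_CREDITS = {
    "Standard": (0, 0),
    "Advanced": (5, 5),
    "OCR": (2, 20),  # depends on the AI model chosen
}

def estimate_additional_credits(mode: str, runs: int) -> tuple[int, int]:
    """Return the (lowest, highest) additional credits for `runs` node runs.

    Assumption: the additional cost applies once per run.
    """
    if runs < 0:
        raise ValueError("runs must be non-negative")
    low, high = ADDITIONAL_CREDITS[mode]
    return (low * runs, high * runs)
```

Under that assumption, `estimate_additional_credits("OCR", 3)` gives a range of 6 to 60 additional credits, which shows how much the model choice matters for OCR batches.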

Troubleshooting

Problem: PDF Reader returns blank text or missing content
Solutions:
  • Check if PDF contains selectable text (try highlighting text in a PDF viewer)
  • For scanned or image-based PDFs, switch from Standard to OCR mode
  • Verify the PDF isn’t corrupted by opening it in another application
Problem: Error message when trying to read a password-protected PDF
Solutions:
  • Enable “Is Protected by Password?” option
  • Enter the correct password in the Password field
  • Verify password works by testing in a PDF viewer first
  • Some PDFs have restrictions on copying/extraction - OCR mode may help
Problem: Cannot read PDF from provided URL
Solutions:
  • Ensure URL points directly to a PDF file (ends in .pdf)
  • Verify URL is publicly accessible (no login required)
  • Check URL doesn’t expire or require authentication
  • Try downloading the PDF manually to test URL validity
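
The manual download test can also be scripted. The sketch below uses only Python's standard library, fetches the first bytes of the URL without any credentials, and checks for the `%PDF-` marker that PDF files normally start with. The function names and the User-Agent value are hypothetical, and a passing check does not guarantee that the node can read the file.

```python
import urllib.request

def looks_like_pdf(first_bytes: bytes) -> bool:
    """PDF files normally begin with the marker %PDF- followed by a version."""
    return first_bytes.startswith(b"%PDF-")

def url_serves_pdf(url: str, timeout: float = 10.0) -> bool:
    """Fetch the start of `url` anonymously and check for a PDF header."""
    try:
        request = urllib.request.Request(url, headers={"User-Agent": "pdf-url-check"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_pdf(response.read(1024))
    except (OSError, ValueError):
        # Network errors, HTTP errors (login walls, expired links) or a malformed URL.
        return False
```

A login page or an expired link typically returns HTML or an error status, so `url_serves_pdf` returns `False` in those cases.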
Problem: PDF processing exceeds timeout limits
Solutions:
  • Use page selection to process only needed pages
  • Split large documents into smaller files
  • Consider using Standard mode instead of Advanced for faster processing
  • For very large documents, process in batches using Loop Mode
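
One way to prepare such batches is to generate one Page Numbers string per chunk of pages. The sketch below assumes you already know the page count and that Page Numbers can be supplied separately for each run; the function name is hypothetical.

```python
def page_batches(total_pages: int, batch_size: int) -> list[str]:
    """Return Page Numbers strings that cover pages 1..total_pages in chunks."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be at least 1")
    specs = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        # A one-page chunk is written as a single number rather than a range.
        specs.append(str(start) if start == end else f"{start}-{end}")
    return specs
```

For a 25-page document and a batch size of 10, this yields `1-10`, `11-20`, and `21-25`.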

Batch Processing

The PDF Reader node supports Loop Mode for processing multiple PDFs in a single workflow.
When using Loop Mode:
  • Each PDF in the input list is processed independently
  • Consider credit costs when processing large batches
  • Use “Split PDF Content by Page” to handle per-page analysis across multiple documents
Example batch workflow:
  1. Provide a list of PDF files as input
  2. Enable Loop Mode on PDF Reader
  3. Each PDF is read and processed sequentially
  4. Combine node aggregates all results
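
For per-page analysis across several documents, the aggregated results can be flattened into one record per page. The sketch below assumes the results arrive as a mapping from file name to a list of page strings (that is, with "Split PDF Content by Page" enabled); the data shape and all names are hypothetical.

```python
from typing import NamedTuple

class PageRecord(NamedTuple):
    document: str
    position: int  # 1-based position in the returned list
    text: str

def flatten_batch(results: dict[str, list[str]]) -> list[PageRecord]:
    """Turn {file name: [page text, ...]} into one record per page."""
    records = []
    for name, pages in results.items():
        for position, text in enumerate(pages, start=1):
            records.append(PageRecord(name, position, text))
    return records
```

Note that `position` is the index within the returned list; when page selection is used it is not necessarily the original page number in the PDF.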