PDF OCR Reader
This document explains the PDF OCR Reader node, which extracts text from scanned PDFs and image-based documents.
Node Inputs
Required Fields
- File Name: Upload a PDF or select an existing file from storage
Optional Fields
- Use Link: Enable to read the PDF from a URL
- Specify Pages: Select specific pages to read
- Split PDF Content by Page: Get page-by-page output in a list format. Each item in the list is a single page from the PDF.
- Image Model: Choose the AI model used for OCR
- Temperature: Controls the randomness of the model's output (0-1). Lower values give more consistent, repeatable transcription
- Cache Response: Save results for reuse
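The fields above can be pictured as a small settings record. The sketch below is illustrative only: the class name `PdfOcrReaderSettings`, its field names, and every default value are assumptions made for this example, not the node's actual API. It mirrors the listed inputs and checks that Temperature stays within the documented 0-1 range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfOcrReaderSettings:
    """Hypothetical mirror of the node's input fields (not a real API)."""

    file_name: str                              # required field
    use_link: bool = False                      # assumed default
    specify_pages: Optional[List[int]] = None   # None means "all pages" in this sketch
    split_by_page: bool = False                 # assumed default
    image_model: str = "GPT-4o Mini Vision"     # assumed default, taken from the model list
    temperature: float = 0.0                    # assumed default
    cache_response: bool = False                # assumed default

    def __post_init__(self) -> None:
        # File Name is the only required field in the documentation.
        if not self.file_name:
            raise ValueError("File Name is required")
        # Temperature is documented as a value between 0 and 1.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("Temperature must be between 0 and 1")


settings = PdfOcrReaderSettings(file_name="scan.pdf", split_by_page=True)
```

Validating the range up front is a design choice for this sketch; the node itself may handle out-of-range values differently.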
Show As Input Options
You can expose these fields as inputs:
- Temperature
Node Output
- PDF Contents: Extracted text (single or per page)
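Because the output is either one block of text or a list with one item per page, downstream logic may want to treat both shapes the same way. The following is a minimal sketch of that idea. The function name `pages_from_output` is hypothetical, and it assumes the single-text form arrives as a string and the per-page form as a list of strings.

```python
from typing import List, Union


def pages_from_output(pdf_contents: Union[str, List[str]]) -> List[str]:
    """Return the extracted text as a list of page strings.

    A single string (split disabled) becomes a one-item list; a list
    (split enabled) is returned as a copy with one item per page.
    """
    if isinstance(pdf_contents, str):
        return [pdf_contents]
    return list(pdf_contents)


# Made-up text standing in for real OCR output.
for number, text in enumerate(pages_from_output(["Invoice 001", "Terms"]), start=1):
    print(f"Page {number}: {len(text)} characters")
```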
Node Functionality
The PDF OCR Reader can:
- Read image-based PDF documents
- Extract text from images
- Process handwriting
- Handle multiple pages
- Support various languages
Available AI Models
- GPT-4o Vision
- GPT-4o Mini Vision
- Claude 3.5 Sonnet
- Claude 3 Haiku
- Gemini 1.5 Pro
- Gemini 1.5 Flash
Common Use Cases
- Scanned Documents
- Image-Based PDFs
- Mixed Content
Important Considerations
- Advanced models (GPT-4o and Claude 3.5) cost 20 credits per run; standard models cost 2 credits per run
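To budget for a batch of documents, the per-run costs above can be multiplied out. This is a rough sketch: the function name `estimate_credits` and the tier labels "advanced" and "standard" are chosen for this example, and it assumes every run is charged the full per-run price.

```python
# Credits per run, as stated in the consideration above.
CREDITS_PER_RUN = {"advanced": 20, "standard": 2}


def estimate_credits(tier: str, runs: int) -> int:
    """Estimate total credits for a number of runs at a given model tier."""
    if tier not in CREDITS_PER_RUN:
        raise ValueError(f"Unknown tier: {tier!r}")
    if runs < 0:
        raise ValueError("runs must not be negative")
    return CREDITS_PER_RUN[tier] * runs


print(estimate_credits("advanced", 50))  # 1000
print(estimate_credits("standard", 50))  # 100
```

For 50 runs, the advanced tier costs ten times as much as the standard tier, so a standard model is worth trying first when the scans are clean.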
In summary, the PDF OCR Reader node helps convert image-based PDFs into searchable text using advanced AI vision models.