This document explains the PDF OCR Reader node, which extracts text from scanned PDFs and image-based documents.

Node Inputs

Required Fields

  • File Name: Upload a PDF or select an existing file from storage

Optional Fields

  • Use Link: Enable to read from URL
  • Specify Pages: Select specific pages to read
  • Split PDF Content by Page: Get page-by-page output in a list format. Each item in the list is a single page from the PDF.
  • Image Model: Choose AI model for OCR
  • Temperature: Controls how much the model's output varies (0-1). Lower values give more consistent, literal transcription.
  • Cache Response: Save results for reuse

Show As Input Options

You can expose these fields as inputs:

  • Temperature

Node Output

  • PDF Contents: Extracted text (single or per page)
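
Because the output can be either one block of text or a per-page list, downstream steps may want to treat both shapes the same way. The following is a minimal Python sketch, assuming the single output arrives as a string and the per-page output as a list of strings. The function name pages_from_output and the sample values are hypothetical and are not part of the node.

```python
from typing import List, Union


def pages_from_output(pdf_contents: Union[str, List[str]]) -> List[str]:
    """Return the node output as a list of page strings.

    Assumption: with "Split PDF Content by Page" off, the output is a
    single string; with it on, the output is a list with one string per page.
    """
    if isinstance(pdf_contents, str):
        # Single-text output: wrap it so callers can always iterate.
        return [pdf_contents]
    # Per-page output: copy into a plain list.
    return list(pdf_contents)


# Hypothetical sample value, not real node output.
per_page = ["Page one text.", "Page two text."]

for number, text in enumerate(pages_from_output(per_page), start=1):
    print(f"Page {number}: {text}")
```

With this wrapper, the same loop works whether or not the split option is enabled.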

Node Functionality

The PDF OCR Reader can:

  • Read image-based PDF documents
  • Extract text from images
  • Process handwriting
  • Handle multiple pages
  • Support various languages

Available AI Models

  • GPT-4o Vision
  • GPT-4o Mini Vision
  • Claude 3.5 Sonnet
  • Claude 3 Haiku
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash

Common Use Cases

  1. Scanned Documents:
     Input: Scanned contracts.pdf
     Output: Searchable text
     Use: Document digitization
  2. Image-Based PDFs:
     Input: Photographed receipts.pdf
     Output: Extracted line items
     Use: Expense processing
  3. Mixed Content:
     Input: Handwritten notes.pdf
     Output: Digital text
     Use: Note digitization

Important Considerations

  • Advanced models (GPT-4o & Claude 3.5) cost 20 credits per run; standard models cost 2 credits per run
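
To see how the per-run pricing adds up, here is a rough Python sketch that multiplies the number of runs by the credit cost of the chosen tier. The function name estimate_credits and the tier labels "advanced" and "standard" are hypothetical; only the 20-credit and 2-credit figures come from the note above.

```python
# Credits per run, taken from the pricing note above.
CREDITS_PER_RUN = {"advanced": 20, "standard": 2}


def estimate_credits(runs: int, tier: str) -> int:
    """Estimate total credits for a number of runs at a given model tier."""
    if runs < 0:
        raise ValueError("runs must be zero or greater")
    if tier not in CREDITS_PER_RUN:
        raise ValueError(f"unknown tier: {tier!r}")
    return runs * CREDITS_PER_RUN[tier]


print(estimate_credits(50, "advanced"))  # 1000
print(estimate_credits(50, "standard"))  # 100
```

For the same 50 runs, the advanced tier uses ten times as many credits as the standard tier.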

In summary, the PDF OCR Reader node helps convert image-based PDFs into searchable text using advanced AI vision models.