This document explains the Analyze Image node, which uses AI vision to extract information and insights from images.

Node Inputs

Required Fields

  • Image File: Upload an image or PDF (JPG, PNG, GIF, WEBP, or PDF)
  • Prompt: Question or instruction for the analysis. A detailed prompt produces more accurate output

Optional Fields

  • Use Link: Enable to use direct image URLs
    • Only supports publicly accessible media links (e.g., https://example.com/image.jpg)
    • Does not support Google Drive, Dropbox, or other file-sharing links
    • URL must point directly to the image file
  • Temperature: Controls analysis creativity (0-1)
    • 0: More focused, consistent
    • 1: More creative, varied
  • Cache Response: Save responses for reuse
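
The Use Link rules above can be pre-checked before a URL reaches the node. The following is a minimal sketch, not part of the product: the function name, the host list, and the extension list are all illustrative assumptions. It also assumes ".jpeg" is treated like ".jpg", and it leaves out PDF because the text does not say whether PDF links are accepted. It cannot confirm that a link is publicly accessible, since that would require a network request.

```python
from urllib.parse import urlparse

# Image extensions based on the formats listed under Image File.
# ".jpeg" is an assumption; the documentation only names JPG.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

# File-sharing hosts named in the text as unsupported.
# Illustrative only; the real list of rejected hosts is not documented.
FILE_SHARING_HOSTS = ("drive.google.com", "docs.google.com", "dropbox.com")


def looks_like_direct_media_link(url: str) -> bool:
    """Heuristic check that a URL points directly at an image file."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    host = parsed.hostname or ""
    # Reject the listed file-sharing hosts and their subdomains.
    if any(host == h or host.endswith("." + h) for h in FILE_SHARING_HOSTS):
        return False
    # A direct link is assumed to end in a known image extension.
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)
```

A link such as https://example.com/image.jpg passes this check, while a Google Drive or Dropbox share link does not. A direct image URL that has no file extension would be rejected here even though the node might accept it, so treat a failed check as a warning rather than a verdict.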

Show As Input

The node allows you to configure certain parameters as dynamic inputs. You can enable these in the “Configure Inputs” section:

  • Use Link: Boolean
    • true/false to use image URL instead of file upload
    • When enabled, allows input of publicly accessible image URLs
    • Remember: Only direct media links are supported
  • Prompt: String
    • The specific question or instruction for analyzing the image
    • Example: “Describe the main objects in this image”
  • image_model_preference: String
    • Name of the AI model to use for image analysis
    • Accepted values: “GPT-4o Vision”, “Claude 3 Haiku”, etc.
  • Cache Response: Boolean
    • true/false to enable/disable response caching
    • Helps reduce API calls for identical inputs
  • Temperature: Number
    • Value between 0 and 1
    • Controls analysis consistency and creativity

When enabled as inputs, these parameters can be dynamically set by previous nodes in your workflow. If not enabled, the values set in the node configuration will be used.
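
The fallback behaviour described above can be sketched as a merge of two dictionaries: values supplied by a previous node override the node configuration, and anything not supplied keeps its configured value. This is an illustrative model only. The function name `resolve_inputs` and the use of plain dictionaries are assumptions, not the platform's actual interface; the input names and types follow the list above.

```python
from typing import Any

# Expected Python types for each dynamic input, following the types listed above.
INPUT_TYPES = {
    "Use Link": bool,
    "Prompt": str,
    "image_model_preference": str,
    "Cache Response": bool,
    "Temperature": (int, float),
}


def resolve_inputs(node_config: dict[str, Any], dynamic: dict[str, Any]) -> dict[str, Any]:
    """Return the effective settings: dynamic inputs override the node configuration."""
    resolved = dict(node_config)  # copy so the configured values stay untouched
    for name, value in dynamic.items():
        if name not in INPUT_TYPES:
            raise KeyError(f"Unknown input: {name}")
        # bool is a subclass of int in Python, so reject it explicitly for Temperature.
        is_bool_temperature = name == "Temperature" and isinstance(value, bool)
        if is_bool_temperature or not isinstance(value, INPUT_TYPES[name]):
            raise TypeError(f"{name} has the wrong type: {type(value).__name__}")
        resolved[name] = value
    temperature = resolved.get("Temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("Temperature must be between 0 and 1")
    return resolved
```

For example, if the node is configured with a Temperature of 0.2 and a previous node supplies only a new Prompt, the resolved settings use the new Prompt and keep the Temperature at 0.2.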

Node Output

  • Analysis: AI’s detailed response about the image

Node Functionality

The Analyze Image node can:

  • Process images with AI vision
  • Extract text from images
  • Generate descriptions
  • Answer queries about content
  • Identify objects and scenes
  • Read image-based PDFs

Available AI Models

  • OpenAI o1
  • GPT-4o vision
  • GPT-4o mini vision
  • Claude 3.7 Sonnet
  • Claude 3.7 Sonnet Thinking
  • Grok 2 Vision
  • Claude 3 Haiku
  • Gemini 2.0 Flash

AI Model Selection Guide

When choosing an AI model for your task, consider these key factors:

Model Type | Ideal Use Cases | Considerations
Standard Models | General content creation, basic Q&A, simple analysis | Lower cost, faster response time, good for most everyday tasks
Advanced Models | Complex analysis, nuanced content, specialized knowledge domains | Better quality but higher cost, good balance of performance and efficiency
Expert & Thinking-Enabled Models | Complex reasoning, step-by-step problem-solving, coding, detailed analysis, math problems, technical content | Highest quality but most expensive, best for complex and long-form tasks, longer response time

Additional selection factors:

  • Task complexity and required accuracy
  • Response time requirements
  • Cost considerations
  • Consistency needs across runs
  • Specialized knowledge requirements

For more detailed information on AI models with advanced reasoning capabilities, you can refer to:

Common Use Cases

  1. Text Extraction:
     Prompt: "Extract all text visible in this image"
     Use: Scanning documents, reading signs
  2. Visual Description:
     Prompt: "Describe this image in detail"
     Use: Accessibility, content cataloging
  3. Object Detection:
     Prompt: "List all objects in this image"
     Use: Inventory, scene analysis

Important Considerations

  1. Expert AI models (e.g., OpenAI o1) cost 30 credits per run, Advanced models (GPT-4o and Claude 3.7) cost 20 credits per run, and Standard models cost 2 credits per run.
  2. You can reduce the credit cost to 1 by providing your own API key on the credentials page.
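
To estimate usage for a batch of images, the credit figures above can be turned into simple arithmetic. This is a rough sketch with hypothetical names (`credits_for_runs`, the tier keys). It assumes that the reduced cost with your own API key is 1 credit per run for every tier, which the text implies but does not state outright, and it ignores any effect of Cache Response on billing because that is not documented here.

```python
# Credits per run by model tier, as listed above.
CREDITS_PER_RUN = {"expert": 30, "advanced": 20, "standard": 2}

# Assumed flat cost per run when using your own API key.
OWN_API_KEY_CREDITS = 1


def credits_for_runs(tier: str, runs: int, own_api_key: bool = False) -> int:
    """Estimate total credits for a number of runs on a given model tier."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    if tier not in CREDITS_PER_RUN:
        raise KeyError(f"Unknown tier: {tier}")
    per_run = OWN_API_KEY_CREDITS if own_api_key else CREDITS_PER_RUN[tier]
    return per_run * runs
```

Under these assumptions, analyzing 10 images with an Expert model comes to 300 credits, the same batch on a Standard model comes to 20 credits, and any tier with your own API key comes to 10 credits.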

In summary, the Analyze Image node helps extract meaning and information from images using powerful AI vision models.