This document explains the Analyze Image node, which uses AI vision to extract information and insights from images.

Node Inputs

Required Fields

  • Image File: Upload an image or PDF file (JPG, PNG, GIF, WEBP, or PDF)
  • Prompt: The question or instruction that guides the analysis. Detailed prompts produce more accurate output

Optional Fields

  • Use Link: Enable to analyze an image from a direct URL instead of an uploaded file
    • Only supports publicly accessible media links (e.g., https://example.com/image.jpg)
    • Does not support Google Drive, Dropbox, or other file-sharing links
    • URL must point directly to the image file
  • Temperature: Controls analysis creativity (0-1)
    • 0: More focused, consistent
    • 1: More creative, varied
  • Cache Response: Save responses and reuse them for identical inputs
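Use Link only accepts direct, publicly accessible media links, so it can help to sanity-check URLs before they reach the node, for example in a code step earlier in the workflow. The following is a minimal Python sketch and is not part of the node itself. The function names and the list of file-sharing hosts are hypothetical. The extension list mirrors the image formats named above. The check is a heuristic only and cannot confirm that a URL is actually public.

```python
from urllib.parse import urlparse

# Image formats named for the Image File field (plus the common .jpeg spelling).
# PDFs are left out because this document only mentions image URLs for Use Link.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")

# Hypothetical, non-exhaustive list of file-sharing hosts that Use Link rejects.
SHARING_HOSTS = ("drive.google.com", "docs.google.com", "dropbox.com")


def _is_sharing_host(host: str) -> bool:
    # Match the host itself or any subdomain of it (e.g. www.dropbox.com).
    return any(host == h or host.endswith("." + h) for h in SHARING_HOSTS)


def looks_like_direct_media_link(url: str) -> bool:
    """Heuristic only: cannot confirm the URL is publicly accessible."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    if not host or _is_sharing_host(host):
        return False
    # Check only the path, so query strings like ?v=2 do not hide the extension.
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)


print(looks_like_direct_media_link("https://example.com/image.jpg"))  # True
print(looks_like_direct_media_link("https://drive.google.com/file/d/abc/view"))  # False
```

A URL that passes this check can still fail in the node if the server requires a login or blocks automated requests.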

Show As Input

The node allows you to configure certain parameters as dynamic inputs. You can enable these in the “Configure Inputs” section:

  • Use Link: Boolean
    • true to use an image URL instead of a file upload, false to use the uploaded file
    • When enabled, allows input of publicly accessible image URLs
    • Only direct media links are supported
  • Prompt: String
    • The specific question or instruction for analyzing the image
    • Example: “Describe the main objects in this image”
  • image_model_preference: String
    • Name of the AI model to use for image analysis
    • Accepted values: “GPT-4o Vision”, “Claude 3 Haiku”, etc.
  • Cache Response: Boolean
    • true/false to enable/disable response caching
    • Helps reduce API calls for identical inputs
  • Temperature: Number
    • Value between 0 and 1
    • Controls analysis consistency and creativity

When enabled as inputs, these parameters can be dynamically set by previous nodes in your workflow. If not enabled, the values set in the node configuration will be used.
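This fallback behavior can be sketched in Python. Values from previous nodes override the configuration for parameters enabled as inputs, and every other parameter keeps its configured value. The sketch is an illustration only. Apart from image_model_preference, the field names and the resolve_inputs helper are hypothetical. The checks simply restate the types and the 0 to 1 temperature range listed above.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class AnalyzeImageInputs:
    # Hypothetical field names for the "Show As Input" parameters;
    # only image_model_preference is named in this document.
    use_link: bool
    prompt: str
    image_model_preference: str
    cache_response: bool
    temperature: float


def resolve_inputs(config: AnalyzeImageInputs, dynamic: dict) -> AnalyzeImageInputs:
    """Overlay values from previous nodes onto the node configuration.

    `dynamic` is assumed to hold only the parameters enabled in Configure Inputs;
    every other parameter keeps its configured value.
    """
    allowed = {f.name for f in fields(AnalyzeImageInputs)}
    unknown = set(dynamic) - allowed
    if unknown:
        raise KeyError(f"unknown input(s): {sorted(unknown)}")
    merged = replace(config, **dynamic)

    # Restate the types listed under Show As Input.
    for name in ("use_link", "cache_response"):
        if not isinstance(getattr(merged, name), bool):
            raise TypeError(f"{name} must be true or false")
    if not isinstance(merged.prompt, str) or not merged.prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    t = merged.temperature
    if isinstance(t, bool) or not isinstance(t, (int, float)) or not 0 <= t <= 1:
        raise ValueError("temperature must be a number between 0 and 1")
    return merged


config = AnalyzeImageInputs(
    use_link=False,
    prompt="Describe this image in detail",
    image_model_preference="Claude 3 Haiku",
    cache_response=True,
    temperature=0.2,
)
# A previous node supplies only the two parameters enabled as inputs.
resolved = resolve_inputs(config, {"prompt": "List all objects in this image", "temperature": 0.7})
print(resolved.prompt, resolved.temperature, resolved.image_model_preference)
```

Here the prompt and temperature come from the previous node, while the model and caching settings fall back to the node configuration.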

Node Output

  • Analysis: The AI model’s detailed response about the image

Node Functionality

The Analyze Image node can:

  • Process images with AI vision
  • Extract text from images
  • Generate descriptions
  • Answer questions about image content
  • Identify objects and scenes
  • Read image-based PDFs

Available AI Models

  • GPT-4o vision
  • GPT-4o mini vision
  • Claude 3.5 Sonnet
  • Claude 3 Haiku
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash

Common Use Cases

  1. Text Extraction:
     Prompt: "Extract all text visible in this image"
     Use: Scanning documents, reading signs
  2. Visual Description:
     Prompt: "Describe this image in detail"
     Use: Accessibility, content cataloging
  3. Object Detection:
     Prompt: "List all objects in this image"
     Use: Inventory, scene analysis

Important Considerations

  1. Advanced models (GPT-4o and Claude 3.5 Sonnet) cost 20 credits per run; standard models cost 2 credits per run
  2. You can reduce the cost to 1 credit per run by providing your own API key on the credentials page
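To budget credits for a batch of runs, multiply the per-run prices above by the number of runs. The following is a minimal Python sketch built on these assumptions:

  • Only GPT-4o vision and Claude 3.5 Sonnet count as advanced models.
  • Every other listed model is billed at the standard rate.
  • The 1-credit rate with your own API key applies to all models.
  • Cache Response has no effect on cost.

estimate_credits is a hypothetical helper, not a platform feature.

```python
# Assumed tiers: only these two listed models count as "advanced".
ADVANCED_MODELS = {"GPT-4o vision", "Claude 3.5 Sonnet"}
ADVANCED_CREDITS = 20
STANDARD_CREDITS = 2
OWN_KEY_CREDITS = 1  # assumed to apply to every model


def credits_per_run(model: str, own_api_key: bool = False) -> int:
    # Your own API key overrides the tier price (assumption).
    if own_api_key:
        return OWN_KEY_CREDITS
    return ADVANCED_CREDITS if model in ADVANCED_MODELS else STANDARD_CREDITS


def estimate_credits(model: str, runs: int, own_api_key: bool = False) -> int:
    """Hypothetical helper: total credits for `runs` runs (ignores Cache Response)."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return credits_per_run(model, own_api_key) * runs


print(estimate_credits("GPT-4o vision", 50))  # 1000
print(estimate_credits("Claude 3 Haiku", 50))  # 100
print(estimate_credits("GPT-4o vision", 50, own_api_key=True))  # 50
```

For example, 50 runs on GPT-4o vision would come to 1,000 credits. The same 50 runs would cost 100 credits on Claude 3 Haiku, or 50 credits with your own API key.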

In summary, the Analyze Image node helps extract meaning and information from images using powerful AI vision models.