The Speech to Text node converts audio files into written text using AI transcription. This page covers the node's inputs, output, limitations, typical use cases, and credit cost in your Gumloop workflows.

Node Inputs

Required Fields

  • Audio File: The audio recording to be transcribed
    • Supported Formats: mp3, mp4, mpeg, mpga, m4a, wav, webm
    • Maximum File Size: 25MB
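
The format and size constraints above can be checked before a file reaches the node. The following is a minimal sketch, not part of Gumloop itself: the function name `check_audio_file` is hypothetical, and it assumes "25MB" means 25 * 1024 * 1024 bytes, which the documentation does not specify.

```python
from pathlib import Path

# Values taken from the Required Fields list.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes; the exact byte count is not documented.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_audio_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report a wrong format and an oversized file in a single pass.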

Optional Fields

  • Use Link: When enabled, allows you to provide a URL to the audio file instead of uploading
    • Link: The URL of the audio file to transcribe (required if “Use Link” is enabled)
  • Model: The AI system used for transcription
    • Current available model: OpenAI Whisper
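
The relationship between "Use Link", "Link", and "Audio File" can be sketched as a small validation routine. This is an illustration only: the class `SpeechToTextInputs`, its field names, and `resolve_audio_source` are hypothetical, and the restriction to http(s) URLs is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SpeechToTextInputs:
    # Hypothetical field names mirroring the node inputs listed above.
    audio_file: Optional[str] = None
    use_link: bool = False
    link: Optional[str] = None
    model: str = "OpenAI Whisper"


def resolve_audio_source(inputs: SpeechToTextInputs) -> str:
    """Return the link when Use Link is enabled, otherwise the uploaded file."""
    if inputs.use_link:
        parsed = urlparse(inputs.link or "")
        # Assumption: only http(s) URLs are treated as valid links.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("Link must be an http(s) URL when Use Link is enabled")
        return inputs.link
    if not inputs.audio_file:
        raise ValueError("Audio File is required when Use Link is disabled")
    return inputs.audio_file
```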

Node Output

  • Transcript: The transcribed text content from the audio file

Node Functionality

The Speech to Text node leverages OpenAI’s Whisper model to:

  • Convert spoken words in audio files into accurate text
  • Maintain proper punctuation and capitalization
  • Recognize and transcribe multiple languages
  • Process various audio formats and quality levels
  • Support batch processing via Loop Mode

OpenAI Whisper Limitations

The current implementation uses OpenAI’s Whisper model, which has the following limitations:

  1. File Size Restriction:
    • Maximum file size: 25MB
  2. Accuracy Factors:
    • Audio quality affects transcription accuracy
    • Background noise can reduce precision
    • Heavy accents may result in lower accuracy
    • Technical or specialized terminology may not be recognized correctly
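
To get a feel for what the 25MB limit means in practice, the longest recording that fits can be estimated from the audio bitrate. This is a rough sketch under stated assumptions: it treats 25MB as 25 * 1024 * 1024 bytes, assumes a constant bitrate, and ignores container overhead, so real files may hit the limit somewhat sooner. The function name is hypothetical.

```python
# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def max_duration_seconds(bitrate_kbps: float) -> float:
    """Longest constant-bitrate recording (in seconds) that fits under the limit."""
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")
    bits_available = MAX_FILE_SIZE_BYTES * 8
    return bits_available / (bitrate_kbps * 1000)


print(max_duration_seconds(128))  # 1638.4 seconds, roughly 27 minutes
print(max_duration_seconds(64))   # 3276.8 seconds, roughly 55 minutes
```

Under these assumptions, re-encoding a long recording at a lower bitrate is one way to stay within the limit, at the cost of audio quality, which in turn can affect transcription accuracy.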

When to Use

The Speech to Text node is ideal when you need to:

Convert Recordings

  • Meeting recordings for documentation
  • Interview audio for research
  • Voice notes for personal productivity
  • Lecture content for educational materials

Create Documentation

  • Generate meeting minutes automatically
  • Create searchable interview transcripts
  • Produce podcast transcripts for accessibility
  • Develop text-based course materials

Process Audio Content

  • Extract information from audio messages
  • Make audio content searchable
  • Prepare data for sentiment analysis
  • Archive spoken information in text format

Common Use Cases

| Use Case | Input | Output | Business Value |
|---|---|---|---|
| Meeting Documentation | Weekly meeting recording (MP3/MP4) | Complete text transcript | Searchable records, easier follow-up, team alignment |
| Content Repurposing | Podcast or video content | Text for articles, blog posts | Content multiplier, improved SEO, wider audience reach |
| Customer Research | Interview recordings | Text for analysis | Easier pattern recognition, quote extraction, theme identification |
| Legal Documentation | Recorded statements | Written documentation | Compliance records, searchable archives, supporting evidence |

Example Flow: Podcast Content Repurposing

This flow automatically:

  1. Reads podcast audio files
  2. Transcribes them to text
  3. Creates a summary and extracts key topics
  4. Outputs to Google Docs and Airtable for content planning
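
The four steps above can be sketched as a simple pipeline. In Gumloop the flow is assembled from nodes rather than written as code, so this is only a conceptual illustration: `EpisodeNotes`, `repurpose_podcasts`, and the four callables passed in are all hypothetical stand-ins for the corresponding nodes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EpisodeNotes:
    source: str
    transcript: str
    summary: str
    topics: list[str]


def repurpose_podcasts(
    audio_files: Iterable[str],
    transcribe: Callable[[str], str],            # stand-in for the Speech to Text node
    summarize: Callable[[str], str],             # stand-in for a summarization step
    extract_topics: Callable[[str], list[str]],  # stand-in for a topic extraction step
    publish: Callable[[EpisodeNotes], None],     # stand-in for the Google Docs / Airtable outputs
) -> list[EpisodeNotes]:
    results = []
    for audio_file in audio_files:               # 1. read each podcast audio file
        transcript = transcribe(audio_file)      # 2. transcribe it to text
        notes = EpisodeNotes(                    # 3. summarize and extract key topics
            source=audio_file,
            transcript=transcript,
            summary=summarize(transcript),
            topics=extract_topics(transcript),
        )
        publish(notes)                           # 4. send the result to the outputs
        results.append(notes)
    return results
```

Passing each step in as a callable keeps the sketch independent of any particular transcription or publishing service.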

Credits Usage

  • The Speech to Text node consumes 20 credits per run
  • In Loop Mode, credits are consumed for each file processed
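
A quick way to budget for a batch is to multiply the per-run cost by the number of files. This sketch assumes that each file processed in Loop Mode counts as one full 20-credit run, which is a reading of the two bullets above rather than a confirmed pricing rule; the function name is hypothetical.

```python
CREDITS_PER_RUN = 20  # from the Credits Usage section


def estimate_credits(file_count: int) -> int:
    """Estimate total credits, assuming one 20-credit run per file."""
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    return CREDITS_PER_RUN * file_count


print(estimate_credits(12))  # 240
```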

In summary, the Speech to Text node transcribes audio into text, making your audio content accessible, searchable, and actionable within your Gumloop workflows.