The Website Scraper node is Gumloop’s unified web scraping solution that handles both basic content extraction and interactive browser automation in a single node. Whether you need to scrape static pages or interact with dynamic websites, this node has you covered.

Quick Overview

Base Cost: 1 credit for basic scraping
Web Agent Mode: 10 credits for interactive actions
Loop Mode: Fully supported for batch processing
Output Format: Plain text content & URLs

Two Modes of Operation

Basic Scraping (1 credit)

What it does:
  • Extracts readable text from web pages
  • Handles static HTML content
  • Processes standard websites efficiently
Best for:
  • Blog posts and articles
  • Public web pages
  • Simple data extraction
  • Cost-sensitive projects
How to use: Simply provide a URL; no additional configuration needed

Web Agent Mode (10 credits)

What it does:
  • Controls a browser agent that can click, hover, scroll, type, and wait
  • Handles content that loads dynamically via JavaScript
  • Captures screenshots and extracts URLs after interactions
Best for:
  • Multi-step navigation and dynamic websites
  • Screenshot capture
  • Pages that require interaction before content appears
How to use: Enable “Take Action on Site?” and configure a sequence of actions

Configuration

Required Input

URL (String)
  • The web address you want to scrape or interact with
  • Example: https://www.gumloop.com/

Optional Parameters

Take Action on Site?

Purpose: Enables interactive browser automation (Web Agent mode)
When to enable:
  • Content requires clicking, scrolling, or typing
  • Need to navigate through multi-step processes
  • Want to take screenshots
  • Must interact with dynamic elements
What it unlocks:
  • Actions parameter (configure browser interactions)
  • Scraped URL output (get the final URL after actions)
Cost impact: Adds +9 credits to the base cost (total: 10 credits)

Actions

Availability: Only appears when “Take Action on Site?” is enabled
Purpose: Define a sequence of actions for the browser agent to perform
Available Actions:
  1. click - Click on an element
  2. hover - Hover over an element
  3. scroll - Scroll the page
  4. write - Type text into a field
  5. wait - Pause for a specified duration
  6. screenshot - Capture visible area
  7. screenshot - full page - Capture entire page
  8. screenshot - full page mobile - Capture full page in mobile view
  9. scrape - Extract content
  10. scrape raw HTML - Get raw HTML
  11. get url - Get current URL
  12. get all urls - Extract all URLs on page
  13. get link by label - Find link by its text
Best Practice: Always end your action sequence with a scraping or URL extraction action to ensure you get usable output.
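
As a rough sketch, a sequence that clicks a button, waits for content to load, and then scrapes could be written out like this (the field names and target text are illustrative, not the node’s exact configuration format):
[
  { "action": "click", "target": "Load more button" },
  { "action": "wait", "duration_seconds": 3 },
  { "action": "scroll", "direction": "down" },
  { "action": "scrape" }
]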

Advanced Scraping

Purpose: Uses residential proxies for better access to restricted sites
When to enable:
  • Website blocks standard scrapers
  • Experiencing rate limiting or IP blocks
  • Need higher reliability on protected sites
Cost impact:
  • Basic mode: +1 credit (total: 2 credits)
  • Web Agent mode: +10 credits (total: 20 credits)
Test standard scraping first before enabling this option—it significantly increases credit costs when combined with Web Agent mode.

Timeout

Purpose: Maximum wait time (in seconds) before considering the request failed
Default: 300 seconds (5 minutes)
When to adjust:
  • Increase for complex multi-step processes or very slow sites
  • Decrease if you want faster failure detection
Example: Set to 60 for a 1-minute timeout
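
Taken together, a hypothetical Web Agent configuration might be summarized like this (the keys are illustrative labels for the node’s settings, not a literal schema):
{
  "url": "https://www.gumloop.com/",
  "take_action_on_site": true,
  "actions": ["click", "wait", "scrape"],
  "advanced_scraping": false,
  "timeout_seconds": 300
}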

Output

Website Content

Always available in both modes. Returns the scraped text content from the webpage, including:
✅ Main text content and article body
✅ Readable elements and structured data
✅ Clean text extraction
❌ Excludes JavaScript code, CSS styling, and hidden elements

Scraped URL

Only available when “Take Action on Site?” is enabled. Returns the final URL after the action sequence completes.

Common Use Cases

  • Basic Content Extraction
  • Interactive Scraping
  • Screenshot Capture
  • Lead Enrichment
Basic Content Extraction

Scenario: Research industry trends
Workflow:
Web Search → Website Scraper → Ask AI → Google Sheets Writer
Configuration:
  • Take Action: Disabled
  • Advanced Scraping: Disabled
Credit cost: ~13-23 credits for 10 results

Using Loop Mode

Process multiple URLs efficiently with Loop Mode for batch scraping or automation.

Provide a list of URLs

Input an array of URLs instead of a single URL
[
  "https://example.com/page1",
  "https://example.com/page2",
  "https://example.com/page3"
]

Configure your scraping mode

Choose between:
  • Basic scraping (1 credit each) for simple content extraction
  • Web Agent mode (10 credits each) for interactive tasks
All URLs will use the same configuration and actions.

Understand concurrency limits

Your plan determines parallel processing capacity:
Plan          Concurrent Operations
Free          2
Solo          5
Team          15
Enterprise    Custom
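For example, a batch of 10 URLs on the Solo plan (5 concurrent operations) would run in roughly two waves of 5, while the same batch on the Team plan could run entirely in parallel.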

Handle results

The node returns arrays of results, maintaining input order:
  • Array of Website Content (one per URL)
  • Array of Scraped URLs (if Take Action enabled)
Best Practice: Wrap in Error Shield to handle individual failures gracefully without stopping the entire batch.
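
For three input URLs, the outputs are parallel arrays in the same order. As an illustrative sketch (content abbreviated; actual values depend on the pages scraped):
{
  "Website Content": [
    "Text from page1...",
    "Text from page2...",
    "Text from page3..."
  ],
  "Scraped URL": [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
  ]
}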

Integration Patterns

Search + Scrape

Web Search → Website Scraper
Find relevant pages, then extract their content

Scrape + Extract

Website Scraper → Extract Data (AI)
Scrape content, then extract structured information with AI

Agent + Analysis

Website Scraper (Agent) → Ask AI
Perform interactions, then analyze the results

Batch + Storage

Sheets Reader → Website Scraper (Loop) → Sheets Writer
Read URLs from a spreadsheet, scrape all of them, save the results

Best Practices

Use Basic Mode (1 credit) when:
  • Scraping static HTML pages
  • Content is immediately available
  • No user interaction required
  • Cost efficiency is important
Use Web Agent Mode (10 credits) when:
  • Content loads dynamically via JavaScript
  • Need to click, type, or navigate
  • Taking screenshots
  • Extracting URLs after interactions
Handle URLs carefully:
  • Always ensure URLs include https:// or http://
  • Use Text Formatter to add the protocol if it is missing
  • Filter out empty or invalid URLs before scraping
  • Test with a single URL before running large batches
Build in error handling:
  • Wrap Website Scraper in Error Shield for production workflows
  • Especially critical in Loop Mode, where one failure can affect all results
  • Plan alternate logic paths for failed scrapes
  • Monitor flow history to identify problematic URLs
When using Web Agent mode:
  • Always end with a scraping or URL action to get usable output
  • Add wait actions after clicks to allow content to load
  • Use hover before click if dropdown menus are involved
  • Test action sequences with single URLs first
Manage credit costs:
  • Use basic scraping whenever possible (1 credit vs. 10)
  • Only enable Advanced Scraping when you encounter blocking issues
  • Test without Advanced Scraping first
  • Monitor credit consumption for large Loop Mode batches
Tune the timeout:
  • The default 5 minutes is suitable for most use cases
  • Increase for complex multi-step Web Agent workflows
  • Decrease if you want faster failure detection
  • Balance reliability against execution speed

Troubleshooting

Problem: The node returns an “Invalid URL” error
Solution: Ensure the URL includes the protocol prefix
Examples:
  • ❌ www.example.com
  • ❌ example.com
  • ✅ https://www.example.com
  • ✅ http://www.example.com
Problem: The scrape times out before completing
Solutions:
  1. Increase the timeout value (try 600 seconds for complex workflows)
  2. Verify the website is accessible from your browser
  3. Check if the site has slow response times
  4. For Web Agent mode, ensure actions aren’t waiting indefinitely
  5. Try enabling Advanced Scraping for better reliability
Problem: The scraped content is missing or incomplete
Solutions:
  1. Enable “Take Action on Site?” if content loads dynamically
  2. Add wait actions to allow JavaScript to execute
  3. Enable Advanced Scraping for better content extraction
  4. Check if the content requires login or authentication
  5. Use screenshot action to visually debug what the agent sees
Problem: Website blocks or restricts access
Solutions:
  1. Enable Advanced Scraping for residential proxy support
  2. Add wait actions between interactions
  3. Verify the website allows automated access (check robots.txt)
  4. Check if the site requires authentication
  5. Consider whether the scraping violates terms of service
Problem: Actions fail to complete or produce expected results
Solutions:
  1. Add wait actions after clicks to allow content to load
  2. Use screenshot action to debug what the agent sees
  3. Verify element selectors are correct
  4. Check if the site structure has changed
  5. Ensure actions are in the correct sequence
  6. End with a scrape or get URL action to capture output
Problem: Some URLs fail and affect the entire batch
Solutions:
  1. Wrap Website Scraper in Error Shield node
  2. Test individual problematic URLs separately
  3. Filter invalid URLs before processing
  4. Check concurrency limits for your plan
  5. Review flow history to identify failure patterns

Ready-Made Templates

Get started quickly with these pre-built scraping workflows:
Note about Web Agent Scraper: This standalone node has been merged into Website Scraper. Enable “Take Action on Site?” to access the same functionality at the same 10-credit cost.