This document explains the Website Scraper node, which extracts content from web pages.

Node Inputs

Required Fields

  • URL: The web page address to scrape. The URL must include https:// or http://.
    Example: https://example.com

Optional Fields

  • Use Advanced Scraping: Enable this option to use advanced scraping techniques that route requests through residential proxies. This helps avoid common blocks and restrictions imposed by websites, ensuring more reliable and thorough data extraction.
  • Timeout: Maximum time (in seconds) to wait for the website to respond before the request fails. This helps handle slow-loading pages and avoid unnecessary delays.
    Example: 30 for a 30-second timeout.
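
Conceptually, the Timeout setting works like the timeout argument of an HTTP client: if no response arrives within the limit, the request is abandoned. A minimal Python sketch (fetch_page and DEFAULT_TIMEOUT are illustrative names, not part of the node's API):

```python
import urllib.request

DEFAULT_TIMEOUT = 30  # seconds, matching the example above


def fetch_page(url, timeout=DEFAULT_TIMEOUT):
    """Fetch a page's HTML, or return None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except OSError:  # covers URLError, connection errors, and timeouts
        return None
```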

Node Output

  • Website Content: Extracted text and data

Node Functionality

The Website Scraper node:

  • Visits web pages
  • Extracts readable content
  • Handles various content types
  • Bypasses common restrictions
  • Supports batch processing
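
Under the hood, "extracts readable content" amounts to fetching a page's HTML and discarding markup, scripts, and styles. A rough standard-library sketch of that extraction step (the node's actual extraction is more sophisticated):

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the readable text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```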

Common Use Cases

  1. Content Collection:
Input: Blog URLs
Output: Article content
Use: Research, analysis
  2. Data Monitoring:
Input: Product pages
Output: Pricing, details
Use: Market research
  3. Information Gathering:
Input: News sites
Output: Latest updates
Use: News aggregation

Loop Mode Pattern

Input: List of URLs
Process: Scrape each site
Output: Content from each URL
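
The pattern above can be sketched as a simple loop, where fetch and extract stand in for the node's internal scraping steps (both names are illustrative):

```python
def scrape_all(urls, fetch, extract):
    """Loop-mode sketch: scrape each URL and collect per-URL results.

    fetch(url) returns the page content or None on failure;
    extract(content) turns raw content into readable output.
    """
    results = {}
    for url in urls:
        content = fetch(url)
        results[url] = extract(content) if content is not None else None
    return results
```

In loop mode the node runs this per-URL step for you; you only supply the list of URLs.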

Relevant Templates

To get started quickly with website scraping, use one of these ready-made templates:

These templates are designed to simplify common scraping tasks and can be customized to fit your specific requirements.

Important Considerations

  • URLs must include https:// or http://
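
A quick way to check this requirement before passing URLs to the node (illustrative Python; the node performs its own validation):

```python
from urllib.parse import urlparse


def is_valid_scrape_url(url):
    """Return True only if the URL has an http:// or https:// scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```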

In summary, the Website Scraper node helps you automatically collect web content, with options for handling both simple and restricted websites.