This document explains the Website Scraper node, which extracts content from web pages.

Node Inputs

Required Fields

  • URL: The address of the web page to scrape. Must include the http:// or https:// prefix.
Optional Fields

  • Use Advanced Scraping: Enable this option to use advanced scraping techniques that route requests through residential proxies. This helps avoid common blocks and restrictions imposed by websites, making data extraction more reliable and thorough.
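For intuition, routing traffic through a proxy works roughly like the sketch below. This is illustrative only: the node manages real residential proxies internally, and the proxy address shown is a placeholder, not a real endpoint.

```python
import urllib.request

# Placeholder proxy endpoint -- the node supplies real residential
# proxies itself; this address is purely for illustration.
PROXY_ADDRESS = "http://proxy.example.com:8080"

# Route all HTTP and HTTPS traffic through the proxy.
proxy_handler = urllib.request.ProxyHandler({
    "http": PROXY_ADDRESS,
    "https": PROXY_ADDRESS,
})
opener = urllib.request.build_opener(proxy_handler)

# Requests made through `opener` would now go via the proxy, e.g.:
# opener.open("https://example.com")
```

Because the target site sees the proxy's address rather than yours, rotating residential proxies make automated traffic harder to distinguish from ordinary visitors.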

Node Output

  • Website Content: Extracted text and data

Node Functionality

The Website Scraper node:

  • Visits web pages
  • Extracts readable content
  • Handles various content types
  • Bypasses common restrictions
  • Supports batch processing

Common Use Cases

  1. Content Collection:
Input: Blog URLs
Output: Article content
Use: Research, analysis
  2. Data Monitoring:
Input: Product pages
Output: Pricing, details
Use: Market research
  3. Information Gathering:
Input: News sites
Output: Latest updates
Use: News aggregation

Loop Mode Pattern

Input: List of URLs
Process: Scrape each site
Output: Content from each URL
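In code terms, loop mode amounts to mapping a scrape step over the URL list and collecting one result per URL. In this sketch, `scrape` is a hypothetical stand-in for the node's actual scraping step:

```python
def scrape(url: str) -> str:
    """Stand-in for the node's scraping step (assumption: the real
    node fetches the page and extracts its readable content)."""
    return f"content of {url}"


def scrape_all(urls: list[str]) -> dict[str, str]:
    # Loop mode: run the scraper once per URL, keyed by URL so
    # each output can be traced back to its input.
    return {url: scrape(url) for url in urls}


results = scrape_all(["https://example.com/a", "https://example.com/b"])
```

Keying the results by URL keeps outputs aligned with inputs even if individual pages fail or return empty content.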

Important Considerations

  • URLs must include https:// or http://
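A quick way to check the scheme requirement before running the node is to parse each URL, as in this small stdlib sketch:

```python
from urllib.parse import urlparse


def has_valid_scheme(url: str) -> bool:
    # The node requires an explicit http:// or https:// prefix.
    return urlparse(url).scheme in ("http", "https")


print(has_valid_scheme("https://example.com"))  # True
print(has_valid_scheme("example.com"))          # False
```

Bare domains like `example.com` parse with an empty scheme, so they fail this check and should be prefixed before being passed to the node.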

In summary, the Website Scraper node helps you automatically collect web content, with options for handling both simple and restricted websites.