Node Inputs
Required Fields
- URL: Web address to scrape. Example: “https://www.gumloop.com/”
Optional Fields
- Use Advanced Scraping: Enable this option to use advanced scraping techniques that utilize residential proxies. This helps to avoid common blocks and restrictions imposed by websites, ensuring more reliable and thorough data extraction.
- Timeout: Maximum time (in seconds) to wait for the website to respond before the request is considered failed. This helps handle slow-loading pages and avoid unnecessary delays. Example: 30 for a 30-second timeout.
Node Output
- Website Content: Extracted text and data
Node Functionality
The Website Scraper node:
- Visits web pages
- Extracts readable content
- Handles various content types
- Bypasses common restrictions
- Supports batch processing
Common Use Cases
- Content Collection
- Data Monitoring
- Information Gathering
Loop Mode Pattern
Relevant Templates
To get started quickly with website scraping, use one of these ready-made templates:
- Scrape YC Directory
- Scrape and Categorize Lead Websites
- LinkedIn Company Page Scraper
- Real Estate Listing Data Extractor
Important Considerations
- URLs must include https:// or http://
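
When URLs come from a spreadsheet or user input, it can help to check the scheme before passing them to the node. The following is a minimal Python sketch of such a pre-check, not part of the Website Scraper node itself; the helper names `has_http_scheme` and `normalize_url` are hypothetical, and defaulting to `https://` when the scheme is missing is an assumption that may not suit every site.

```python
from urllib.parse import urlparse


def has_http_scheme(url: str) -> bool:
    """Return True if the URL starts with http:// or https:// and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def normalize_url(url: str) -> str:
    """Return a URL with an explicit scheme (hypothetical helper).

    Assumption: a bare address such as "www.gumloop.com/" should be
    treated as https. Other schemes (ftp://, file://, ...) are rejected.
    """
    url = url.strip()
    if has_http_scheme(url):
        return url
    if "://" in url:
        raise ValueError(f"Unsupported URL scheme: {url!r}")
    return "https://" + url


if __name__ == "__main__":
    for candidate in ["https://www.gumloop.com/", "www.gumloop.com/"]:
        print(candidate, "->", normalize_url(candidate))
```

Running the check before the scrape step lets malformed entries be fixed or reported early, rather than surfacing as failed requests.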