Node Inputs

  • URL

    • Type: String

    • Description: The web address of the page you want to collect information from, for example "https://www.gumloop.com/". The URL must start with http:// or https://.
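
The scheme requirement above can be checked with a short helper. This is an illustrative sketch (the function name is hypothetical, not part of the node):

```python
# Minimal sketch of the URL check described above: the node expects an
# absolute URL whose scheme is http or https.
from urllib.parse import urlparse


def is_valid_scrape_url(url: str) -> bool:
    """Return True if the URL uses the http or https scheme and has a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For instance, `is_valid_scrape_url("https://www.gumloop.com/")` passes, while a bare `"www.gumloop.com"` fails because it has no scheme.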

Node Output

  • Website Content

    • Type: String

    • Description: After the scraper has visited the website, this is the main text it was able to gather. It’s the readable text you’d find when visiting the page, free from any code or scripts.

Node Functionality

The Website Scraper node automates the process of collecting information from web pages. Once you provide it with a URL, it visits that web address as if it were a person using a web browser, reads through the page, ignores any complex web code and scripts, and brings back just the readable text. The node handles both typical websites and PDF documents: if the web address ends in ".pdf", it processes the PDF and extracts its text so that it can be used just like text from a webpage.
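
The behavior described above can be sketched with the Python standard library alone. This is an illustration of the general technique, not Gumloop's actual implementation (the function names are hypothetical):

```python
# Illustrative sketch only: fetch a page, strip <script>/<style> blocks,
# and return the readable text; ".pdf" URLs take a separate branch.
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style and not pure whitespace.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_readable_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


def scrape(url: str) -> str:
    if url.lower().endswith(".pdf"):
        # A real implementation would extract the PDF's text here
        # (e.g. with a PDF parsing library) before returning it.
        raise NotImplementedError("PDF extraction needs a PDF library")
    html = urlopen(url).read().decode("utf-8", errors="replace")
    return extract_readable_text(html)
```

The key design point mirrored here is the split between fetching and text extraction: the same readable-text step applies whether the content came from a live page or, after conversion, from a PDF.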

When To Use

This node is incredibly useful when you have information on a website that you’d like to monitor, analyze, or archive without having to manually visit the site and select the text yourself. It could be used to track changes in content, gather research material, or even compile data from multiple sites into a single, convenient resource. If you’re looking to save time on repetitive web browsing tasks, or if you need a reliable way to collect web content for your digital projects, the Website Scraper node is an excellent solution.