Node Inputs

  • URL: This input asks for the web address (like https://www.example.com) from which you want to extract information. It’s a basic text field where you should enter the full URL.

  • Actions: This input defines the sequence of actions the web browsing agent should perform. Each action takes three pieces of information:

    • Attribute name: This is the name of the HTML attribute (like ‘class’ or ‘id’) used to locate the webpage element you want the action to be performed on.
    • Attribute value: This is the specific value of that attribute on the element you’re targeting (e.g., the actual class name or id).
    • Action type: Here, you specify what action to take, such as clicking, hovering over an element, scrolling, or extracting (scraping) content from the webpage.
  • Note: Actions like scroll, get url, get all urls, scrape source and scrape do not require attribute names or values.
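
As a rough illustration of the three fields above, an action can be modeled as a small record with a check for the targeting rule in the note. This is a sketch only: the `Action` class, its field names, the `validate` method, and the exact action-type strings are hypothetical and are not taken from the node itself.

```python
from dataclasses import dataclass
from typing import Optional

# Action types that, per the note above, need no attribute name or value.
# The exact strings are illustrative assumptions, not the node's identifiers.
NO_TARGET_ACTIONS = {"scroll", "get url", "get all urls", "scrape source", "scrape"}


@dataclass
class Action:
    """Hypothetical model of one entry in the Actions input."""

    action_type: str
    attribute_name: Optional[str] = None   # e.g. "class" or "id"
    attribute_value: Optional[str] = None  # e.g. the class name or id itself

    def validate(self) -> None:
        """Raise ValueError if a targeted action lacks its attribute pair."""
        if self.action_type in NO_TARGET_ACTIONS:
            return
        if not self.attribute_name or not self.attribute_value:
            raise ValueError(
                f"'{self.action_type}' needs an attribute name and value"
            )


# Example sequence: dismiss a cookie banner, scroll, then scrape the page.
actions = [
    Action("click", "id", "accept-cookies"),
    Action("scroll"),
    Action("scrape"),
]
for action in actions:
    action.validate()
```

In this sketch, a click or hover without an attribute name and value is rejected, while scroll and the scrape-style actions pass with the action type alone.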

Node Output

  • Scraped URL: This output provides the web address of the page that was scraped, showing you where the scraping action was performed.
  • Website Content: This output contains the actual content extracted from the website once all of the actions have been executed. It could be text, source code, or URLs, depending on the actions defined.

Node Functionality

The Web Agent Scraper is designed to interact with web pages just like a human would, but automatically. By specifying a series of actions, such as clicking on links, hovering over elements, or directly extracting content, it can perform complex sequences of interactions to reach and extract the information you need from websites. The actions you define are performed sequentially.
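
The sequential behavior described above can be pictured with a toy simulation. It does not drive a real browser and is not the node's implementation: `FakePage`, `run_actions`, the action-type strings, and the output keys are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (action type, attribute name, attribute value); all values are hypothetical.
ActionSpec = Tuple[str, Optional[str], Optional[str]]


class FakePage:
    """Stand-in for a browser page; it only records what was done to it."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.log: List[str] = []

    def click(self, name: Optional[str], value: Optional[str]) -> None:
        self.log.append(f"click [{name}={value}]")

    def hover(self, name: Optional[str], value: Optional[str]) -> None:
        self.log.append(f"hover [{name}={value}]")

    def scroll(self) -> None:
        self.log.append("scroll")

    def scrape(self) -> str:
        return f"content of {self.url} after {len(self.log)} interaction(s)"


def run_actions(url: str, actions: List[ActionSpec]) -> Dict[str, str]:
    """Apply each action in list order and return the two node-style outputs."""
    page = FakePage(url)
    content = ""
    for action_type, name, value in actions:
        if action_type == "click":
            page.click(name, value)
        elif action_type == "hover":
            page.hover(name, value)
        elif action_type == "scroll":
            page.scroll()
        elif action_type == "scrape":
            content = page.scrape()
        else:
            raise ValueError(f"unknown action type: {action_type}")
    return {"scraped_url": page.url, "website_content": content}


result = run_actions(
    "https://www.example.com",
    [
        ("click", "id", "accept-cookies"),
        ("scroll", None, None),
        ("scrape", None, None),
    ],
)
print(result["website_content"])
# prints: content of https://www.example.com after 2 interaction(s)
```

Because the scrape step runs last, the content it returns reflects the click and the scroll that came before it, which is why the order of the actions matters.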

When To Use

Use this node whenever you need to automate the process of gathering information from websites. This can range from simple tasks, like retrieving the text from a page, to more complex sequences where navigating through multiple pages or interacting with elements is necessary to access the content.

This node is particularly useful in scenarios where the desired information is not available directly through a simple URL visit but requires some form of interaction with the website, like clicking through menus, accepting cookies, or scrolling through infinitely loading content.