Web Agent Scraper
This document explains the Web Agent Scraper node, which automates web interactions and content extraction.
Node Inputs
Required Fields
- URL: Starting web address
  Example: "https://www.gumloop.com/"
- Actions: Sequence of interactions to perform
Optional Fields
- Use Advanced Scraping: Enable for restricted sites (more robust, but slower)
  Note: Costs additional credits
Node Outputs
- Scraped URL: Final page URL after actions
- Website Content: Extracted content/data
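To make the relationship between these inputs and outputs concrete, the sketch below shows a rough browser-automation analogue in Python using Playwright. This is purely illustrative: the node is configured in the Gumloop UI and does not expose this code, and the single action shown (a scroll) stands in for whatever action sequence you configure.

```python
# Illustration only: a hypothetical Playwright analogue of the node's flow.
from playwright.sync_api import sync_playwright

def run_scraper(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)                 # "URL" input: the starting web address
        page.mouse.wheel(0, 5000)      # one example entry from the "Actions" sequence (Scroll)
        page.wait_for_load_state()     # let any resulting loading settle
        result = {
            "scraped_url": page.url,                     # "Scraped URL" output: final page URL
            "website_content": page.inner_text("body"),  # "Website Content" output: extracted text
        }
        browser.close()
        return result

print(run_scraper("https://www.gumloop.com/"))
```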
Available Actions
Navigation Actions
- Click:
  - Requires attributes
  - Clicks specific element
- Hover:
  - Requires attributes
  - Moves cursor over element
- Scroll:
  - No attributes needed
  - Scrolls full page
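As a rough reference for readers who think in code, the navigation actions map onto browser-automation calls like the Playwright sketch below. The page and selectors are placeholders, and this is an analogue rather than the node's actual implementation:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com")   # placeholder page for illustration
    page.click("h1")                   # Click: needs attributes identifying a specific element
    page.hover("a")                    # Hover: needs attributes identifying a specific element
    page.mouse.wheel(0, 10000)         # Scroll: no attributes needed, scrolls the full page
```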
Input Actions
- Write:
  - Requires attributes
  - Types text into field
- Select from Dropdown:
  - Requires attributes
  - Chooses option
- Wait:
  - Pauses execution (ms)
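Again as an illustrative analogue rather than the node's internals, the input actions correspond to calls such as these; the form selectors are hypothetical:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com")             # placeholder page for illustration
    page.fill("input[name='q']", "gumloop")      # Write: types text into a field (needs attributes)
    page.select_option("select#plan", "basic")   # Select from Dropdown: chooses an option (needs attributes)
    page.wait_for_timeout(2000)                  # Wait: pauses execution for the given milliseconds
```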
Collection Actions
- Screenshot:
  - Two types:
    - Screenshot (screen visible in viewport)
    - Screenshot full page
- Scrape:
  - Scrapes the provided URL
- Scrape Source:
  - No attributes needed
  - Gets HTML source
- Get URL:
  - Returns current URL
- Get All URLs:
  - All links available on the provided page
  - Output: Links separated by commas
- Get Link by Label:
  - Requires a label attribute: the link text to search for on the page
  - Finds the first link element that contains this text (case-sensitive)
- Get All Components by Label:
  - Requires an attribute value: the value of the HTML attribute to act on (for example, a class name or id value)
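The collection actions likewise have rough Playwright equivalents, sketched below for orientation only. This is not the node's implementation, and the link label is a placeholder:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com")                   # placeholder page for illustration
    page.screenshot(path="viewport.png")               # Screenshot: visible viewport only
    page.screenshot(path="full.png", full_page=True)   # Screenshot full page
    text = page.inner_text("body")                     # Scrape: a rough stand-in for content extraction
    html = page.content()                              # Scrape Source: raw HTML source
    current_url = page.url                             # Get URL: current page URL
    hrefs = page.eval_on_selector_all(                 # Get All URLs: every link on the page
        "a[href]", "els => els.map(e => e.href)")
    print(",".join(hrefs))                             # the node returns them comma-separated
    # Get Link by Label: first link containing the given text. Note that Playwright's
    # has_text filter is case-insensitive, whereas the node's match is case-sensitive.
    link = page.locator("a", has_text="More information").first
```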
Important Considerations
- Actions run in sequence
- Advanced scraping adds 10 credits to the cost of the node
In summary, the Web Agent Scraper node automates complex web interactions to collect content that requires multiple steps to access.