This tutorial links three Units of Work into one workflow: an input grid drives a search box, CallAfterFillIn reacts to each row by clicking search and finding result links, and CallIfFound opens each result to scrape details into a master grid. A persistent selector runs alongside to dismiss popups whenever they appear.
Here is the whole chained workflow. Each piece is broken down below.
using System;
using System.Collections.Generic;
using GenerallyPositive;
using GenerallyPositive.Browser;
using static GenerallyPositive.Enums;

namespace InputGridChaining
{
    public class Workflows
    {
        static int itemsToGet = 3;   // cap passed to WithAllThatMatch
        static int pagesToGet = 1;   // number of result pages passed to WithPages
        static IGPALGrid<string> retGrid;   // master grid that collects every scraped row
        static Selectors Selectors;
        static IGPALGrid<string> inputGrid = GPAL.Grid.ToGPALObject();

        public static void StartWorkflow()
        {
            retGrid = GPAL.GridForType<string>();
            Selectors = new Selectors();

            // One row per search term.
            inputGrid.AddRow(new List<string>() { "first search term" });
            inputGrid.AddRow(new List<string>() { "second search term" });

            IBrowser browser = GPAL.Browser
                .WithBrowserType(BrowserType.Chrome)
                .WithUseAutomationEngine(AutomationEngine.PuppeteerPort)
                .ToGPALObject();

            // Watch for the popup for the whole session.
            browser
                .WithPersistentSelector(Selectors.popupSelector);

            // First UOW: fill the search box from inputGrid, one row at a time.
            browser
                .Get("https://example.com/")
                .WithSelector(Selectors.searchInput)
                .CallAfterFillIn(OnRowFilledIn)
                .FillInFrom(inputGrid);

            // Save the master grid and shut the browser down.
            browser
                .WithGridToSave(retGrid)
                .WithHeader("Title")
                .WithHeader("Price")
                .WithHeader("Link")
                .SaveToTabbedText(@"c:\temp\results.txt")
                .Close(true);
        }

        // Second UOW: runs after each row of inputGrid has been typed in.
        public static CallIfStatus OnRowFilledIn(IBrowser browser, IGPALGrid<string> tokens, int tokenIdx)
        {
            browser
                .WithSelector(Selectors.searchButton)
                .LeftClick()
                .WaitFor(1500);

            browser
                .WithSelector(Selectors.resultLinks)
                .WaitFor(5000)
                .CallIfFound(OnResultsFound)
                .WithAllThatMatch(itemsToGet)
                .WithNextPageButton(Selectors.nextPageButton)
                .WithPages(pagesToGet)
                .StartWorkflow();

            return CallIfStatus.Handled;
        }

        // Third link in the chain: opens each matched result and scrapes its details.
        public static CallIfStatus OnResultsFound(IBrowser browser, List<GPALElement> foundElements, List<GPALElement> matchedElements, Selector selector, bool matchedAll)
        {
            foreach (GPALElement element in matchedElements)
            {
                string href = element.GetAttribute("href");

                // Open the result in a new tab and move GPAL's focus to it.
                element.MiddleClick();
                browser.NextTab();

                browser
                    .WithSelector(Selectors.titleColumn)
                    .WithSelector(Selectors.priceColumn)
                    .WithSelector(() => { return href; })
                    .GetGrid(out IGPALGrid<string> itemGrid);

                // Return to the results page before the next element.
                browser.CloseTab();

                if (0 < itemGrid?.Count())
                    retGrid.AddRow(itemGrid[0]);
            }

            return CallIfStatus.Handled;
        }

        // Handler for the persistent popup selector: clicks whatever matched.
        public static CallIfStatus DismissPopup(IBrowser browser, List<GPALElement> foundElements, List<GPALElement> matchedElements, Selector selector, bool matchedAll)
        {
            foreach (GPALElement element in matchedElements)
                element.Click();

            return CallIfStatus.Handled;
        }
    }
}
The whole program is really three small workflows that hand off to each other. The first UOW fills the search box from inputGrid one row at a time. CallAfterFillIn fires after each row is typed in, and inside its handler a second UOW finds the result links. CallIfFound fires when those links are found, and its handler opens each result and scrapes its details. Nothing in StartWorkflow has to wait or poll for any of this: each handler runs when its trigger condition is met.
browser
    .Get("https://example.com/")
    .WithSelector(Selectors.searchInput)
    .CallAfterFillIn(OnRowFilledIn)
    .FillInFrom(inputGrid);
FillInFrom walks inputGrid row by row, typing each row's tokens into the matched selector. CallAfterFillIn is registered before FillInFrom is called, so it fires once per row as soon as that row's tokens have been entered.
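The hand-off between a per-row callback and a per-result callback can be modelled in plain C# without any browser. The sketch below is only a toy model: ChainModel, its FillInFrom method, and the findResults delegate are hypothetical stand-ins invented for illustration and say nothing about how GPAL is implemented. It shows the ordering described above: the per-row handler runs once for each row, and the nested work for that row completes before the next row starts.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: these are hypothetical stand-ins, not GPAL types.
public static class ChainModel
{
    // Mimics a fill-in step: visits each row, then invokes the per-row callback.
    public static void FillInFrom(IEnumerable<string> rows, Action<string, int> afterFillIn)
    {
        int index = 0;
        foreach (string row in rows)
        {
            // A real workflow would type `row` into the search box here.
            afterFillIn(row, index);
            index++;
        }
    }

    // Runs the chain and returns the order in which the handlers fired.
    public static List<string> Run(IEnumerable<string> rows, Func<string, IEnumerable<string>> findResults)
    {
        var log = new List<string>();
        FillInFrom(rows, (row, idx) =>
        {
            log.Add($"row {idx}: {row}");
            // Nested step: handle every result found for this row.
            foreach (string result in findResults(row))
                log.Add($"  result: {result}");
        });
        return log;
    }
}
```

Calling ChainModel.Run with two rows and a findResults delegate that returns two results per row yields six log entries, with both results for the first row listed before the second row begins.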
OnRowFilledIn runs once per row of inputGrid. It clicks the search button, waits for the page to settle, and then opens a new UOW on the result links. WithAllThatMatch caps how many matches are processed, WithNextPageButton and WithPages let this new UOW page through multiple result pages, and StartWorkflow kicks the whole inner workflow off. CallIfFound is registered on this UOW before StartWorkflow runs, so its handler receives the matched links from every page that is processed.
public static CallIfStatus OnRowFilledIn(IBrowser browser, IGPALGrid<string> tokens, int tokenIdx)
{
    browser
        .WithSelector(Selectors.searchButton)
        .LeftClick()
        .WaitFor(1500);

    browser
        .WithSelector(Selectors.resultLinks)
        .WaitFor(5000)
        .CallIfFound(OnResultsFound)
        .WithAllThatMatch(itemsToGet)
        .WithNextPageButton(Selectors.nextPageButton)
        .WithPages(pagesToGet)
        .StartWorkflow();

    return CallIfStatus.Handled;
}
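To get a feel for how a match cap and a page limit bound the amount of work, here is a small plain-C# model. It is a sketch under a stated assumption, not a description of GPAL: PagingModel and Collect are hypothetical names, and the model assumes the cap applies to each page separately, which this tutorial does not confirm. If the cap is actually a total across pages, the numbers differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: a hypothetical stand-in, not a GPAL type.
public static class PagingModel
{
    // pages: each inner list holds the links found on one result page.
    // ASSUMPTION: itemsPerPage caps each page separately.
    public static List<string> Collect(
        IReadOnlyList<IReadOnlyList<string>> pages, int itemsPerPage, int pagesToGet)
    {
        var picked = new List<string>();
        // Visit at most pagesToGet pages, keeping at most itemsPerPage links from each.
        foreach (IReadOnlyList<string> page in pages.Take(pagesToGet))
            picked.AddRange(page.Take(itemsPerPage));
        return picked;
    }
}
```

Under that assumption, the values used in this tutorial (itemsToGet = 3, pagesToGet = 1) would open at most three detail pages per search term.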
OnResultsFound is the third link in the chain. It receives the list of matched result elements and loops over them. For each one it grabs the href attribute, middle-clicks the element to open the result in a new tab, then calls NextTab so GPAL's focus follows the browser to that new tab. From there a fresh UOW pulls the title and price out of the new page, plus the saved href, and GetGrid packages all three into a single-row grid. CloseTab returns focus to the results page before the loop moves to the next element.
public static CallIfStatus OnResultsFound(IBrowser browser, List<GPALElement> foundElements, List<GPALElement> matchedElements, Selector selector, bool matchedAll)
{
    foreach (GPALElement element in matchedElements)
    {
        string href = element.GetAttribute("href");
        element.MiddleClick();
        browser.NextTab();

        browser
            .WithSelector(Selectors.titleColumn)
            .WithSelector(Selectors.priceColumn)
            .WithSelector(() => { return href; })
            .GetGrid(out IGPALGrid<string> itemGrid);

        browser.CloseTab();

        if (0 < itemGrid?.Count())
            retGrid.AddRow(itemGrid[0]);
    }

    return CallIfStatus.Handled;
}
A middle-click or a Ctrl-click opens a new tab in the real browser, but GPAL keeps working against whatever tab it last knew about. Call NextTab so GPAL's selectors target the new tab, and CloseTab afterward so the next iteration lands back on the results page.
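The difference between "the browser has a new tab" and "the automation code is looking at that tab" is easy to see in a tiny model. The class below is a hypothetical stand-in written for illustration: TabModel and its members are not GPAL types, and the real tab bookkeeping may work differently. It only demonstrates why the NextTab and CloseTab calls bracket the scraping step.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a hypothetical stand-in, not a GPAL type.
public class TabModel
{
    private readonly List<string> tabs = new List<string>();
    private int focus; // index of the tab the automation code is targeting

    public TabModel(string firstUrl) { tabs.Add(firstUrl); }

    public string Current => tabs[focus];

    // The browser opens a tab next to the current one; the focus does not move.
    public void OpenInNewTab(string url) => tabs.Insert(focus + 1, url);

    // Move the automation focus to the next tab, if there is one.
    public void NextTab()
    {
        if (focus < tabs.Count - 1) focus++;
    }

    // Close the focused tab and fall back to the previous one.
    public void CloseTab()
    {
        if (tabs.Count == 1) return; // keep the last tab open
        tabs.RemoveAt(focus);
        if (focus > 0) focus--;
    }
}
```

In this model, Current still reports the results page right after OpenInNewTab. It reports the detail page only after NextTab, and it returns to the results page after CloseTab.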
Some sites show a one-time popup the first time you interact with the page, and it can block clicks on the elements your main workflow needs. WithPersistentSelector registers a selector that GPAL checks throughout the whole session, independent of the main chain of UOWs. When it matches, its own CallIfFound handler runs (here DismissPopup just clicks whatever matched), and the main workflow never has to know the popup existed.
browser
    .WithPersistentSelector(Selectors.popupSelector);

public static CallIfStatus DismissPopup(IBrowser browser, List<GPALElement> foundElements, List<GPALElement> matchedElements, Selector selector, bool matchedAll)
{
    foreach (GPALElement element in matchedElements)
        element.Click();

    return CallIfStatus.Handled;
}
The persistent selector's CallIfFound handler is wired up where the selector is defined, for example: GPAL.Selector.WithCSS("...").CallIfFound(Workflows.DismissPopup).WithSearchForSelector(false). WithSearchForSelector(false) tells GPAL not to spend time looking for it unless it is already on the page, which avoids false positives on pages that never show the popup.
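One way to picture a persistent selector is as a watcher that is consulted around every step of the main workflow. The sketch below models that idea in plain C#. It is an illustration only: PersistentWatchModel, AddPersistent, and RunStep are hypothetical names, and the assumption that the check happens immediately before each step is mine, since this tutorial does not say when GPAL evaluates persistent selectors.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a hypothetical stand-in, not a GPAL type.
public class PersistentWatchModel
{
    private readonly List<(Func<bool> isPresent, Action handler)> watchers =
        new List<(Func<bool> isPresent, Action handler)>();

    // Register a condition to watch for and the handler to run when it holds.
    public void AddPersistent(Func<bool> isPresent, Action handler) =>
        watchers.Add((isPresent, handler));

    // ASSUMPTION: every watcher is checked just before each workflow step.
    public void RunStep(Action step)
    {
        foreach (var (isPresent, handler) in watchers)
        {
            if (isPresent()) handler();
        }
        step();
    }
}
```

If a watcher is registered for a popup that is present at the start, the first RunStep call dismisses it before the step itself runs, and later steps proceed without the handler firing again. The step code never mentions the popup.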
Every row appended to retGrid by OnResultsFound survives across all rows of inputGrid and all pages of results, because retGrid is created once at the start of StartWorkflow. After FillInFrom finishes processing every row, WithGridToSave attaches retGrid to the browser, WithHeader adds one column header per call, and SaveToTabbedText writes the whole grid out as a tab-delimited file. Close(true) then shuts down the browser and its webdriver.
browser
    .WithGridToSave(retGrid)
    .WithHeader("Title")
    .WithHeader("Price")
    .WithHeader("Link")
    .SaveToTabbedText(@"c:\temp\results.txt")
    .Close(true);
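For readers who want to check or post-process the saved file, the following sketch shows what a tab-delimited layout with a header row looks like when built with standard .NET calls. It is a minimal illustration, assuming one header line followed by one line per grid row. TabbedText.Format is a hypothetical helper, and the exact output of SaveToTabbedText (quoting, line endings, encoding) is not documented here and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of GPAL.
public static class TabbedText
{
    // Builds one header line plus one line per row, with fields joined by tabs.
    public static string Format(IEnumerable<string> headers, IEnumerable<IEnumerable<string>> rows)
    {
        var lines = new List<string> { string.Join("\t", headers) };
        lines.AddRange(rows.Select(row => string.Join("\t", row)));
        return string.Join(Environment.NewLine, lines);
    }
}
```

The returned string can be written to disk with System.IO.File.WriteAllText. Fields that themselves contain tabs or line breaks would need escaping, which this sketch does not attempt.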