GetLLMDigest and SaveLLMDigest turn the current page into cleaned markdown for LLM input. Cleanup rules -- what counts as junk, where the main content lives, which post-conversion passes run -- come from LLMDigestRules.yaml so they can be tuned per site without recompiling.
GetLLMDigest strips scripts, navigation, ads, and other boilerplate from the current page, finds the main content, and converts it to markdown, returning an LLMDigestResult with Markdown, Title, Url, OriginalLength, CleanedLength, Success, Message, and RuleSetUsed. SaveLLMDigest does the same and writes the result straight to a GPALFile, prefixed with a short header.
LLMDigestResult digest;
GPAL.Browser
    .GoTo("https://example.com/article")
    .GetLLMDigest(out digest);
Console.WriteLine($"{digest.Title} ({digest.CleanedLength} of {digest.OriginalLength} bytes)");
Console.WriteLine(digest.Markdown);

// Or write the digest straight to a file
GPAL.Browser
    .GoTo("https://example.com/article")
    .SaveLLMDigest(GPAL.FileFor("article-digest.md"));
Each digest is built using one named DigestRuleSet from LLMDigestRules.yaml. If you pass a ruleSetName to GetLLMDigest or SaveLLMDigest, that rule set is used. Otherwise GPAL checks the current page URL against each rule set's domainPatterns and uses the first match. Either way, GPAL publishes an INFO event naming the rule set it used, and the same name is returned on digest.RuleSetUsed so a workflow can confirm or branch on it.
// See which rule set GPAL picked, and why
GPAL.InformationHandler += (sender, e) => Console.WriteLine(e.Message);
LLMDigestResult digest;
GPAL.Browser
    .GoTo("https://www.indeed.com/jobs?q=developer")
    .GetLLMDigest(out digest);
Console.WriteLine(digest.RuleSetUsed); // "indeed" - matched by domain

// Force a rule set regardless of the page's URL
GPAL.Browser
    .GoTo("https://careers.example.com")
    .SaveLLMDigest(GPAL.FileFor("careers-digest.md"), "indeed");
If no ruleSetName is given and the URL matches no domainPatterns, GPAL uses the "generic" rule set. If the rule set in use leaves a field empty, GPAL takes that field's value from "generic" instead, so a rule set only needs to define what makes its site different from generic.
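The selection and fallback order described above can be sketched in a few lines. This is not GPAL's implementation, only an illustration under stated assumptions: RuleSetSketch, RemoveXPaths, and CleanupSteps are hypothetical names, domainPatterns are treated as case-insensitive substrings of the URL host (the real matching may differ), and an unknown ruleSetName falls back to "generic", which the text above does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one rule set. Only the rule set name and
// domainPatterns come from the documentation; the other fields are invented.
public sealed class RuleSetSketch
{
    public string Name { get; set; } = "";
    public List<string> DomainPatterns { get; set; } = new List<string>();
    public List<string> RemoveXPaths { get; set; } = new List<string>();  // assumed field
    public List<string> CleanupSteps { get; set; } = new List<string>();  // assumed field
}

public static class RuleSetResolver
{
    // An explicit name wins; otherwise the first rule set with a matching
    // domain pattern is used; otherwise "generic".
    public static RuleSetSketch Pick(IReadOnlyList<RuleSetSketch> all, string url, string ruleSetName = null)
    {
        RuleSetSketch generic = all.First(r => r.Name == "generic");

        if (!string.IsNullOrEmpty(ruleSetName))
        {
            // Assumption: an unknown name falls back to "generic".
            return all.FirstOrDefault(r => r.Name == ruleSetName) ?? generic;
        }

        // Assumption: a pattern matches when it occurs anywhere in the host.
        string host = new Uri(url).Host;
        return all.FirstOrDefault(r => r.DomainPatterns.Any(
                   p => host.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0))
               ?? generic;
    }

    // Field-level fallback: any list the chosen rule set leaves empty
    // is taken from "generic".
    public static RuleSetSketch FillFromGeneric(RuleSetSketch chosen, RuleSetSketch generic)
    {
        return new RuleSetSketch
        {
            Name = chosen.Name,
            DomainPatterns = chosen.DomainPatterns,
            RemoveXPaths = chosen.RemoveXPaths.Count > 0 ? chosen.RemoveXPaths : generic.RemoveXPaths,
            CleanupSteps = chosen.CleanupSteps.Count > 0 ? chosen.CleanupSteps : generic.CleanupSteps,
        };
    }
}
```

With a list holding an "indeed" rule set (pattern "indeed.com") and "generic", Pick resolves the Indeed URL from the earlier example to "indeed" and any unmatched URL to "generic". FillFromGeneric then supplies whichever lists "indeed" left empty.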
If LLMDigestRules.yaml does not exist, call DigestRulesConfig.Save() to write a starter file containing the built-in rule sets. The dedicated Tuning LLMDigestRules.yaml page covers each field in full.
Cleanup step names and XPath expressions are matched literally. A step whose name is misspelled is silently skipped, and an invalid XPath expression is skipped for that page rather than stopping the digest.
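Because these failures are silent, it can help to check a rule set before relying on it. The sketch below is a hypothetical helper, not part of GPAL: the step names in KnownSteps are placeholders for whatever names GPAL actually recognizes, and the XPath check uses .NET's XPathExpression.Compile, which catches syntax errors but may accept expressions that the browser's XPath engine treats differently.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.XPath;

public static class DigestRuleLint
{
    // Placeholder step names; substitute the names GPAL actually supports.
    public static readonly HashSet<string> KnownSteps =
        new HashSet<string>(StringComparer.Ordinal) { "collapseBlankLines", "trimLines" };

    // Returns one message per step name or XPath expression that would
    // probably be skipped without any error at digest time.
    public static List<string> Check(IEnumerable<string> stepNames, IEnumerable<string> xpaths)
    {
        var problems = new List<string>();

        foreach (string step in stepNames)
        {
            // Ordinal comparison: names must match exactly, including case.
            if (!KnownSteps.Contains(step))
                problems.Add($"Unknown cleanup step: '{step}'");
        }

        foreach (string xpath in xpaths)
        {
            try
            {
                XPathExpression.Compile(xpath); // throws on a syntax error
            }
            catch (Exception ex) when (ex is XPathException || ex is ArgumentException)
            {
                problems.Add($"Invalid XPath '{xpath}': {ex.Message}");
            }
        }

        return problems;
    }
}
```

Running Check over the step names and XPath expressions of a rule set after editing the YAML file turns a silently skipped step into a visible message.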
GPAL
    .CallIfNotFound(GenericCallIfNotFound)
    .WithPublishToConsole();
//System.Drawing.Rectangle windowSize = new System.Drawing.Rectangle(10, 10, 1500, 1024);
// NOTE: assign browser before executing any steps.
// GenericCallIfNotFound might throw an exception, and a fluent chain only assigns its result
// after every step has succeeded, so if steps ran in the same chain, browser would still be
// unset when BankScraper calls scraper.Close().
browser = GPAL.Browser
    .WithBrowserType(Enums.BrowserType.Chrome)
    .WithProfileDataDirectory(ChromeProfileLocation)
    .WithUseAutomationEngine(AutomationEngine.Selenium)
    .WithWindowSize(new System.Drawing.Rectangle(0, 0, 1920, 1080))
    .ToGPALObject();