Advanced Topics

LLM-Ready Page Digests

GetLLMDigest and SaveLLMDigest turn the current page into cleaned markdown for LLM input. Cleanup rules -- what counts as junk, where the main content lives, which post-conversion passes run -- come from LLMDigestRules.yaml so they can be tuned per site without recompiling.

GetLLMDigest and SaveLLMDigest

GetLLMDigest strips scripts, navigation, ads, and other boilerplate from the current page, finds the main content, and converts it to markdown. The result comes back through an out parameter as an LLMDigestResult with Markdown, Title, Url, OriginalLength, CleanedLength, Success, Message, and RuleSetUsed. SaveLLMDigest runs the same cleanup and writes the result straight to a GPALFile, prefixed with a short header.

LLMDigestResult digest;

GPAL.Browser
    .GoTo("https://example.com/article")
    .GetLLMDigest(out digest);

Console.WriteLine($"{digest.Title} ({digest.CleanedLength} of {digest.OriginalLength} bytes)");
Console.WriteLine(digest.Markdown);

// Or write the digest straight to a file
GPAL.Browser
    .GoTo("https://example.com/article")
    .SaveLLMDigest(GPAL.FileFor("article-digest.md"));
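
The sample above prints digest.Markdown without looking at Success or Message. A minimal sketch of a guard that checks them first is shown below. DigestStandIn and DigestGuard are hypothetical names used only for illustration, and the property types (bool and string) are assumptions, since the text above lists the fields but not their types.

```csharp
using System;

// Hypothetical stand-in for three of the LLMDigestResult fields named above.
// ASSUMPTION: Success is a bool, Message and Markdown are strings.
public sealed record DigestStandIn(bool Success, string Message, string Markdown);

public static class DigestGuard
{
    // Returns the markdown when the digest is usable. Otherwise throws,
    // carrying the digest's own Message so the reason is not lost.
    public static string RequireMarkdown(DigestStandIn digest)
    {
        if (!digest.Success)
            throw new InvalidOperationException($"Digest failed: {digest.Message}");

        if (string.IsNullOrWhiteSpace(digest.Markdown))
            throw new InvalidOperationException("Digest succeeded but produced no markdown.");

        return digest.Markdown;
    }
}
```

In a real workflow the same two checks would be applied to the LLMDigestResult itself before the markdown is passed to an LLM.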

Choosing a Rule Set

Each digest is built using one named DigestRuleSet from LLMDigestRules.yaml. If you pass a ruleSetName to GetLLMDigest or SaveLLMDigest, that rule set is used. Otherwise GPAL checks the current page URL against each rule set's domainPatterns and uses the first match. Either way, GPAL publishes an INFO event naming the rule set it used, and the same name is returned on digest.RuleSetUsed so a workflow can confirm or branch on it.

// See which rule set GPAL picked, and why
GPAL.InformationHandler += (sender, e) => Console.WriteLine(e.Message);

LLMDigestResult digest;

GPAL.Browser
    .GoTo("https://www.indeed.com/jobs?q=developer")
    .GetLLMDigest(out digest);

Console.WriteLine(digest.RuleSetUsed); // "indeed" - matched by domain

// Force a rule set regardless of the page's URL
GPAL.Browser
    .GoTo("https://careers.example.com")
    .SaveLLMDigest(GPAL.FileFor("careers-digest.md"), "indeed");
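
The selection order described in this section (explicit name first, then the first domainPatterns match, then "generic") can be sketched as a small standalone function. This is an illustration of the documented behavior, not GPAL's implementation. RuleSetSketch and RuleSetPicker are hypothetical names, and the matching rule is an assumption: the page does not say whether domainPatterns are substrings, globs, or regular expressions, so the sketch treats each pattern as a case-insensitive substring of the host name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a rule set: only the two fields this sketch needs.
public sealed record RuleSetSketch(string Name, IReadOnlyList<string> DomainPatterns);

public static class RuleSetPicker
{
    // Explicit name wins, then the first rule set with a matching pattern,
    // then the "generic" rule set.
    public static string Pick(
        IReadOnlyList<RuleSetSketch> ruleSets, Uri pageUrl, string ruleSetName = null)
    {
        if (!string.IsNullOrEmpty(ruleSetName))
            return ruleSetName;

        // ASSUMPTION: a pattern matches when it occurs in the host, ignoring case.
        var match = ruleSets.FirstOrDefault(rs =>
            rs.DomainPatterns.Any(p =>
                pageUrl.Host.Contains(p, StringComparison.OrdinalIgnoreCase)));

        return match?.Name ?? "generic";
    }
}
```

Because the first match wins, the order of rule sets in the list matters in this sketch: a broad pattern placed early would shadow a narrower one that follows it.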

TIP

If no ruleSetName is given and the URL matches no domainPatterns, GPAL uses the "generic" rule set. If the selected rule set leaves a field empty, GPAL uses the "generic" rule set's value for that field instead. A rule set therefore only needs to define what makes its site different from generic.
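
The per-field fallback in this tip can be pictured as a merge of a site rule set over generic. The sketch below is an assumption-heavy illustration rather than GPAL's code: it models each rule set as a dictionary from field name to a list of strings, treats a missing or zero-length list as "empty", and uses RuleFieldFallback as a hypothetical helper name. The real field names and types are described on the tuning page, not here.

```csharp
using System.Collections.Generic;

public static class RuleFieldFallback
{
    // Builds the effective rule set: a site value is used when it is
    // present and non-empty, otherwise the generic value is used.
    public static Dictionary<string, List<string>> Merge(
        IReadOnlyDictionary<string, List<string>> generic,
        IReadOnlyDictionary<string, List<string>> site)
    {
        var merged = new Dictionary<string, List<string>>();

        foreach (var (field, genericValue) in generic)
        {
            // ASSUMPTION: "empty" means the field is absent or has no entries.
            if (site.TryGetValue(field, out var siteValue) && siteValue is { Count: > 0 })
                merged[field] = siteValue;
            else
                merged[field] = genericValue;
        }

        // Fields that only the site rule set defines are kept unchanged.
        foreach (var (field, siteValue) in site)
        {
            if (!merged.ContainsKey(field))
                merged[field] = siteValue;
        }

        return merged;
    }
}
```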

Editing LLMDigestRules.yaml

Rule sets live in LLMDigestRules.yaml so layouts can be tuned without recompiling. If the file does not exist, call DigestRulesConfig.Save() to write a starter file containing the built-in rule sets. The dedicated Tuning LLMDigestRules.yaml page covers each field in full.

WARNING

Cleanup step names and XPath expressions are matched literally. A step whose name is misspelled is skipped silently, and an invalid XPath expression is skipped for that page rather than stopping the digest, so neither mistake produces an error.
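
Because an invalid XPath expression is skipped rather than reported, it can help to check expressions before putting them in the file. The sketch below uses the standard .NET XPathExpression.Compile call to find expressions that do not parse. XPathPreflight is a hypothetical helper, and the check rests on an assumption: GPAL's own XPath engine is not named on this page and may accept or reject slightly different syntax than System.Xml.XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.XPath;

public static class XPathPreflight
{
    // Returns each expression that fails to compile, with the parser's message.
    // ASSUMPTION: System.Xml.XPath is a reasonable proxy for GPAL's XPath parsing.
    public static List<(string Expression, string Error)> FindInvalid(
        IEnumerable<string> expressions)
    {
        var problems = new List<(string Expression, string Error)>();

        foreach (var expression in expressions)
        {
            try
            {
                XPathExpression.Compile(expression);
            }
            catch (XPathException ex)
            {
                problems.Add((expression, ex.Message));
            }
            catch (ArgumentException ex)
            {
                // Thrown for null or otherwise unusable input.
                problems.Add((expression, ex.Message));
            }
        }

        return problems;
    }
}
```

This only catches syntax errors. An expression that compiles but selects nothing on the target page still has to be checked against the page itself.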