Browser

Data Retrieval

Methods for extracting data from the current page as a grid, raw HTML, sitemap URLs, Next.js hydration data, or an LLM-friendly page digest.

GetGrid retrieves the data matched by the current selector into an IGPALGrid<string> for further processing. GetPageSource extracts the full HTML of the current page. GetSiteMap retrieves the sitemap XML, and GetSiteMapUrls returns the individual URLs as a list. GetHydratedData extracts the Next.js hydration data embedded in the page. GetLLMDigest produces a cleaned, structured summary of the page content, optimized for use as LLM input. The Save variants write these outputs directly to files.
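For readers unfamiliar with the term, the sketch below illustrates what Next.js hydration data usually looks like. Pages Router applications embed it as JSON inside a script tag with the id __NEXT_DATA__. This is not how GetHydratedData is implemented (the documentation does not say), and NextDataSketch and TryGetNextData are hypothetical names, not part of GPAL. It only shows, under those assumptions, how the same payload could be located by hand in HTML returned by GetPageSource.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of GPAL: finds the Next.js Pages Router
// hydration payload in a raw HTML string.
public static class NextDataSketch
{
    // Matches <script id="__NEXT_DATA__" ...>{json}</script>, allowing other attributes.
    private static readonly Regex NextDataScript = new Regex(
        @"<script[^>]*\bid=""__NEXT_DATA__""[^>]*>(?<json>.*?)</script>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns true and the raw JSON text when the tag exists and holds valid JSON.
    public static bool TryGetNextData(string html, out string json)
    {
        json = string.Empty;
        var match = NextDataScript.Match(html);
        if (!match.Success)
        {
            return false;
        }

        var candidate = match.Groups["json"].Value.Trim();
        try
        {
            // Parse only to validate; the document is disposed immediately.
            using (JsonDocument.Parse(candidate)) { }
        }
        catch (JsonException)
        {
            return false;
        }

        json = candidate;
        return true;
    }
}
```

Pages built with the Next.js App Router stream their data differently, so a lookup like this one would find nothing there. Which cases GetHydratedData itself covers is not stated here.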

Examples

GPAL Fluent: High-level fluent C# API

GetGrid populates the out parameter with a grid where each row corresponds to one matched element. Columns in the grid correspond to the element attributes GPAL extracts based on the selector configuration.

// Extract a table to a grid
var grid = GPAL.Grid.ToGPALObject();
GPAL.Browser
    .GoTo("https://example.com/data")
    .WithSelector(".data-row")
    .WithAllThatMatch(1000)
    .GetGrid(out grid);

// Get page HTML source
string html;
GPAL.Browser
    .GoTo("https://example.com")
    .GetPageSource(out html);

// Get all URLs from a sitemap
List<string> urls;
GPAL.Browser
    .GoTo("https://example.com")
    .GetSiteMapUrls(out urls);
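Once GetSiteMapUrls has filled the list, the result is an ordinary List<string> that can be processed with standard .NET code. The sketch below shows one possible post-processing step: keeping only unique http(s) URLs under a given path. SiteMapUrlFilter and UnderPath are hypothetical names introduced for illustration and are not part of GPAL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of GPAL: narrows a sitemap URL list
// to unique http(s) URLs whose path starts with a given prefix.
public static class SiteMapUrlFilter
{
    public static List<string> UnderPath(IEnumerable<string> urls, string pathPrefix)
    {
        return urls
            // Skip entries that are not well-formed absolute http(s) URLs.
            .Where(u => Uri.TryCreate(u, UriKind.Absolute, out var uri)
                        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
                        && uri.AbsolutePath.StartsWith(pathPrefix, StringComparison.OrdinalIgnoreCase))
            // Sitemaps sometimes list the same URL more than once.
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

With the urls list from the sitemap example, a call such as SiteMapUrlFilter.UnderPath(urls, "/blog/") would return only the pages under /blog/, ready to be visited one by one with GoTo.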