AgentQL Tools
The agentql.tools module provides utility methods to help with data extraction and web automation.
The following example demonstrates how to use the paginate method to collect data from multiple pages:
import agentql
from agentql.tools.sync_api import paginate
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://news.ycombinator.com/")

    # Define the query to extract the titles of the posts
    QUERY = """
    {
        posts[] {
            title
        }
    }
    """

    # Collect data from the first 3 pages using the query
    paginated_data = paginate(page, QUERY, 3)
    print(paginated_data)
Methods
paginate
Collects data from multiple pages using an AgentQL query. Internally, the function first attempts to find the operable element to navigate to the next page and click it, then uses the provided query to extract the data from the page. The function then repeats this process for the specified number of pages.
The paginate function returns the data collected from each page as a single, aggregated list. If you wish to step through each page's data, use the navigate_to_next_page method instead.
Usage
paginated_data = paginate(page, QUERY, 3)
Arguments
- page (AgentQL Page): The AgentQL Page object.
- query (str): An AgentQL query in string format.
- number_of_pages (int): Number of pages to paginate over.
- timeout (int, optional): Timeout value in seconds for the connection with the backend API service when querying for the pagination element.
- wait_for_network_idle (bool, optional): Whether to wait for the network to reach a full idle state before querying the page for the pagination element. If set to False, this method only checks whether the page has emitted the load event. Defaults to True.
- include_hidden (bool, optional): Whether to include hidden elements on the page when querying for the pagination element. Defaults to False.
- mode (ResponseMode, optional): The mode of the query for retrieving the pagination element. It can be either standard or fast. Defaults to fast.
- force_click (bool, optional): Whether to force click on the pagination element. Defaults to False.
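The optional arguments can be kept together and passed by keyword. The following is a minimal sketch, not part of the AgentQL API: collect_pages and PAGINATE_OPTIONS are hypothetical names, the option values are illustrative, and the sketch assumes the optional arguments are accepted as keyword arguments under the names listed above. The paginate function is passed in as a parameter so the helper can be exercised without a browser.

```python
# Illustrative values for the optional arguments documented above.
# Both the dictionary and the helper below are hypothetical conveniences.
PAGINATE_OPTIONS = {
    "timeout": 120,                  # seconds to wait for the backend API
    "wait_for_network_idle": False,  # only wait for the page's load event
    "include_hidden": False,         # ignore hidden elements
    "force_click": True,             # force the click on the pagination element
}


def collect_pages(paginate_fn, page, query, number_of_pages=3, **overrides):
    """Call paginate_fn with the shared options, letting callers override any of them."""
    options = {**PAGINATE_OPTIONS, **overrides}
    return paginate_fn(page, query, number_of_pages, **options)
```

With the imports from the example at the top of this page, a call might look like collect_pages(paginate, page, QUERY, timeout=60). The mode argument is left out here because this page does not show how a ResponseMode value is constructed.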
Returns
- List of dictionaries containing the data from each page.
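The returned list can be flattened with ordinary Python. The sketch below assumes that each dictionary in the list mirrors the shape of the query from the example at the top of this page, that is, a posts key holding a list of items with a title field. That shape is an assumption, and collect_titles and the sample data are hypothetical; they do not come from a real run.

```python
def collect_titles(paginated_data):
    """Flatten the per-page dictionaries into a single list of post titles."""
    titles = []
    for page_data in paginated_data:
        # Each page is assumed to look like {"posts": [{"title": ...}, ...]}.
        for post in page_data.get("posts", []):
            title = post.get("title")
            if title:
                titles.append(title)
    return titles


# Hypothetical data in the assumed shape, standing in for a paginate() result.
sample = [
    {"posts": [{"title": "First post"}, {"title": "Second post"}]},
    {"posts": [{"title": "Third post"}]},
]
print(collect_titles(sample))  # ['First post', 'Second post', 'Third post']
```

If the actual structure differs, printing paginated_data as in the first example shows which keys to read.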