Collect data by stepping through paginated web pages

When working with paginated web pages, you may want to collect data from each page individually and aggregate it yourself. Use the navigate_to_next_page method on the PaginationInfo object returned by the get_pagination_info method.

Overview

This guide shows how to use the navigate_to_next_page method to step through paginated web pages and collect data until you reach a fixed number of items.

Writing the query

For this guide, the goal is to collect information about the first 50 books listed on an online bookstore.

First, you need to write a query that extracts the book names, prices, and ratings.

{
    books[] {
        name
        price
        rating
    }
}
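The loop later in this guide passes the query to query_data under the name QUERY. One way to define it, assuming a plain module-level string constant, is sketched here:

```python
# The AgentQL query from above, stored as a string constant so it can be
# passed to page.query_data(QUERY) later on.
QUERY = """
{
    books[] {
        name
        price
        rating
    }
}
"""
```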

Stepping through paginated pages

To acquire the first 50 books, you need to step through the pages one at a time, collecting and aggregating the data while keeping track of the total number of books collected. Here's how you could move from the current page to the next one:

step_through_paginated_pages.py
python
import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://books.toscrape.com/")

    # get the pagination info from the current page
    pagination_info = page.get_pagination_info()

    # attempt to navigate to next page
    if pagination_info.has_next_page:
        pagination_info.navigate_to_next_page()

The get_pagination_info method returns a PaginationInfo object, which contains the pagination information of the current page. The PaginationInfo object has a has_next_page property that indicates whether there is a next page. If there is a next page, you can call the navigate_to_next_page method to navigate to the next page.

Internally, the get_pagination_info method attempts to identify the operable element for pagination. The has_next_page property returns True if it finds a clickable element. navigate_to_next_page attempts to click the identified element.

Create a loop

To collect the first 50 books, create a loop that keeps track of the total number of books collected and stops when it reaches the target number or when there are no more pages to visit.

step_through_paginated_pages.py
python
books = []

# Aggregate the first 50 book names, prices and ratings
while len(books) < 50:
    # collect data from the current page
    response = page.query_data(QUERY)

    # limit the total number of books to 50
    if len(response["books"]) + len(books) > 50:
        books.extend(response["books"][:50 - len(books)])
    else:
        books.extend(response["books"])

    # get the pagination info from the current page
    pagination_info = page.get_pagination_info()

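    # navigate to the next page, or stop if there is none
    # (without the break, the loop would never end on sites with fewer than 50 books)
    if pagination_info.has_next_page:
        pagination_info.navigate_to_next_page()
    else:
        break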
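The capped aggregation in the loop above can be tried out without a browser. The following is a minimal sketch, not part of AgentQL: collect_up_to, fetch_items, go_to_next_page, and FAKE_PAGES are hypothetical names introduced only for illustration, and the three in-memory pages stand in for a real paginated site.

```python
from typing import Callable, List


def collect_up_to(
    limit: int,
    fetch_items: Callable[[], List[dict]],
    go_to_next_page: Callable[[], bool],
) -> List[dict]:
    """Collect at most `limit` items, reading the current page and advancing
    until the limit is reached or there is no next page."""
    collected: List[dict] = []
    while len(collected) < limit:
        items = fetch_items()
        # Take only as many items as are still needed.
        collected.extend(items[: limit - len(collected)])
        # Stop when the limit is reached or there is nothing left to visit.
        if len(collected) >= limit or not go_to_next_page():
            break
    return collected


# Hypothetical stand-in for a paginated site: three pages of 20 fake books each.
FAKE_PAGES = [
    [{"name": f"Book {page * 20 + i}"} for i in range(20)] for page in range(3)
]
current = {"index": 0}


def fetch_items() -> List[dict]:
    """Return the items on the current fake page."""
    return FAKE_PAGES[current["index"]]


def go_to_next_page() -> bool:
    """Advance to the next fake page; return False if already on the last one."""
    if current["index"] + 1 < len(FAKE_PAGES):
        current["index"] += 1
        return True
    return False


books = collect_up_to(50, fetch_items, go_to_next_page)
print(len(books))  # 50
```

With a real page, fetch_items would presumably wrap page.query_data(QUERY)["books"], and go_to_next_page would check has_next_page before calling navigate_to_next_page. Passing the two steps in as callables keeps the stopping logic separate from the browser, so it can be checked on its own.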