Collect data by stepping through paginated web pages
When working with paginated web pages, you may want to collect data from each page individually and aggregate it yourself. Use the navigate_to_next_page method on the PaginationInfo object returned by the get_pagination_info method.
Overview
This guide shows how to use the navigate_to_next_page method to step through paginated web pages and collect data until a fixed number of items has been gathered.
Writing the query
For this guide, the goal is to retrieve information about the first 50 books listed in an online bookstore.
First, you need to write a query that extracts the book names, prices, and ratings.
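As a minimal sketch, such a query could be stored in a Python string constant. The constant name QUERY and the field names books, name, price, and rating are assumptions chosen for illustration; adjust them to match the page you are querying.

```python
# Hypothetical query for the bookstore example.
# The list name (books) and the field names are illustrative assumptions.
QUERY = """
{
    books[] {
        name
        price
        rating
    }
}
"""
```

The books[] entry asks for a list of items, and each item is expected to carry the three nested fields.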
Stepping through paginated pages
To collect the first 50 books, you need to step through the paginated pages one at a time, gathering and aggregating the data from each page while keeping track of the total number of books collected. Stepping to the next page works as follows:
The get_pagination_info method returns a PaginationInfo object, which contains the pagination information of the current page. The PaginationInfo object has a has_next_page property that indicates whether there is a next page. If there is a next page, you can call the navigate_to_next_page method to navigate to the next page.
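The single step described above might be sketched as a small helper. The function name go_to_next_page is hypothetical, and the page argument is assumed to be any object that exposes get_pagination_info as described in this guide.

```python
def go_to_next_page(page) -> bool:
    """Advance `page` by one paginated page.

    Returns True if navigation was attempted, False if no next page was found.
    `page` is assumed to expose get_pagination_info() as described above.
    """
    pagination_info = page.get_pagination_info()

    # Only navigate when a next page was detected.
    if not pagination_info.has_next_page:
        return False

    pagination_info.navigate_to_next_page()
    return True
```

Returning a boolean lets the caller decide what to do when the last page has been reached.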
Internally, the get_pagination_info method attempts to identify the operable element for pagination. The has_next_page property returns True if it finds a clickable element. navigate_to_next_page attempts to click the identified element.
Create a loop
To collect the first 50 books, create a loop that keeps track of the total number of books collected and stops once the target number is reached.
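One way such a loop could look is sketched below. The function name collect_books is hypothetical. The sketch also assumes that the page object offers a query_data method returning a dictionary with a books list, which is not stated in this guide, and it adds a guard that stops early when no next page is found.

```python
def collect_books(page, query: str, target: int = 50) -> list:
    """Collect up to `target` books by stepping through paginated pages."""
    books = []

    while len(books) < target:
        # Assumption: query_data returns a dict shaped like the query,
        # with the extracted items under the "books" key.
        response = page.query_data(query)
        books.extend(response.get("books", []))

        # Stop as soon as enough books have been gathered.
        if len(books) >= target:
            break

        # Otherwise move on, unless this was the last page.
        pagination_info = page.get_pagination_info()
        if not pagination_info.has_next_page:
            break
        pagination_info.navigate_to_next_page()

    # The final page may push the count past the target, so trim the result.
    return books[:target]
```

Checking the count before navigating avoids loading an extra page once the target has already been met.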