Scraping data with query_data
Use the query_data method to extract structured data from a web page, such as product details, user reviews, or other information.
Unlike query_elements or get_by_prompt, query_data returns data rather than interactive elements.
Overview
This guide shows you how to use query_data and work with the data output.
Define the data query
First, define an AgentQL query that describes how to structure the data.
For example, the following query scrapes a website for the name and price of all products within a product category.
{
    product_category
    product[] {
        name
        price
    }
}
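In a Python script, the query is usually kept as a string so it can be passed to query_data later. A minimal sketch, assuming the constant name PRODUCTS_QUERY that the next step of this guide uses:

```python
# The AgentQL query from this section, stored as a plain triple-quoted string.
# PRODUCTS_QUERY is the name this guide passes to query_data in the next step.
PRODUCTS_QUERY = """
{
    product_category
    product[] {
        name
        price
    }
}
"""
```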
Run the data query
Within your script, you can now pass your query into the query_data method.
products_response = page.query_data(PRODUCTS_QUERY)
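One way to keep this call easy to reuse and test is to wrap it in a small helper. The sketch below assumes only what this guide states: that page exposes a query_data method which returns a dictionary. The helper name fetch_products, the SupportsQueryData type, and the type check are illustrative and not part of AgentQL.

```python
from typing import Any, Dict, Protocol

PRODUCTS_QUERY = """
{
    product_category
    product[] {
        name
        price
    }
}
"""


class SupportsQueryData(Protocol):
    """Any object with a query_data method, such as the page used in this guide."""

    def query_data(self, query: str) -> Dict[str, Any]: ...


def fetch_products(page: SupportsQueryData, query: str = PRODUCTS_QUERY) -> Dict[str, Any]:
    """Run the data query and return the response dictionary (hypothetical helper)."""
    response = page.query_data(query)
    # Fail early with a clear message if the response is not the expected dictionary.
    if not isinstance(response, dict):
        raise TypeError(f"expected a dict from query_data, got {type(response).__name__}")
    return response
```

Because the helper depends only on the query_data method, it can be exercised in unit tests with a stand-in object instead of a live browser page.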
Understanding the data output
When you run the query, it returns a dictionary containing the retrieved data formatted according to the query schema.
Here's an example of what the query might return:
{
    'product_category': "Coffee Beans",
    'product': [
        {
            'name': 'Starbucks Coffee Beans',
            'price': '$16.99'
        },
        {
            'name': 'Blue Bottle Coffee Beans',
            'price': '$17.99'
        }
    ]
}
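The prices in this example come back as strings such as '$16.99', so numeric work like sorting or totals needs a conversion step first. A minimal sketch, assuming US-dollar formatting like the sample output; parse_price is a hypothetical helper, and real pages may need locale-aware handling.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Convert a price string like '$16.99' or '$1,299.00' to a Decimal."""
    # Drop the currency symbol and thousands separators before parsing.
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


# Sample response copied from this guide's example output.
sample_response = {
    'product_category': "Coffee Beans",
    'product': [
        {'name': 'Starbucks Coffee Beans', 'price': '$16.99'},
        {'name': 'Blue Bottle Coffee Beans', 'price': '$17.99'},
    ],
}

# Pick the product with the lowest numeric price.
cheapest = min(sample_response['product'], key=lambda p: parse_price(p['price']))
print(f"Cheapest: {cheapest['name']}")
```

Decimal is used instead of float here so that currency values are not subject to binary rounding.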
Accessing the data output
Finally, you can access any part of the data in your script by following the query schema, just as you would with any standard dictionary.
The following snippet includes some common examples using the scenario from this guide:
# Access the product category
category = products_response['product_category']
print(f"Product Category: {category}")

# Access the list of products
products = products_response['product']

# Iterate through the products and print their details
for product in products:
    name = product['name']
    price = product['price']
    print(f"Product: {name}, Price: {price}")
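Indexing with square brackets raises KeyError when a key is absent. If a page might not yield every field, a more defensive variant can be sketched as follows. It assumes that a missing field shows up either as an absent key or as None, which this guide does not confirm, and summarize_products is a hypothetical name.

```python
from typing import Any, Dict, List


def summarize_products(response: Dict[str, Any]) -> List[str]:
    """Build one display line per product, tolerating missing fields."""
    lines = []
    # `or []` covers both a missing 'product' key and an explicit None.
    for product in response.get('product') or []:
        name = product.get('name') or 'unknown'
        price = product.get('price') or 'n/a'
        lines.append(f"Product: {name}, Price: {price}")
    return lines
```

Calling summarize_products(products_response) yields the same lines as the loop in this section when every field is present, and falls back to placeholder text otherwise.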
Conclusion
Remember that the query_data method is ideal for scraping and retrieving data, while query_elements is ideal for interacting with elements.