# AgentQL Documentation > AgentQL is a powerful query language and SDK for web automation and data extraction. It combines natural language processing with web technologies to enable precise control over web browsers and efficient data collection. This is the comprehensive version of the documentation, including full content from all documentation files. Important notes about this documentation: - This is the comprehensive version of the AgentQL documentation - Each section includes the full content from its source file - Source URLs are provided for each section - The content is organized hierarchically based on the documentation structure - Code examples and technical details are preserved # Quick Start Source: https://docs.agentql.com/quick-start Get started scraping data and automating interactions with web pages with AgentQL queries, CLI, debugger, and SDKs. This guide shows you how to get started interacting with elements and extracting data from web pages using AgentQL queries. You will learn how to use the Chrome extension for debugging and how to implement queries using AgentQL's Python SDK. ## What's an AgentQL query? AgentQL is a query language and a set of supporting developer tools designed to identify web elements and their data using natural language and return them in the shape you define. Here is an AgentQL query you can use right now to locate the search button on this page: ```AgentQL { search_button } ``` This query can return each heading on this page: ```AgentQL { headings[] } ``` The following shows you how to execute queries to retrieve data and elements from web pages starting right here, with the page you are on now. ## Get your API key You need an API key to make AgentQL queries. Get a free API key on the AgentQL Developer Portal, and you'll be ready to go! ## Query this page with the AgentQL Debugger Chrome extension The AgentQL Debugger lets you write and test queries in real-time on web pages, without needing to spin up the Python SDK. 
It's perfect for debugging queries before putting them into production! Here's how to get started:

1. Install the AgentQL Debugger from the Chrome Web Store.
2. Come back to this page and open Chrome DevTools (**Ctrl+Shift+I** on Windows/Linux or **Cmd+Opt+I** on Mac).
3. In the top bar of the devtools panel, select "AgentQL." If you don't see the option, click the overflow menu button (») and select "AgentQL" from the list.
4. Enter your API key when prompted.
5. In the query panel, try this query:

```AgentQL
{
    search_button
}
```

6. Click "Fetch Web Elements" to run the query and return the search button element.

Look under the **AgentQL tab** to find an entry for the search button. Hovering over the entry highlights its element on the page.

### Try it out

The best way to learn how AgentQL works is to play around with it in the extension. Here are some things you can try:

- Click the **eye icon** to navigate to the element on the page.
- Click **``** to navigate to the element in the devtools.
- Click "Fetch Data" to return the contents of the queried elements instead of the elements themselves (great for scraping).
- Use `[]` to return a list of items.
- Use `()` to add additional context to find the exact element:

```AgentQL
{
    headings(all the headings inside the article)[]
}
```

Experiment with different queries to get a feel for how AgentQL works. For example, try this one out with "Fetch Data" to get a list of all the headings on this page:

```AgentQL
{
    headings[]
}
```

You can even nest items:

```AgentQL
{
    breadcrumbs {
        first_item
        last_item
    }
}
```

## Perform the query with the AgentQL SDK

Now that you're familiar with writing queries, you can use the SDK to run the same query programmatically. If you use virtual environments, we recommend using one for the following steps.

To use the Python SDK:

1. In your project folder, install the AgentQL SDK and initialize AgentQL:

```bash
pip3 install agentql
agentql init
```

2. Provide your API key.
3. Create a new Python file, `example_script.py`.
4. Add the following code:

```python filename="example_script.py"
import agentql
from playwright.sync_api import sync_playwright

# Initialise the browser
with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://docs.agentql.com/quick-start")

    # Find "Search" button using Smart Locator
    search_button = page.get_by_prompt("search button")
    # Interact with the button
    search_button.click()

    # Define a query for modal dialog's search input
    SEARCH_BOX_QUERY = """
    {
        modal {
            search_box
        }
    }
    """

    # Get the modal's search input and type "Quick Start" into it
    response = page.query_elements(SEARCH_BOX_QUERY)
    response.modal.search_box.type("Quick Start")

    # Define a query for the search results
    SEARCH_RESULTS_QUERY = """
    {
        modal {
            search_box
            search_results {
                items[]
            }
        }
    }
    """

    # Execute the query after the results have returned then click on the first one
    response = page.query_elements(SEARCH_RESULTS_QUERY)
    response.modal.search_results.items[0].click()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
```

5. Run the script:

```bash
python3 example_script.py
```

To use the JavaScript SDK:

1. In your project folder, install the AgentQL SDK and CLI:

```bash
npm install agentql
npm install -g agentql-cli
```

2. Install dependencies by running the following command:

```bash
agentql init
```

3. Set the `AGENTQL_API_KEY` environment variable with your API key (https://dev.agentql.com/). To set the environment variable temporarily for your terminal session, run the following in your terminal:

```bash
export AGENTQL_API_KEY=your-api-key
```

#### Powershell

If you are using **Powershell** as your terminal, you can set the environment variable with the following command:

```bash
$env:AGENTQL_API_KEY="your-api-key"
```

#### Command Prompt

If you are using **Command Prompt** as your terminal, you can set the environment variable with the following command:

```bash
set AGENTQL_API_KEY=your-api-key
```

4. Create a new JavaScript file, `example_script.js`.
5. Add the following code:

```js filename="example_script.js"
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

configure({ apiKey: process.env.AGENTQL_API_KEY });

async function main() {
  const browser = await chromium.launch({headless: false});
  const page = await wrap(await browser.newPage());
  await page.goto('https://docs.agentql.com/quick-start');

  // Find "Search" button using Smart Locator
  const searchButton = await page.getByPrompt('search button');
  // Interact with the button
  await searchButton.click();

  // Define a query for modal dialog's search input
  const SEARCH_BOX_QUERY = `
  {
    modal {
      search_box
    }
  }
  `

  // Get the modal's search input and fill it with "Quick Start"
  let response = await page.queryElements(SEARCH_BOX_QUERY);
  await response.modal.search_box.fill("Quick Start");

  // Define a query for the search results
  const SEARCH_RESULTS_QUERY = `
  {
    modal {
      search_box
      search_results {
        items[]
      }
    }
  }
  `

  // Execute the query after the results have returned then click on the first one
  response = await page.queryElements(SEARCH_RESULTS_QUERY);
  await response.modal.search_results.items[0].click();

  // Used only for demo purposes. It allows you to see the effect of the script.
  await page.waitForTimeout(10000);

  await browser.close();
}

main();
```

6. Run the script:

```bash
node example_script.js
```

This script opens this site, **docs.agentql.com**, clicks the search button, types "Quick Start" into the search modal's input, and clicks the first result, which brings you back to this page.

## Next steps

Congratulations! You've now used AgentQL queries both in the Chrome extension and the SDK. This is the AgentQL workflow: optimizing and debugging queries with the extension before running them in the SDK.
Here are some next steps to explore:

- Learn more about AgentQL query syntax (/agentql-query/query-intro)
- Explore best practices for writing queries (/agentql-query/best-practices)
- Check out an example script for collecting YouTube comment data (/getting-started/example-script)

Happy querying with AgentQL!

# Learn AgentQL

Source: https://docs.agentql.com/learn

Learn how to use AgentQL to automate web interactions and extract data from web pages in this full-length tutorial.

## Overview

This guide shows you how to use AgentQL to automate web interactions and extract data from web pages, culminating in building a script for collecting comment data from YouTube.

## First Steps

Source: https://docs.agentql.com/getting-started/first-steps

AgentQL is a robust query language that identifies elements on a webpage using natural language with the help of AI. It can be used to automate tasks on the web, extract data, and interact with websites in real time.

AgentQL uses AI to infer which element or data you mean based on _term names and query structure_, so it will find what you're looking for even if the page's markup and layout change drastically.

AgentQL's SDK allows you to write scripts that identify elements and extract data from the web using the AgentQL query language.

In this guide, you will learn how to use AgentQL queries and the SDK to automate page interactions and extract data from a page.

## Prerequisites

- The AgentQL SDK for Python (/python-sdk/installation) or for JavaScript (/javascript-sdk/installation)

## Instructions

The script below will open a browser and do the following:

1. Navigate to **scrapeme.live/shop (https://scrapeme.live/shop)**.
2. Input "fish" into the search field in the header section.
3. Press the "Enter" key to perform the search.
4. Close the browser after 10 seconds.
Save the following script in a file named **example_script.py** then open a terminal in your project's folder and run the script with `python3 example_script.py`. Save the following script in a file named **example_script.js** then open a terminal in your project's folder and run the script with `node example_script.js`. ```python filename="example_script.py" import agentql from playwright.sync_api import sync_playwright with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://scrapeme.live/shop/") QUERY = """ { search_box } """ response = page.query_elements(QUERY) response.search_box.fill("fish") page.keyboard.press("Enter") # Used only for demo purposes. It allows you to see the effect of script. page.wait_for_timeout(10000) ``` ```js filename="example_script.js" const { wrap, configure } = require('agentql'); const { chromium } = require('playwright'); configure({ apiKey: process.env.AGENTQL_API_KEY }); async function main() { const browser = await chromium.launch(); const page = await wrap(await browser.newPage()); await page.goto('https://scrapeme.live/shop'); const QUERY = ` { search_box } `; const response = await page.queryElements(QUERY); await response.search_box.fill('fish'); await page.keyboard.press('Enter'); // Used only for demo purposes. It allows you to see the effect of the script. await page.waitForTimeout(10000); await browser.close(); } main(); ``` Here's how you can create it step by step: ### Step 0: Create a New Python Script In your project folder, create a new Python script and name it `example_script.py`. ### Step 0: Create a New JS Script In your project folder, create a new JavaScript script and name it `example_script.js`. ### Step 1: Import Required Libraries Import needed functions and classes from `playwright` library and import the `agentql` library. 
```python filename="example_script.py"
import agentql
from playwright.sync_api import sync_playwright
```

```js filename="example_script.js"
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');
```

Playwright (https://playwright.dev/) is an end-to-end automation and testing tool. In this example, it manages opening the browser and interacting with the elements AgentQL returns.

### Step 2: Launch the Browser and Open the Website

The last preparation step is launching the browser and navigating to the target website, using the usual Playwright API. The only difference is the type of the page: instead of using Playwright's `Page` class directly (Python: https://playwright.dev/python/docs/api/class-page, JavaScript: https://playwright.dev/docs/api/class-page), you wrap it with `agentql.wrap()` in Python or `wrap()` in JavaScript. The result is AgentQL's `Page` class (Python: /python-sdk/api-references/agentql-page, JavaScript: /javascript-sdk/api-references/agentql-page), which is the main interface both for interacting with the web page and for executing AgentQL queries.
```python filename="example_script.py" with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://scrapeme.live/shop") ``` ```js filename="example_script.js" const browser = await chromium.launch({ headless: false }); const page = await wrap(await browser.newPage()); await page.goto('https://scrapeme.live/shop'); ``` - Default AgentQL SDK implementation is built on top of Playwright and uses all of its functionality for interacting with browser, page and elements on the page. - By default Playwright launches the browser in headless mode. Here it's explicitly set to `False` for the sake of example. ### Step 3: Define AgentQL Query AgentQL queries are how you query elements from a web page. A query describes the elements you want to interact with or consume content from and defines your desired output structure. ```python filename="example_script.py" QUERY = """ { search_box } """ ``` ```js filename="example_script.js" const QUERY = ` { search_box } `; ``` In this query, we specify the element we want to interact with on `"https://scrapeme.live/shop/"`: - `search_box` - search input field - To learn more about AgentQL query syntax, capabilities and best practices, check out our AgentQL Query documentation (/agentql-query/query-intro). - If you want to query just one element on the page, you can use simpler `get_by_prompt` (/python-sdk/api-references/agentql-page#getbyprompt) API that can identify an element by natural language description. - To learn more about AgentQL query syntax, capabilities and best practices, check out our AgentQL Query documentation (/agentql-query/query-intro). - If you want to query just one element on the page, you can use simpler `getByPrompt` (/javascript-sdk/api-references/agentql-page#getbyprompt) API that can identify an element by natural language description. 
### Step 4: Execute AgentQL Query

AgentQL's `Page` extends Playwright's `Page` class with querying capabilities:

```python filename="example_script.py"
response = page.query_elements(QUERY)
```

```js filename="example_script.js"
const response = await page.queryElements(QUERY);
```

The `response` variable has the same structure as the AgentQL query, so here it has one field: `search_box`. This field is either empty (`None` in the Python SDK) if the described element was not found on the page, or an instance of Playwright's `Locator` class (Python: https://playwright.dev/python/docs/api/class-locator, JavaScript: https://playwright.dev/docs/api/class-locator) that allows you to interact with the found element.

### Step 5: Interact with Web Page

```python filename="example_script.py"
response.search_box.fill("fish")
```

```js filename="example_script.js"
await response.search_box.fill('fish');
```

This line uses the `fill` method on the `search_box` element found in the previous step. It mimics typing "fish" into the search box.

```python filename="example_script.py"
page.keyboard.press("Enter")
```

```js filename="example_script.js"
await page.keyboard.press('Enter');
```

Here, the `press` method is called on the page's `keyboard` attribute, simulating a press of the `Enter` key.

The `fill` method comes from Playwright's `Locator` class. For the full list of methods available for interacting with web elements, refer to that class's documentation (https://playwright.dev/docs/api/class-locator).
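Because a field in the response can be empty when the element isn't on the page, it can help to guard interactions before calling `fill`. The following is a minimal sketch rather than part of the AgentQL SDK: `fill_if_found` is a hypothetical helper, it assumes missing elements show up as `None`, and `FakeLocator` is a stand-in used only so the snippet runs without a browser.

```python
from types import SimpleNamespace


class FakeLocator:
    """Stand-in for a Playwright Locator; records the text passed to fill()."""

    def __init__(self):
        self.value = None

    def fill(self, text):
        self.value = text


def fill_if_found(response, field_name, text):
    """Fill the named field if the query found it; return whether it was filled.

    Hypothetical helper: assumes a missing element appears as None on the response.
    """
    element = getattr(response, field_name, None)
    if element is None:
        return False
    element.fill(text)
    return True


# Simulated responses: one where the element was found, one where it was not.
found = SimpleNamespace(search_box=FakeLocator())
missing = SimpleNamespace(search_box=None)

filled_found = fill_if_found(found, "search_box", "fish")
filled_missing = fill_if_found(missing, "search_box", "fish")
print(filled_found, filled_missing)
```

With a real page, you would pass the object returned by `page.query_elements(QUERY)` in place of the simulated responses and decide what to do (retry, log, or stop) when the helper returns `False`.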
### Step 6: Pause the Script Execution ```python filename="example_script.py" page.wait_for_timeout(10000) ``` ```js filename="example_script.js" await page.waitForTimeout(10000); ``` Here, `page.wait_for_timeout()` method is used to pause the execution for 10 seconds to see the effect of this script before closing the browser. `page.wait_for_timeout()` is used only for demo purposes and will impact the performance. Don't use it in production! Here, `page.waitForTimeout()` method is used to pause the execution for 10 seconds to see the effect of this script before closing the browser. `page.waitForTimeout()` is used only for demo purposes and will impact the performance. Don't use it in production! ### Step 7: Stop the Browser ```python filename="example_script.py" browser.close() ``` ```js filename="example_script.js" await browser.close(); ``` Finally, the `close` method is called on the `browser` object, ending the web browsing session. This is important for properly releasing resources. ### Step 8: Run the Script Open a terminal in your project's folder and run the script: ```bash python3 example_script.py ``` ```bash node example_script.js ``` ## Example Script Source: https://docs.agentql.com/getting-started/example-script In this guide, you will learn how to use an AgentQL script to navigate to a YouTube video and collect comment data. ## Prerequisites - The AgentQL SDK (/python-sdk/installation) ## Prerequisites - The AgentQL SDK (/javascript-sdk/installation) ## Instructions If you'd like to start with the full example script, you can find it at the end of the tutorial (#putting-it-all-together). ### Step 0: Create a New Python Script In your project folder, create a new Python script and name it `example_script.py`. ### Step 1: Import Required Libraries Import needed functions and classes from `playwright` and `agentql` libraries and import the `logging` library. 
### Step 0: Create a New JavaScript Script In your project folder, create a new JavaScript script and name it `example_script.js`. ### Step 1: Import Required Libraries Import needed functions and classes from `playwright` and `agentql` libraries. ```python filename="example_script.py" import logging import agentql from playwright.sync_api import sync_playwright ``` ```js filename="example_script.js" const { wrap, configure } = require("agentql"); const { chromium } = require("playwright"); ``` `logging` provides logging, debugging, and information messages, `playwright` provides core browser interaction functionality and `agentql` adds the main AgentQL functionality. `playwright` provides core browser interaction functionality and `agentql` adds the main AgentQL capabilities. ### Step 2: Launch the Browser and Open the Website ```python filename="example_script.py" logging.basicConfig(level=logging.DEBUG) log = logging.getLogger(__name__) URL = "https://www.youtube.com/" with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto(URL) ``` ```js filename="example_script.js" const URL = "https://www.youtube.com/"; const browser = await chromium.launch({ headless: false }); const page = await wrap(await browser.newPage()); ``` - Set up logging: `logging.basicConfig(level=logging.DEBUG)` configures the logging to show debug-level messages. - Define `URL`: The URL `"https://www.youtube.com/"` is the target website for the script. - Start Playwright instance with `sync_playwright()`. - Launch the browser: `playwright.chromium.launch(headless=False)` - Create a new page in the browser and wrap it to get access to the AgentQL's querying API: `agentql.wrap(browser.new_page())`. - Navigate to the website with `page.goto(URL)`. - Define `URL`: The URL `"https://www.youtube.com/"` is the target website for the script. - Launch the browser: `await chromium.launch({ headless: false })`. 
- Create a new page in the browser and wrap it to get access to the AgentQL's querying API: `await wrap(await browser.newPage())`. - Navigate to the website with `page.goto(URL)`. ### Step 3: Define AgentQL Queries ```python filename="example_script.py" SEARCH_QUERY = """ { search_input search_btn } """ VIDEO_QUERY = """ { videos[] { video_link video_title channel_name } } """ VIDEO_CONTROL_QUERY = """ { expand_description_btn } """ DESCRIPTION_QUERY = """ { description_text } """ COMMENT_QUERY = """ { comments[] { channel_name comment_text } } """ ``` ```js filename="example_script.js" const SEARCH_QUERY = ` { search_input search_btn } `; const VIDEO_QUERY = ` { videos[] { video_link video_title channel_name } } `; const VIDEO_CONTROL_QUERY = ` { expand_description_btn } `; const DESCRIPTION_QUERY = ` { description_text } `; const COMMENT_QUERY = ` { comments[] { channel_name comment_text } } `; ``` These queries provide the tool for communication and interaction with the right elements on the website. Ensuring you have functional and reliable queries is paramount! Use the AgentQL Chrome Extension in parallel with the AgentQL SDK to test different query formats and keywords directly with the webpage! 
### Step 4: Try & Except Block

```python filename="example_script.py"
try:
    # query logic (the following steps) goes here
    pass
except Exception as e:
    log.error(f"Found Error: {e}")
```

```js filename="example_script.js"
try {
  // query logic
} catch (e) {
  console.error(e);
}
```

### Step 5: Execute Search Query and Interact with Search Elements

```python filename="example_script.py"
response = page.query_elements(SEARCH_QUERY)
response.search_input.type("machine learning", delay=75)
response.search_btn.click()
```

```js filename="example_script.js"
const response = await page.queryElements(SEARCH_QUERY);
await response.search_input.type("machine learning", { delay: 75 });
await response.search_btn.click();
```

- Search Query: Run `SEARCH_QUERY` to locate the search elements on the YouTube page.
- Type and Click: The script types `"machine learning"` into the search input with a delay of 75ms between keystrokes and then clicks `search_btn`.

You can find more information on `type()` and the other available interaction APIs in the official Playwright documentation (https://playwright.dev/docs/api/class-locator).

Use the AgentQL `to_data()` API (`toData()` in JavaScript) when you want to convert the AgentQL response into a `dict` (a `Map` in JavaScript) of data and work with it rather than treating it as web elements!

### Optional Step: Convert AgentQL Response to a Python dict or a JavaScript Map

The raw response is an AgentQL response object, which can be inconvenient to work with. You can use the `to_data()` API to convert the response to a Python `dict`.
We recommend using the `toData()` API to convert the response to a JavaScript `Map`.

```python filename="example_script.py"
SAMPLE_DATA_QUERY = """
{
    videos[] {
        title
        date_posted
        views
    }
}
"""

response = page.query_elements(SAMPLE_DATA_QUERY)
log.debug(response.to_data())
```

```js filename="example_script.js"
const SAMPLE_DATA_QUERY = `
{
  videos[] {
    title
    date_posted
    views
  }
}
`;

// Mirrors the Python snippet: run the query, then log the converted data
const response = await page.queryElements(SAMPLE_DATA_QUERY);
console.debug(await response.toData());
```

The `to_data()` API (`toData()` in JavaScript) converts the AgentQL response to a structured `dict` (a `Map` in JavaScript), replacing the response nodes with their text contents. A sample result looks like this:

```json filename="Response Data"
{
  "videos": [
    {
      "title": "This is a nice video!",
      "date_posted": "1 month ago",
      "views": "2.7K Views"
    },
    {
      "title": "This maybe a better video!",
      "date_posted": "3 months ago",
      "views": "2.7 Million Views"
    },
    {
      "title": "This is best video!",
      "date_posted": "1 year ago",
      "views": "37.5K Views"
    }
  ]
}
```

### Step 6: Execute Video Query and Interact with Video Elements

```python filename="example_script.py"
response = page.query_elements(VIDEO_QUERY)
log.debug(
    f"Clicking Youtube Video: {response.videos[0].video_title.text_content()}"
)
response.videos[0].video_link.click()  # click the first youtube video
```

```js filename="example_script.js"
const response = await page.queryElements(VIDEO_QUERY);
// textContent() is asynchronous, so resolve it before logging
const videoTitle = await response.videos[0].video_title.textContent();
console.debug(`Clicking Youtube Video: ${videoTitle}`);
await response.videos[0].video_link.click(); // click the first youtube video
```

- Video Query: The script runs the `VIDEO_QUERY` to interact with video elements on the search results page.
- Click on Video: It clicks on the first video link in the results.
- Logging: A debug message logs the title of the clicked video.
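Once a response has been converted with `to_data()`, the result is plain data that ordinary Python can process. As a small illustration, the sketch below takes the sample result shown earlier in this section and turns the `views` strings into integers. `parse_view_count` and its suffix table are hypothetical helpers written for this example, and they assume the text always follows the "2.7K Views" or "2.7 Million Views" patterns from the sample.

```python
import re

# Sample data copied from the "Response Data" example in this guide.
response_data = {
    "videos": [
        {"title": "This is a nice video!", "date_posted": "1 month ago", "views": "2.7K Views"},
        {"title": "This maybe a better video!", "date_posted": "3 months ago", "views": "2.7 Million Views"},
        {"title": "This is best video!", "date_posted": "1 year ago", "views": "37.5K Views"},
    ]
}

# Multipliers for the suffixes seen in the sample data (an assumption; extend as needed).
MULTIPLIERS = {"k": 1_000, "million": 1_000_000}

VIEW_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(k|million)?\s*views?\s*$", re.IGNORECASE
)


def parse_view_count(text):
    """Convert strings like '2.7K Views' into an integer, or None if unrecognized."""
    match = VIEW_PATTERN.match(text)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = MULTIPLIERS[suffix.lower()] if suffix else 1
    return int(round(float(number) * multiplier))


# Pick the video with the highest parsed view count; unparseable counts rank as 0.
most_viewed = max(
    response_data["videos"],
    key=lambda video: parse_view_count(video["views"]) or 0,
)
print(most_viewed["title"])
```

Real pages may format counts differently, so treat the pattern as a starting point and keep the `None` fallback for text it doesn't recognize.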
### Step 7: Control Video Playback and Show Description

```python filename="example_script.py"
response = page.query_elements(VIDEO_CONTROL_QUERY)
response.expand_description_btn.click()
```

```js filename="example_script.js"
const response = await page.queryElements(VIDEO_CONTROL_QUERY);
await response.expand_description_btn.click();
```

- Video Control Query: Run the `VIDEO_CONTROL_QUERY` to interact with video controls.

### Step 8: Capture and Log Video Description

Instead of converting the intermediate AgentQL response to a `dict` via the `to_data()` API (or to a `Map` via `toData()` in JavaScript), you can call the `page.query_data()` method (`page.queryData()` in JavaScript) to query for structured data in the first place.

```python filename="example_script.py"
response_data = page.query_data(DESCRIPTION_QUERY)
log.debug(
    f"Captured the following description: \n{response_data['description_text']}"
)
```

```js filename="example_script.js"
const response_data = await page.queryData(DESCRIPTION_QUERY);
console.debug(
  `Captured the following description: \n${response_data["description_text"]}`
);
```

- Description Query: Executes the `DESCRIPTION_QUERY`.
- Logging Description: Logs the captured description of the video.

### Step 9: Scroll Down the Page to Load Comments

To load comments on a YouTube video page, we need to scroll down the page a few times.

```python filename="example_script.py"
for _ in range(3):
    page.keyboard.press("PageDown")
    page.wait_for_page_ready_state()
```

```js filename="example_script.js"
for (let i = 0; i < 3; i++) {
  await page.keyboard.press("PageDown");
  await page.waitForPageReadyState();
}
```

In the Python script:

- Press `PageDown` Button: Presses the `PageDown` button to scroll down the page.
- Wait for Page Ready State: Waits for the comments to load with AgentQL's `wait_for_page_ready_state()` method.

In the JavaScript script:

- Press `PageDown` Button: Presses the `PageDown` button to scroll down the page.
- Wait for Page Ready State: Waits for the comments to load with AgentQL's `waitForPageReadyState()` method.

### Step 10: Capture and Log Comments

```python filename="example_script.py"
response = page.query_data(COMMENT_QUERY)
log.debug(f"Captured {len(response.get('comments'))} comments!")
```

```js filename="example_script.js"
const response = await page.queryData(COMMENT_QUERY);
console.debug(`Captured ${response.get("comments").length} comments!`);
```

- Execute Query: Run `COMMENT_QUERY` to capture comment section data.
- Count and Log: Here we simply log the number of comments captured.

### Step 11: Stop the Browser

```python filename="example_script.py"
browser.close()
```

```js filename="example_script.js"
await browser.close();
```

- Call `browser.close()` to release resources and close the browser.

### Step 12: Run the Script

Open a terminal in your project's folder and run the script:

```bash
python3 example_script.py
```

```bash
node example_script.js
```

## Putting it all together…

Here is the complete script, if you'd like to copy it directly:

```python filename="example_script.py"
import logging

import agentql
from playwright.sync_api import sync_playwright

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

URL = "https://www.youtube.com/"

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto(URL)

    SEARCH_QUERY = """
    {
        search_input
        search_btn
    }
    """

    VIDEO_QUERY = """
    {
        videos[] {
            video_link
            video_title
            channel_name
        }
    }
    """

    VIDEO_CONTROL_QUERY = """
    {
        play_or_pause_btn
        expand_description_btn
    }
    """

    DESCRIPTION_QUERY = """
    {
        description_text
    }
    """

    COMMENT_QUERY = """
    {
        comments[] {
            channel_name
            comment_text
        }
    }
    """

    try:
        # search query
        response = page.query_elements(SEARCH_QUERY)
        response.search_input.type("machine learning", delay=75)
        response.search_btn.click()

        # video query
        response = page.query_elements(VIDEO_QUERY)
        log.debug(f"Clicking Youtube Video: {response.videos[0].video_title.text_content()}")
        response.videos[0].video_link.click()  # click the first youtube video

        # video control query
        response = page.query_elements(VIDEO_CONTROL_QUERY)
        response.expand_description_btn.click()

        # description query
        response_data = page.query_data(DESCRIPTION_QUERY)
        log.debug(f"Captured the following description: \n{response_data['description_text']}")

        # Scroll down the page to load more comments
        for _ in range(3):
            page.keyboard.press("PageDown")
            page.wait_for_page_ready_state()

        # comment query
        response = page.query_data(COMMENT_QUERY)
        log.debug(f"Captured {len(response.get('comments'))} comments!")

    except Exception as e:
        log.error(f"Found Error: {e}")
        raise e

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
```

```js filename="example_script.js"
const { chromium } = require('playwright');
const { wrap, configure } = require('agentql');

// Configure the AgentQL API key
configure({ apiKey: process.env.AGENTQL_API_KEY });

const URL = 'https://www.youtube.com/';

// AgentQL queries
const SEARCH_QUERY = `
{
  search_input
  search_btn
}
`;

const VIDEO_QUERY = `
{
  videos[] {
    video_link
    video_title
    channel_name
  }
}
`;

const VIDEO_CONTROL_QUERY = `
{
  play_or_pause_btn
  expand_description_btn
}
`;

const DESCRIPTION_QUERY = `
{
  description_text
}
`;

const COMMENT_QUERY = `
{
  comments[] {
    channel_name
    comment_text
  }
}
`;

async function main() {
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  // Wrap the page to get access to the AgentQL's querying API
  const page = await wrap(await context.newPage());
  await page.goto(URL);

  try {
    // Search query
    let response = await page.queryElements(SEARCH_QUERY);
    await response.search_input.type('machine learning', { delay: 75 });
    await response.search_btn.click();

    // Video query
    response = await page.queryElements(VIDEO_QUERY);
    const videoTitle = await response.videos[0].video_title.textContent();
    console.log(`Clicking YouTube Video: ${videoTitle}`);
    await response.videos[0].video_link.click();

    // Video control query
    response = await page.queryElements(VIDEO_CONTROL_QUERY);
    await response.expand_description_btn.click();

    // Description query
    const responseData = await page.queryData(DESCRIPTION_QUERY);
    console.log(`Captured the following description:\n${responseData.description_text}`);

    // Scroll down the page to load more comments
    for (let i = 0; i < 3; i++) {
      await page.keyboard.press('PageDown');
      await page.waitForPageReadyState();
    }

    // Comment query (mirrors the Python script above)
    response = await page.queryData(COMMENT_QUERY);
    console.log(`Captured ${response.comments.length} comments!`);
  } catch (e) {
    console.error(`Found Error: ${e}`);
    throw e;
  }

  // Used only for demo purposes. It allows you to see the effect of the script.
  await page.waitForTimeout(10000);

  await browser.close();
}

main();
```

# The AgentQL Query Language

Source: https://docs.agentql.com/agentql-query

AgentQL queries are a powerful way to extract data from web pages and automate workflows. They are designed to be self-healing, reusable, and structured in a format you define.

## Overview

AgentQL queries are a powerful way to extract data from web pages. They are designed to be:

- Self-healing: in the face of dynamic content and changing page structures, AgentQL still returns the same results
- Reusable: the same query works for scraping across multiple similar pages
- Structured in a format you define: shape your data with your query

## Query Introduction

Source: https://docs.agentql.com/agentql-query/query-intro

The AgentQL query serves as the building block of your script. This guide shows you how AgentQL's query structure works and how to write a valid query.

### Single term query

A **single term query** enables you to retrieve a single element on the webpage. Here is an example of how you can write a single term query to retrieve a search box.

```AgentQL
{
    search_box
}
```

### List term query

A **list term query** enables you to retrieve a list of similar elements on the webpage. Here is an example of how you can write a list term query to retrieve a list of prices of apples.

```AgentQL
{
    apple_price[]
}
```

You can also specify the exact fields you want to return in the list.
Here is an example of how you can specify that you want the name and price from the list of products.

```AgentQL
{
    products[] {
        name
        price(integer)
    }
}
```

### Combining single term queries and list term queries

You can query for both **single terms** and **list terms** by combining the preceding formats.

```AgentQL
{
    author
    date_of_birth
    book_titles[]
}
```

### Giving context to queries

There are two main ways you can provide additional context to your queries.

#### Structural context

You can nest queries within parent containers to indicate that your target web element is in a particular section of the webpage.

```AgentQL
{
    footer {
        social_media_links[]
    }
}
```

#### Semantic context

You can also provide a short description within parentheses to guide AgentQL in locating the right element(s).

```AgentQL
{
    footer {
        social_media_links(The icons that lead to Facebook, Snapchat, etc.)[]
    }
}
```

### Syntax guidelines

Enclose all AgentQL query terms within curly braces `{}`. The following query structure isn't valid because the term "social_media_links" is wrongly enclosed within parentheses `()`.

```AgentQL
( # Should be {
    social_media_links(The icons that lead to Facebook, Snapchat, etc.)[]
) # Should be }
```

You can't include new lines in your semantic context. The following query structure isn't valid because the semantic context isn't contained within one line.

```AgentQL
{
    social_media_links(The icons that lead
        to Facebook, Snapchat, etc.)[]
}
```

## Pass context to queries with prompts

Source: https://docs.agentql.com/agentql-query/pass-context

When you need to improve the accuracy of your queries, query **contexts** let you give AgentQL more precise instructions.

## Overview

This section shows you how to use semantic and structural contexts to focus your query. Semantic contexts use natural language to enhance your queries, while structural contexts rely on the shape of the query itself.
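Because a semantic context has to stay on a single line inside its parentheses, it can help to check a query string locally before sending it. The following is a minimal sketch, not part of the AgentQL SDK: `find_multiline_contexts` is a hypothetical helper that only looks for parenthesized descriptions containing a line break, and it doesn't validate anything else about the query.

```python
import re

# Hypothetical helper, not part of the AgentQL SDK.
# Matches one parenthesized description (nested parentheses aren't handled).
_CONTEXT = re.compile(r"\(([^()]*)\)")


def find_multiline_contexts(query: str) -> list[str]:
    """Return every semantic context in `query` that contains a line break."""
    return [m.group(1) for m in _CONTEXT.finditer(query) if "\n" in m.group(1)]


VALID_QUERY = """
{
    footer {
        social_media_links(The icons that lead to Facebook, Snapchat, etc.)[]
    }
}
"""

INVALID_QUERY = """
{
    social_media_links(The icons that lead
        to Facebook, Snapchat, etc.)[]
}
"""

for name, query in (("valid", VALID_QUERY), ("invalid", INVALID_QUERY)):
    problems = find_multiline_contexts(query)
    print(f"{name}: {len(problems)} multi-line context(s)")
```

A check like this only catches one class of mistake; the AgentQL server remains the authority on whether a query is valid.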
## Semantic contexts

Semantic contexts let you enhance a query with natural language: wrap the description in parentheses `()` and append it to the property. Here's an e-commerce example of a query enhanced with semantic contexts to shape the data with specific parameters:

```AgentQL
{
    products(Exclude sponsored results or ads)[] {
        name
        description(Summarize within 150 words)
        price(Display in local currency with two decimal places)
    }
}
```

The more descriptive the context, the better AgentQL can follow your instructions.

### Adding conditions

When you need AgentQL to filter out data based on certain conditions, add semantic context to the relevant property. In the e-commerce example, the query uses a condition on `products` to exclude all sponsored results and ads.

```AgentQL {2}
{
    products(Exclude sponsored results or ads)[] {
        name
        description
        price(integer)
    }
}
```

Specifying conditions in your queries may produce inconsistent results because it relies on AI to understand your intent. For more precise and advanced filtering, extract the data first and then filter it using conventional programming methods.

### Requesting summaries

AgentQL can also summarize text fields with specific parameters, such as word length, to turn long strings of text into concise summaries. In the e-commerce example, the `description` property uses a summary context to keep each returned description within 150 words.

```AgentQL {4}
{
    products[] {
        name
        description(Summarize within 150 words)
        price(integer)
    }
}
```

### Formatting data

When you need data in a specific format, you can request that format in natural language. In the e-commerce example, AgentQL formats all pricing in the local currency with two decimal places. For example: $29.99, €15.50, etc.
```AgentQL {5}
{
    products[] {
        name
        description
        price(Display in local currency with two decimal places)
    }
}
```

Formatting data with AgentQL queries may produce inconsistent results because it relies on AI to understand your intent. For precise formatting, extract the data first and then format it using conventional programming methods.

### Specify HTML properties

You can also add context to select specific HTML properties. Occasionally AgentQL may return the wrong element. In this case, you can add context that names the specific HTML properties you want. Here are some examples:

```AgentQL
{
    products(must be a span tag)[]
}
```

```AgentQL
{
    products(must be a span tag with class="product-name")[]
}
```

## Structural contexts

Structural contexts use the query's data structure to provide context for the desired data. For example, you can structure the query to tell AgentQL the approximate location of the data you are looking for and its relation to other data on the page.

```AgentQL
{
    footer {
        about_us
        contact_us
        social_media_links[]
    }
}
```

In this example, the structure of the query tells the system to prioritize `about_us`, `contact_us`, and `social_media_links` in the footer of the page over similar elements elsewhere on the page.

Semantic contexts (#semantic-contexts) are generally recommended over structural contexts because they're easier to understand and more explicit. The structural context example can also be rewritten with semantic context as follows:

```AgentQL
{
    about_us(located in the footer)
    contact_us(located in the footer)
    social_media_links(footer links)[]
}
```

## Conclusion

You can refine the elements and data AgentQL queries return by incorporating contextual information into your queries. This approach improves accuracy and provides flexibility in handling complex web structures and specific data requirements. As you become more familiar with contextual queries, you'll find them invaluable for efficiently tackling a wide range of web scraping challenges.
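The notes above recommend extracting data first and then filtering or formatting it with conventional code whenever precision matters. The sketch below illustrates that pattern on a hard-coded sample shaped like a `query_data` result. The `is_sponsored` field, the sample values, and the helper names are assumptions made for illustration, not AgentQL output or SDK functions.

```python
from decimal import Decimal

# Sample data standing in for the result of page.query_data(...).
# Field names and values are illustrative assumptions.
response = {
    "products": [
        {"name": "Desk lamp", "price": 29.99, "is_sponsored": False},
        {"name": "Promoted lamp", "price": 15.5, "is_sponsored": True},
        {"name": "Floor lamp", "price": 80, "is_sponsored": False},
    ]
}


def drop_sponsored(products):
    """Filter deterministically instead of asking the query to exclude ads."""
    return [p for p in products if not p.get("is_sponsored")]


def format_price(value, symbol="$"):
    """Format a numeric price with exactly two decimal places."""
    return f"{symbol}{Decimal(str(value)):.2f}"


organic = drop_sponsored(response["products"])
for product in organic:
    print(product["name"], format_price(product["price"]))
```

Keeping the query simple (for example `price(integer)` or a plain `price`) and doing the filtering and formatting in code trades a little extra work for results that don't depend on how the model interprets an instruction.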
## Best Practices

Source: https://docs.agentql.com/agentql-query/best-practices

AgentQL queries allow users to retrieve the exact web page elements for interaction or data retrieval. Designed with flexibility in mind, AgentQL queries are schema-less, meaning query terms are free-form and not strongly typed. However, there are some syntax requirements as well as best practices for creating an AgentQL query.

## AgentQL query syntax

The list below contains all syntax requirements for an AgentQL query:

* Enclose the query in curly braces.
* Put new terms on new lines (that is, one element per line).
* Do **not** separate terms with punctuation.
* A container should enclose its child terms in curly braces `{ ... }`.
* Create a list term by adding square brackets `[]` after the term.
* Provide extra descriptions in parentheses `()` after the term.

### Query examples

This is an example of a **single term query** that tries to retrieve one element (search box) on the web page:

```AgentQL
{
    search_box
}
```

#### Query with extra contexts

This is an example of providing extra context to a query. You can add a short description of the query term in parentheses to help AgentQL better locate the desired web elements:

```AgentQL
{
    login_btn(the one in header section)
    footer {
        social_media_links(The icons that lead to Facebook, Snapchat, etc.)[]
    }
}
```

#### Nested query

This is an example of a nested query that tries to get a "sign in" button from a page's header and an "about" button from its footer. In this case, the `header` and `footer` elements serve as container elements that capture the hierarchical relationship of the desired elements.

```AgentQL
{
    header {
        sign_in_btn
    }
    footer {
        about_btn
    }
}
```

#### Query that returns a list

This is an example of a **list term query**. This query tries to capture all the links on a web page.
The AgentQL server returns an array of links for this query:

```AgentQL
{
    links[]
}
```

#### Nesting and lists combined

This is another example of a **list term query**. Here, the query specifies the exact information wanted in every list item. In this case, the AgentQL server returns an array. Each array item contains the price, the rating, and the reviews of one product. List elements can be nested: the `reviews[]` element tries to capture all reviews for the parent product.

```AgentQL
{
    products[] {
        price(integer)
        rating
        reviews[]
    }
}
```

## Recommended practices for creating queries

AgentQL queries are designed to be flexible, but there are some recommended practices that may improve the response quality from the AgentQL server:

* Use lowercase letters for all the terms in a query.
* Use underscores (`_`) to separate words within a term (for example, `user_image`).
* Append `btn` to the term to indicate the element is clickable (for example, `search_btn`).
* Append `box` to the term to indicate the element accepts input (for example, `search_box`).
* Indent child terms in accordance with their parent's indentation.
* Only use `query_elements` when you wish to interact with web elements. Use `query_data` to retrieve data.

## How to find the exact element on the page

When there are multiple elements with the same or similar names on the web page, the AgentQL server may need further hints from your AgentQL query to find the exact element you are looking for. There are several things you can do with your query to help improve AgentQL's accuracy.

### Provide detailed descriptions

Providing descriptions in parentheses is a powerful tool to get better query results. For instance, adding the description "google sign-in button" to the `sign_in` term may help with targeting a specific button on a page.

```AgentQL
{
    sign_in(google sign-in button)
}
```

### Include hierarchy hints

Hierarchy hints reduce ambiguity. For example, if there are very similar buttons (such as
“Sign In”) present on the web page, but one is in the header and another is in the sign-in form, you can specify that hierarchy in your query. Consider the following example web page and queries:

![LinkedIn WebPage and Queries](/images/docs/best-query-practice-p1.png)

Different containers (`header` and `form`) convey different hierarchical information to the AgentQL server and locate different sign-in buttons on this page.

```AgentQL
{
    header_button
}
```

It can help to think of how you might increase a rule's specificity with a CSS selector.

### Use surrounding terms

Another way to reduce ambiguity is to specify surrounding terms, so it's clearer where the element is located. For example, if you are trying to locate a “Sign in” button that sits between two other buttons, it may help to include those two buttons as well (even if you are not planning to interact with them) to clarify the element's location.

```AgentQL
{
    button_a
    sign_in_btn
    button_b
}
```

## Examples

Here are some examples of working with real web pages.

### Retrieving Phone Model Button

In this example, the AgentQL query retrieves two buttons for different models by specifying their containing element.

```AgentQL
{
    model_selector {
        iphone_15_pro_max_btn
        iphone_15_pro_btn
    }
}
```

![Apple Page Model Buttons](/images/docs/best-query-practice-p2.png)

Alternatively, you could query the model selector itself. The button information is preserved in the response, so you could retrieve the buttons by parsing it.

```AgentQL
{
    model_selector
}
```

![Apple Page Model Selection Section](/images/docs/best-query-practice-p3.png)

### Retrieving Amazon Product Information

This AgentQL query gets the name and price of each product on the Amazon product listing page.
```AgentQL
{
    results {
        products[] {
            product_name
            product_price(integer)
        }
    }
}
```

![Amazon Product Page](/images/docs/best-query-practice-p5.png)

If you want **all** the relevant information of each product, you could generalize the query above.

```AgentQL
{
    results {
        products[]
    }
}
```

![Amazon Product Page](/images/docs/best-query-practice-p6.png)

## Python SDK

Source: https://docs.agentql.com/python-sdk

AgentQL's Python SDK allows for automation as well as data extraction with a Python integration with Playwright.

## Overview

AgentQL's Python SDK allows for automation as well as data extraction with a Python integration with Playwright.

## Guides

Check out the Release Notes (/release-notes) for updates and new features.

## JavaScript SDK

Source: https://docs.agentql.com/javascript-sdk

AgentQL's JavaScript SDK allows for automation as well as data extraction with a JavaScript integration with Playwright.

## Overview

AgentQL's JavaScript SDK allows for automation as well as data extraction with a JavaScript integration with Playwright.

## References

Check out the Release Notes (/release-notes) for updates and new features.

## REST API

Source: https://docs.agentql.com/rest-api/api-reference

AgentQL's REST API lets you query web pages and documents, such as PDFs and image files, and retrieve the results through HTTP requests from any language.

## Query data

Queries structured data as JSON from a web page, given a URL, using either an AgentQL query (https://docs.agentql.com/agentql-query/query-intro) or a natural language prompt.
```shell
curl -X POST https://api.agentql.com/v1/query-data \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ products[] { product_name product_price(integer) } }",
    "url": "https://scrapeme.live/?s=fish&post_type=product",
    "params": {
      "wait_for": 0,
      "is_scroll_to_bottom_enabled": false,
      "mode": "fast",
      "is_screenshot_enabled": false
    }
  }'
```

```python
import requests

url = "https://api.agentql.com/v1/query-data"
headers = {
    "X-API-Key": "$AGENTQL_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "query": "{ products[] { product_name product_price(integer) } }",
    "url": "https://scrapeme.live/?s=fish&post_type=product",
    "params": {
        "wait_for": 0,
        "is_scroll_to_bottom_enabled": False,
        "mode": "fast",
        "is_screenshot_enabled": False
    }
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()
```

```javascript
const response = await fetch('https://api.agentql.com/v1/query-data', {
  method: 'POST',
  headers: {
    'X-API-Key': '$AGENTQL_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: "{ products[] { product_name product_price(integer) } }",
    url: 'https://scrapeme.live/?s=fish&post_type=product',
    params: {
      wait_for: 0,
      is_scroll_to_bottom_enabled: false,
      mode: "fast",
      is_screenshot_enabled: false
    }
  })
});

const data = await response.json();
```

Make sure to replace `$AGENTQL_API_KEY` with your actual API key.

```json filename="Response"
{
  "data": {
    "products": [
      {
        "product_name": "Qwilfish",
        "product_price": 77
      },
      {
        "product_name": "Huntail",
        "product_price": 52
      },
      ...
    ]
  },
  "metadata": {
    "request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3"
  }
}
```

### Authentication

All requests to the AgentQL API must include an `X-API-Key` header with your API key. You can generate an API key through Dev Portal.

### Request body for web queries

- `query` string (alternative to `prompt`)
  The AgentQL query to execute.
  Learn more about how to write an AgentQL query in the docs (https://docs.agentql.com/agentql-query). **Note: You must define either a `query` or a `prompt` to use AgentQL.**
- `prompt` string (alternative to `query`)
  A natural language description of the data to query the page for. AgentQL infers the data structure from your prompt. **Note: You must define either a `query` or a `prompt` to use AgentQL.**
- `url` string (alternative to `html`)
  The URL of the public web page you want to query. **Note: You must define either a `url` or `html` to use AgentQL.**
- `html` string (alternative to `url`)
  The raw HTML to query data from. Useful if you have a private or locally generated copy of a web page. **Note: You must define either a `url` or `html` to use AgentQL.**
- `params` object (optional)
  - `wait_for` number
    The number of seconds to wait for the page to load before querying. **Defaults to `0`.**
  - `is_scroll_to_bottom_enabled` boolean
    Whether to scroll to the bottom of the page before querying. **Defaults to `false`.**
  - `mode` string
    `standard` uses deep data analysis, while `fast` trades some depth of analysis for speed and is adequate for most use cases. Learn more about the modes in this guide (https://docs.agentql.com/accuracy/standard-mode). **Defaults to `fast`.**
  - `is_screenshot_enabled` boolean
    Whether to take a screenshot before extracting data. Returned in `metadata` as a Base64 string. **Defaults to `false`.**

### Response for web queries

- `data` object
  Data that matches the query.
- `metadata` object
  - `request_id` string
    A Universally Unique Identifier (UUID) for the request.
  - `screenshot` string \| null
    Base64-encoded screenshot if enabled, `null` otherwise.

You can convert the Base64 string returned in the `screenshot` field to an image and view it using free online tools like Base64.guru.

## Query document

Extract data from a document by sending a PDF or image (JPEG, JPG, PNG) file and an AgentQL query (/agentql-query/query-intro).
Learn about the consumption logic for querying documents here (/query-document-pricing). This example uses the following file:

![Bidding Document](/images/docs/query-doc-example.png)

The `query_document` function consumes 1 API call per image (JPG, JPEG, PNG) and 1 API call for each page within a PDF. For example, querying a 10-page PDF takes 10 AgentQL API calls.

```shell
curl -X POST https://api.agentql.com/v1/query-document \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/file.pdf" \
  -F 'body="{\"query\": \"{ project { id lowest_bidder lowest_bid } } \", \"params\": { \"mode\": \"fast\" } }"'
```

```python
import json

import requests

url = "https://api.agentql.com/v1/query-document"
headers = {
    "X-API-Key": "$AGENTQL_API_KEY",
}

# The query and its params travel together as one stringified JSON form field.
form_body = {
    "body": json.dumps({
        "query": " { project { id lowest_bidder lowest_bid } }",
        "params": {"mode": "fast"}
    })
}

with open("/path/to/file", "rb") as f:
    file_object = {"file": ("file_name", f.read())}

response = requests.post(url, headers=headers, files=file_object, data=form_body)
data = response.json()
```

```javascript
// This example uses Node.js's fs module to read a file and FormData to send it to the server.
// Unlike other examples in our documentation, this code is wrapped in an async function and requires module imports.
const fs = require('fs');

async function main() {
  const file = fs.readFileSync('/path/to/file');
  const form_data = new FormData();
  form_data.append('file', new Blob([file], { type: 'application/pdf' })); // or 'image/png'
  // The query and its params travel together as one stringified JSON form field.
  form_data.append(
    'body',
    JSON.stringify({
      query: ' { project { id lowest_bidder lowest_bid } } ',
      params: { mode: 'fast' }
    })
  );

  const response = await fetch('https://api.agentql.com/v1/query-document', {
    method: 'POST',
    headers: {
      'X-API-Key': '$AGENTQL_API_KEY',
    },
    body: form_data
  });

  const data = await response.json();
}

main();
```

Make sure to replace `$AGENTQL_API_KEY` with your actual API key.
```json filename="Response"
{
  "data": {
    "project": {
      "id": "CPM 81031-200202",
      "lowest_bidder": "Toebe Construction LLC",
      "lowest_bid": 13309641.63
    }
  },
  "metadata": {
    "request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3"
  }
}
```

### Authentication

All requests to the AgentQL API must include an `X-API-Key` header with your API key. You can generate an API key through Dev Portal.

### Request body for document queries

The request body for querying documents is a multipart/form-data object that contains a file and a body.

- `file` string
  File path of the file to execute the query on.
- `body` string
  A stringified JSON object that carries the parameters for the query, because multipart/form-data only accepts strings.
  1. `query` string (alternative to `prompt`): The AgentQL query to execute. Learn more about how to write an AgentQL query in the docs (https://docs.agentql.com/agentql-query). **Note: You must define either a `query` or a `prompt` to use AgentQL.**
  2. `prompt` string (alternative to `query`): A natural language description of the data to query the document for. AgentQL infers the data structure from your prompt. **Note: You must define either a `query` or a `prompt` to use AgentQL.**
  3. `params` object (optional): The parameters for the query.
     - `mode` string: Specifies the extraction mode: `standard` for complex or high-volume data, or `fast` for typical use cases. Defaults to `fast`.

### Response for document queries

- `data` object
  Data that matches the query.
- `metadata` object
  - `request_id` string
    A UUID for the request.

`query_document` is also supported in the Python SDK.
Learn how to use it here (https://docs.agentql.com/python-sdk/api-references/agentql-tools#query-document).

## Release Notes

Source: https://docs.agentql.com/release-notes

## Version 1.9.2

- Added better error messages when the API key is invalid

## Version 1.9.1

- Fixed a bug where accessibility tree generation fails if there are hidden text elements

## Version 1.9.0

### New features

- (Python SDK) Added an `experimental_query_elements_enabled` argument to `query_elements()` and `get_by_prompt()` to improve accuracy.

### Fixes

- (Python SDK) Fixed a bug where the iframe accessibility tree could be `None` on some websites.

## Version 1.8.1

### Fixes

- Fixed a bug where text elements may lose context during page processing

## Version 1.8.0

### New features

- Debug information available on the `Page` object. Users can now access the last query, response, and accessibility tree generated by the AgentQL SDK on this page:
  - JavaScript SDK: `getLastQuery()`, `getLastResponse()`, and `getLastAccessibilityTree()`
  - Python SDK: `get_last_query()`, `get_last_response()`, and `get_last_accessibility_tree()`

  This information may be useful for debugging and troubleshooting.

### Fixes

- (Python SDK) Fixed a bug when trying to wrap an already wrapped Playwright `Page`

## Version 1.7.1

### Fixes

- Updated endpoint for AgentQL query generation in Python SDK.

## Version 1.7.0

### New features for Python SDK

- Pagination! AgentQL Python SDK now supports pagination under the `agentql.tools` module. With `paginate()`, users can automatically collect data from multiple pages using an AgentQL query. Additionally, users can use `get_pagination_info()` to step through the pagination process for further manipulation.
  For more information, please refer to the API references for `paginate()` (/python-sdk/api-references/agentql-tools#paginate) and `get_pagination_info()` (/python-sdk/api-references/agentql-page#getpaginationinfo). Tiny Fish is planning to add pagination support to the JavaScript SDK soon.

## Version 1.6.2

### Improvements

- Optimized accessibility tree generation by combining processing steps.

### Fixes

- Fixed a bug in accessibility tree generation caused by undefined element tag name.

## Version 1.6.1

### Fixes

- Fixed a bug in accessibility tree generation affecting specific websites.

## Version 1.6.0

### Python SDK

#### Breaking changes

- `DebugManager` and `TrailLogger` are removed from the Python SDK. Now, to debug your scripts, you can set the logging level to `DEBUG` in your script like this:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

#### Fixes

- Fixed the issue where AgentQL hangs when the page crashes.

#### Improvements

- Stealth mode library updated to version 1.1.0. Users can now pass the `browser_type` parameter to indicate the browser type they are using.

```
await page.enable_stealth_mode(nav_user_agent=user_agent, browser_type="chrome")
```

### JavaScript SDK

#### Fixes

- Fixed the issue where AgentQL hangs when the page crashes.
- Fixed the issue where AgentQL threw an `Unexpected number` error when generating the accessibility tree.

## Version 1.5.0

### JavaScript SDK

#### Breaking Changes

- We have updated the following methods to accept an **options object** for optional parameters instead of using positional arguments:
  - `getByPrompt(prompt, options)`
  - `queryElements(query, options)`
  - `queryData(query, options)`
  - `waitForPageReadyState(options)`

  For more information, please refer to the API reference (/javascript-sdk/api-references/agentql-page).

## Version 1.4.1

#### New Features

- JavaScript SDK! AgentQL now supports JavaScript SDK!
  Check out the installation instructions (/javascript-sdk/installation) and our launch week announcement post (https://agentql.com/blog/javascript-sdk) to learn more or our new JavaScript examples to get started (https://github.com/tinyfish-io/agentql/).

#### Improvements

- default `query_elements()` timeout increased to 300 seconds
- default `get_data_by_prompt_experimental()` timeout increased to 75 seconds

## Version 1.4.0

### Breaking Changes

- "fast" mode is now the default mode for `query_elements()`, `query_data()`, and `get_by_prompt()` methods. Users can still use "standard" mode by setting the `mode` parameter to "standard":

```python
response = page.query_data(QUERY, mode="standard")
```

### Fixes

- Fixed the issue where page monitor is not initialized properly when `page.goto()` is not called.

### Improvements

- Added support for non-ASCII characters in query descriptions.

## Version 1.3.0

### Breaking Changes

- `include_aria_hidden` parameter

  For `query_elements()`, `query_data()` and `get_by_prompt()` methods, the parameter `include_aria_hidden` was changed to `include_hidden` parameter so that users can control whether to include hidden elements when trying to fetch elements or data.

## Version 1.2.0

### New Features

- Commas are now supported in AgentQL queries. Users can now use commas to separate query terms in the query string. For example, the following query is now valid:

```AgentQL
{
    first_name,
    last_name,
    email
}
```

### Improvements

- Improved the reliability of `wait_for_page_ready_state()` method by more thoroughly capturing page events.

### Fixes

- Fixed `DebugManager` not finalizing the logger and returning all desired logs.

## Version 1.1.0

### Breaking Changes

- Session-based API is removed from `agentql` package. For new Page-based API, users can refer to this guide (/getting-started/first-steps).
### New Features

- Fast Mode

  `AgentQL` now supports **Fast Mode** for `query_elements()` (/python-sdk/api-references/agentql-page#query_elements), `query_data()` (/python-sdk/api-references/agentql-page#query_data), and `get_by_prompt()` (/python-sdk/api-references/agentql-page#get_by_prompt) methods. Users can specify the mode they would like to use with the `mode` parameter -- `fast` mode will decrease the response time but may lower the accuracy of the response. For API reference, visit this page (/python-sdk/api-references/agentql-page).

- `agentql new-script` command

  Users can now use `agentql new-script` command to quickly set up a template script. Currently, users could choose between `sync` and `async` scripts. For API reference, visit this page (/cli-reference).

- `Request ID` for trouble-shooting

  If there is a server-side error, `AgentQL` now returns a `Request ID` that corresponds to a specific request in `AgentQL` backend server. This ID will be output to the console at the end of error messages. Including this ID when reaching out for support will greatly increase the speed of assistance.

### Improvements

- Improved accessibility tree generation by including child nodes of slot elements in the tree.

## Version 1.0.1

### Fixes

- Fixed invalid documentation links in error messages.

## Version 1.0.0

`AgentQL` is officially launched with a new API!

### Breaking Changes

- Session-based API is deprecated. They will be removed in version `1.1.0`. For new Page-based API, users can refer to this guide (/getting-started/first-steps).

### New Features

- `wrap()` and `wrap_async()`

  The `agentql` module provides the above two utility methods to convert Playwright's `Page` (https://playwright.dev/python/docs/api/class-page) to AgentQL's `Page` (/python-sdk/api-references/agentql-page), which gives access to AgentQL's querying API. For instructions on how to use them, visit this API reference page (/python-sdk/api-references/agentql).
- `get_by_prompt()`

  Other than `query_elements()` (/python-sdk/api-references/agentql-page#query_elements) and `query_data()` (/python-sdk/api-references/agentql-page#query_data) methods, `AgentQL` now provides `get_by_prompt()` for users to fetch a single element from web page using natural language. For API reference, visit this page (/python-sdk/api-references/agentql-page#get_by_prompt).

## Version 0.5.3

### Fixes

- Fixed accessibility tree generation creating duplicate IDs for web elements.
- Optimized accessibility tree generation by including nodes with `aria-hidden=true` attribute by default.
- Adjusted how API key is checked so that keys set through environment variable will take precedence over those set in config file.

### Improvements

- Debug mode now generates `request_id` information for each `query` request. Users can share this information with Tiny Fish developers when asking for help with a specific query.

## Version 0.5.2

### New Features

- Query Data

  Previously, AgentQL adhered to a one-to-one relationship between query terms and web elements, which sometimes made it difficult to query a block of text or retrieve actual text value from responses. Now, users can achieve these tasks with the newly added `session.query_data()` method.
  The following example demonstrates how to use this endpoint:

```python filename="agentql_example.py"
import agentql

session = agentql.start_session("https://apply.workable.com/pony-dot-ai/j/56A463E1D3/")

QUERY = """
{
    required_programming_skills (just the skill name)[]
    base_salary_min (without the dollar sign, use _ as separator)
    base_salary_max (with dollar sign)
}
"""

response = session.query_data(QUERY)

# Text of the query terms could be directly retrieved in the following way
print(f"Base salary min: {response.base_salary_min}")
print(f"Base salary max: {response.base_salary_max}")

for skill in response.required_programming_skills:
    print(f"Required programming skill: {skill}")

session.stop()
```

### Fixes

- Improved accessibility tree generation logic by removing HTML elements with the `code` tag.

## Version 0.5.1

### Fixes

- Fixed the script getting stuck in an infinite loop when the starting character and the ending character of the query are the same, but they are not quotation marks

## Version 0.5.0

### Breaking changes

- Modules structure update. Playwright web drivers are now located in `agentql.ext.playwright.sync_api` and `agentql.ext.playwright.async_api` for synchronous and asynchronous versions respectively:

```python filename="agentql_example.py"
from agentql.ext.playwright.sync_api import PlaywrightWebDriver
from agentql.ext.playwright.async_api import PlaywrightWebDriver
```

### New features

- Debug Mode

  Users can now use AgentQL SDK's Debug Mode to debug their scripts.
  The following example demonstrates how to enable this mode:

```python filename="agentql_example.py"
from agentql.sync_api import DebugManager

with DebugManager.debug_mode():
    your_script
```

```python filename="agentql_example.py"
from agentql.async_api import DebugManager

async with DebugManager.debug_mode():
    your_script
```

  It will save meta information (like OS, Python version, AgentQL version), logs, error information, last accessibility tree used, and screenshots of every page queried to the debug folder. The default path is `$HOME/.agentql/debug`.

- Query Terms' Context

  Previously, when describing a term in the query, users would need to do something like this:

```AgentQL
{
    second_button_from_the_top_next_to_login_button_only_if_hero_image_is_present
}
```

  Now, AgentQL Query supports providing context for the query terms. Add it inside parentheses like this:

```AgentQL
{
    button(This is the second button from the top. It's next to login button and will only appear when hero image is present)
}
```

  For more details, please check out our query introduction page (/agentql-query/query-intro).

- Search in Documentation Website

  Users can now search for keywords in AgentQL Documentation Website.
### Improvements - Supported iterating over query's collection data items via `for` loop - Improved typechecking for AgentQL response - Added AgentQL config file path to the `agentql init` command's output ### Fixes - Fixed a crash in `wait_for_page_ready_state()` method when it was invoked before page redirection - Fixed `session.current_page` not updating when opening new tab after clicking on a link ## Version 0.4.7 ### Fixes: - Refactored part of the internal logic of query syntax ## Version 0.4.6 ### Improvements: - Improved the error message for a better debugging experience - Allowed History log to output logging information even when an error is raised ### Fixes: - Fixed scrolling not working on some websites - Fixed accessibility tree not being captured correctly on some websites ## Version 0.4.5 ### Improvements: - Added **AgentQL CLI** which is a tool designed to assist you in using the AgentQL SDK. It can help you set up your development environment. - Added **Trail Logger** which can log actions taken by AgentQL SDK and display them at the end of a session. This can be used for debugging your scripts. The Trail Logger can be enabled through `enable_history_log` parameter in `start_session()` method and the logs can be obtained through `session.get_last_trail()`. - Added `Session#last_accessibility_tree` property to get the last captured accessibility tree. It can be helpful for debugging purposes. - Added `Popup#page_url` property to get the URL of the page where the popup occurred. It can be used when analyzing popup on different pages. - Adjusted the error message for `AttributeNotFoundError` for better debugging information. - Moved the import path for `ProxySettings`, `Locator` and `Page` class to `agentql.ext.playwright`. ### Fixes: - Fixed some web pages with empty iframes HTML element crashing the accessibility tree generation logic. - Fixed `wait_for_page_ready_state() ` not reliably waiting on some websites. 
## Version 0.4.4 ### Fixes: - Addressed incorrect hidden elements detection logic ## Version 0.4.3 ### Fixes: - Fixed some web page elements being incorrectly marked as "hidden" and not included in the query result. ## Version 0.4.2 ### Fixes: - Fixed the page not being closed when the session was closed. ## Version 0.4.1 ### Fixes: - Fixed SDK crash to enable async SDK usage and multiple sync sessions. - Fixed a potential resource leak issue during session creation failures. ## Version 0.4.0 ### Breaking changes - Major modules structure overhaul. - Playwright Web Driver now starts in "headed" mode by default. To start it in "headless" mode, users need to pass `headless=True` to the `PlaywrightWebDriver` constructor. ## Version 0.3.1 ### Fixes: - Fixed SDK crash on Python versions We've migrated our SDK from webql to agentql to be consistent with our new branding! This release introduces breaking changes. Please refer to "Breaking Changes" section for latest information. ### Breaking changes - As we have moved our SDK from webql to agentql, our Python library is now called `agentql` and you can import the same with `import agentql` - API key setup, instead of `WEBQL_API_KEY`, now the users need to set `AGENTQL_API_KEY`. We have also updated our docs to reflect those changes! The underlying APIs available and the way they can be leveraged are still the same. ## Version 0.2.8 ### Hotfix release - Fixed `TypeError: AsyncClient.post() got an unexpected keyword argument 'allow_redirects'` ## Version 0.2.7 This release introduces some breaking changes. Please refer to "Breaking Changes" section for latest information. 
### Breaking changes As we continue drawing a clearer line between Session and WebDriver, we removed several APIs which were previously present in `Session` class: ```python filename="agentql_example.py" # Removed APIs session.scroll_up() session.scroll_down() session.scroll_to_bottom() session.load_user_session_state() session.wait_for_page_ready_state() session.get_user_session_state() session.save_user_session_state() ``` All these methods are now available in `WebDriver` class, so you can use them in the following way: ```python filename="agentql_example.py" session.driver.scroll_up() session.driver.scroll_down() session.driver.scroll_to_bottom() session.driver.load_user_session_state() session.driver.wait_for_page_ready_state() session.driver.get_user_session_state() session.driver.save_user_session_state() ``` ### Improvements - Fixed possible crash in PlaywrightDriver related to unbound variable (#252) - Allowed http redirects for AgentQL API calls (#257) - Fixed resource leak: reuse existing browser context for iframes (#259) - Fixed resource leak: dom update listener is never removed (#258) - Moved to tf-playwright-stealth (#260) - Relaxed dependency requirements (#261) - Added environment variable to control API host (#262) ## Version 0.2.6 ### Improvements - Optimized the code by making `enable_stealth_mode()` method sync in Asynchronous version of SDK. ## Version 0.2.5 ### Highlights This release introduces public APIs for checking whether web driver is in `headless` mode and for retrieving `web driver` instance in `Session` class. Several bug fixes and code optimization are also included in this release. 
### New Features - API to retrieve `web driver` instance from `Session` Users can now interact with the web driver instance directly from `Session` class in the following way: ```python filename="agentql_example.py" # This will scroll to the bottom of the page session.driver.scroll_to_bottom() # This will wait for page to enter a stable state session.driver.wait_for_page_ready_state() ``` - API to retrieve `headless` setting Users can now determine whether the browser is started in `headless` mode by invoking `session.driver.is_headless()`. ### Bug Fixes - Fixed a bug where users can not chain methods for response object. ## Version 0.2.4 ### Highlights This release introduces `Stealth Mode` to SDK. Stealth mode will **decrease users' possibility of being marked as bot** on some websites. ### New Features - Stealth Mode Users can enable stealth mode by invoking `enable_stealth_mode()` method in `Web Driver` class. Users can pass in their `User Agent`, `webgl renderer`, and `webgl vendor` information to maximize the effect of stealth mode. Users can activate the `Stealth Mode` like this: ```python filename="agentql_example.py" import webql as wql from webql.sync_api.web import PlaywrightWebDriver driver = PlaywrightWebDriver(headless=False) # Enable the stealth mode and set the stealth mode configuration driver.enable_stealth_mode( webgl_vendor=VENDOR_INFO, webgl_renderer=RENDERER_INFO, nav_user_agent=USER_AGENT_INFO, ) ``` ## Version 0.2.3 ### Highlights This release improves the stability and reliability of SDK by introducing fixes to some known bugs. ### Bug Fixes - Fixed a bug where page interaction sometimes froze in headless mode. - Fixed a bug for data postprocessing in async environment. ## Version 0.2.2 ### Highlights This release introduces a new API through which users can retrieve `Page` object from web driver. In addition, this release also includes several bug fixes and code optimization. 
### New Features - New public API for getting `Page` object from web driver A public API has been added to `Session` class for retrieving `Page` object. With the `Page` object, users can interact with web pages more freely, such as page refreshing and navigation. For instance, to refresh the page, users can use the following script: ```python filename="agentql_example.py" session = webql.start_session() # This will reload the current web page session.current_page.reload() ``` To navigate to a new website, users can use the following script: ```python filename="agentql_example.py" session = webql.start_session() # This will take the page to a new website session.current_page.goto("new website link") ``` ### Bug Fixes - Fixed a bug where None value in response data is not handled properly. - Fixed a bug where to_data() method is not working properly in the asynchronous environment. ## Version 0.2.1 ### Highlights This release introduces a new feature where users can retrieve and load browser's authentication session to maintain login state. ### New Features - Get & Set User Authentication Session: With this release, users can maintain the previous login state by initializing a session with the user authentication state. 
To retrieve the authentication state from the current session, users can utilize `Session` class's `get_user_auth_state()`: ```python filename="agentql_example.py" # Prior to this point, the script has already signed into a website # This will retrieve the auth state for current session user_auth_state = session.get_user_auth_state() # The session info can be saved to local file system like this with open(FILE_PATH, "w") as f: f.write(json.dumps(user_auth_state)) ``` To load the authentication state while initializing the session, users can pass `user_auth_state` into `start_session()`'s `user_auth_session` parameter: ```python filename="agentql_example.py" user_auth_session = None # To load user_auth_session from local file, users can do something like this with open(FILE_PATH, "r") as f: user_auth_session = json.loads(f.read()) session = webql.start_session(user_auth_session=user_auth_session) ``` For a more detailed instruction on how to retrieve and load user session, please refer to the following example (https://github.com/tinyfish-io/agentql/blob/main/examples/save_and_load_context/save_and_load_context.py) in our example repository. ## Version 0.2.0 This release introduces some breaking changes. Please refer to "Breaking Changes" section for latest information. ### Highlights This release introduces the asynchronous version of the package. Now users can utilize AgentQL in an optimized fashion within their asynchronous environment. ### New Features - Asynchronous Support: With this release, users can start an asynchronous session using the following script: ```python filename="agentql_example.py" import webql async_session = await webql.start_async_session() ``` For a more detailed instruction on how to use async version, please refer to the following example (https://github.com/tinyfish-io/agentql/blob/main/examples/async_example/async_example.py) in our example repository. ### Breaking Changes We have introduced some changes to our public API structure. 
Specifically, **users need to choose between synchronous API and asynchronous API when importing web drivers and helper methods**. Now, `PlaywrightWebDriver` and `close_all_popups_handler` need to be imported in the following fashion:

- Synchronous

```python filename="agentql_example.py"
from webql.sync_api.web import PlaywrightWebDriver
from webql.sync_api import close_all_popups_handler
```

- Asynchronous

```python filename="agentql_example.py"
from webql.async_api.web import PlaywrightWebDriver
from webql.async_api import close_all_popups_handler
```

The following way of importing `PlaywrightWebDriver` and `close_all_popups_handler` is deprecated and no longer supported:

```python filename="agentql_example.py"
from webql.web import PlaywrightWebDriver
from webql import close_all_popups_handler
```

### Python SDK

Source: https://docs.agentql.com/python-sdk

AgentQL's Python SDK allows for automation as well as data extraction with a Python integration with Playwright.

## Overview

AgentQL's Python SDK allows for automation as well as data extraction with a Python integration with Playwright.

## Guides

Check out the Release Notes (/release-notes) for updates and new features.

## Related content

#### Installation

Source: https://docs.agentql.com/python-sdk/installation

## Prerequisites

- Python 3.8 or higher

You may want to use a virtual environment, but it's not required.

## Installation options

There are two ways to install AgentQL SDK:

* AgentQL CLI installation (#option-1-agentql-cli-installation): Get set up fast by installing the AgentQL library and then using AgentQL CLI to download dependencies and set up your AgentQL API key.
* Manual installation (#option-2-manual-installation): For a more customized setup, manually install the AgentQL SDK.

### Option 1: AgentQL CLI Installation

#### 1. Install AgentQL library

From your terminal, run the following command to install the AgentQL library:

```bash
pip3 install agentql
```

#### 2. Install dependencies and set API key

The following AgentQL CLI command installs the required dependencies and prompts you for your API key. When prompted, copy and paste your AgentQL API key into the terminal.

```bash
agentql init
```

### Option 2: Manual Installation

#### 1. Install AgentQL library

From your terminal, run the following command to install the AgentQL library:

```bash
pip3 install agentql
```

#### 2. Install Playwright driver

The default version of AgentQL Python SDK uses Playwright as a web driver, so Playwright dependencies need to be installed.

```bash
playwright install chromium
```

#### 3. Set your AgentQL API Key

Set the `AGENTQL_API_KEY` environment variable with your API key (https://dev.agentql.com/). To set the environment variable temporarily for your terminal session, run the following in your terminal:

```bash
export AGENTQL_API_KEY=your-api-key
```

#### PowerShell

If you are using **PowerShell** as your terminal, you can set the environment variable with the following command:

```powershell
$env:AGENTQL_API_KEY="your-api-key"
```

#### Command Prompt

If you are using **Command Prompt** as your terminal, you can set the environment variable with the following command:

```bat
set AGENTQL_API_KEY=your-api-key
```

## Run Your First AgentQL Script

Now you are ready to run your first AgentQL script! Continue to First Steps (/getting-started/first-steps) to get started.

#### API Reference

Source: https://docs.agentql.com/python-sdk/api-references

AgentQL's Python SDK API references for data extraction and web automation.

The AgentQL Python SDK ships with a range of modules and classes to help you automate interactions with and parse, extract, and scrape data from web pages quickly and at scale.
* `agentql` module (/python-sdk/api-references/agentql) provides utility methods to convert Playwright's `Page` to AgentQL's `Page` (/python-sdk/api-references/agentql-page), which gives access to AgentQL's querying API. * AgentQL `Page` (/python-sdk/api-references/agentql-page) is a wrapper around Playwright's `Page` that provides access to AgentQL's querying API. * The AgentQL's `Page`'s `query_elements()` method (/python-sdk/api-references/agentql-page#queryelements) will return a `AQLResponseProxy` (/python-sdk/api-references/aqlresponse).
It's not the actual data but a metadata structure that allows intuitive access to web elements using dot notation. * `agentql.tools` module (/python-sdk/api-references/agentql-tools) provides utility methods to help with data extraction and web automation. * `PaginationInfo` (/python-sdk/api-references/paginationinfo) is a class that provides information about pagination and allows for navigation to the next page. Check out the Release Notes (/release-notes) for updates and new features. ## Related content ###### `agentql` module Source: https://docs.agentql.com/python-sdk/api-references/agentql The `agentql` module provides utility methods to convert Playwright's `Page` (https://playwright.dev/python/docs/api/class-page) to AgentQL's `Page` (/python-sdk/api-references/agentql-page), which gives access to AgentQL's querying API. The following example creates a page, navigates it to a URL, and queries for web elements: ```python filename="agentql_example.py" import agentql # [!code highlight] from playwright.sync_api import sync_playwright QUERY = """ { search_box search_btn } """ with sync_playwright() as p, p.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) # Wraps the Playwright Page to access AgentQL's features. # [!code highlight] page.goto("https://duckduckgo.com") aql_response = page.query_elements(QUERY) aql_response.search_box.type("AgentQL") aql_response.search_btn.click() # Used only for demo purposes. It allows you to see the effect of the script. page.wait_for_timeout(10000) ``` ```python filename="agentql_example.py" import asyncio import agentql # [!code highlight] from playwright.async_api import async_playwright QUERY = """ { search_box search_btn } """ async def main(): async with async_playwright() as p, await p.chromium.launch(headless=False) as browser: page = await agentql.wrap_async(browser.new_page()) # Wraps the Playwright Page to access AgentQL's features. 
# [!code highlight] await page.goto("https://duckduckgo.com") aql_response = await page.query_elements(QUERY) await aql_response.search_box.type("AgentQL") await aql_response.search_btn.click() # Used only for demo purposes. It allows you to see the effect of the script. await page.wait_for_timeout(10000) asyncio.run(main()) ```

---

## Methods

### wrap

Casts a Playwright Sync `Page` object to an AgentQL `Page` type to get access to AgentQL's querying API. See AgentQL `Page` (agentql-page) reference for API details.

#### Usage

```python filename="agentql_example.py"
page = agentql.wrap(browser.new_page())
```

#### Arguments

- `page` Playwright's Page (https://playwright.dev/python/docs/api/class-page)

#### Returns

- AgentQL Page (agentql-page)

---

### wrap_async

Casts a Playwright Async `Page` object to an AgentQL `Page` type to get access to AgentQL's querying API. See AgentQL `Page` (agentql-page) reference for API details.

#### Usage

```python filename="agentql_example.py"
page = await agentql.wrap_async(browser.new_page())
```

#### Arguments

- `page` Playwright's Page (https://playwright.dev/python/docs/api/class-page)

#### Returns

- AgentQL Page (agentql-page)

###### AgentQL `Page`

Source: https://docs.agentql.com/python-sdk/api-references/agentql-page

AgentQL `Page` is a wrapper around Playwright's `Page` (https://playwright.dev/python/docs/api/class-page) that provides access to AgentQL's querying API.
The following example creates a Playwright page, navigates it to a URL, and queries for web elements using AgentQL:

```python filename="agentql_example.py"
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    search_box
    search_btn
}
"""

with sync_playwright() as p, p.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())  # Wrapped to access AgentQL's query API
    page.goto("https://duckduckgo.com")  # [!code highlight]

    aql_response = page.query_elements(QUERY)  # [!code highlight]
    aql_response.search_box.type("AgentQL")
    aql_response.search_btn.click()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
```

```python filename="agentql_example.py"
import asyncio

import agentql
from playwright.async_api import async_playwright

QUERY = """
{
    search_box
    search_btn
}
"""


async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=False) as browser:
        page = await agentql.wrap_async(browser.new_page())  # Wrapped to access AgentQL's query API
        await page.goto("https://duckduckgo.com")  # [!code highlight]

        aql_response = await page.query_elements(QUERY)  # [!code highlight]
        await aql_response.search_box.type("AgentQL")
        await aql_response.search_btn.click()

        # Used only for demo purposes. It allows you to see the effect of the script.
        await page.wait_for_timeout(10000)


asyncio.run(main())
```

---

## Methods

### get_by_prompt

Returns a single web element located by a natural language prompt (as opposed to an AgentQL query).

#### Usage

```python filename="agentql_example.py"
search_box = page.get_by_prompt(prompt="Search input field")
```

```python filename="agentql_example.py"
search_box = await page.get_by_prompt(prompt="Search input field")
```

#### Arguments

- `prompt` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The natural language description of the element to locate.
- `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional) Timeout value in seconds for the connection with backend API service. - `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for network reaching full idle state before querying the page. If set to `False`, this method will only check for whether page has emitted `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`. - `include_hidden` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to include hidden elements on the page. Defaults to `False`. - `mode` ResponseMode (#responsemode) (optional): The mode of the query. It can be either `standard` or `fast`. Defaults to `fast` mode. - `experimental_query_elements_enabled` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to use the experimental implementation of the query elements feature. Defaults to `False`. #### Returns - Locator (https://playwright.dev/python/docs/api/class-locator) | `None` (https://docs.python.org/3/library/constants.html#None) Playwright Locator for the found element or `None` if no matching elements were found. --- ### query_elements Queries the web page for multiple web elements that match the AgentQL query. #### Usage ```python filename="agentql_example.py" agentql_response = page.query_elements( query=""" { search_box search_btn } """ ) print(agentql_response.to_data()) ``` ```python filename="agentql_example.py" agentql_response = await page.query_elements( query=""" { search_box search_btn } """ ) print(await agentql_response.to_data()) ``` #### Arguments - `query` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) An AgentQL query (/agentql-query/query-intro) in String format. - `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional) Timeout value in seconds for the connection with the backend API service. 
- `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for network reaching full idle state before querying the page. If set to `False`, this method will only check for whether page has emitted `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`. - `include_hidden` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to include hidden elements on the page. Defaults to `False`. - `mode` ResponseMode (#responsemode) (optional): The mode of the query. It can be either `standard` or `fast`. Defaults to `fast` mode. - `experimental_query_elements_enabled` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to use the experimental implementation of the query elements feature. Defaults to `False`. #### Returns - AQLResponseProxy (aqlresponse) The AgentQL response object with elements that match the query. Response provides access to requested elements via its fields. --- ### query_data Queries the web page for data that matches the AgentQL query, such as blocks of text or numbers. #### Usage ```python filename="agentql_example.py" retrieved_data = page.query_data( query=""" { products[] { name price(integer) } } """ ) print(retrieved_data) ``` ```python filename="agentql_example.py" retrieved_data = await page.query_data( query=""" { products[] { name price(integer) } } """ ) print(retrieved_data) ``` #### Arguments - `query` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) An AgentQL query (/agentql-query/query-intro) in String format. - `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional) Timeout value in seconds for the connection with backend API service. - `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for network reaching full idle state before querying the page. 
If set to `False`, this method will only check for whether page has emitted `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`. - `include_hidden` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to include hidden elements on the page. Defaults to `True`. - `mode` ResponseMode (#responsemode) (optional): The mode of the query. It can be either `standard` or `fast`. Defaults to `fast` mode. #### Returns - dict (https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) Data that matches the query. --- ### wait_for_page_ready_state Waits for the page to reach the "Page Ready" state, that is page has entered a relatively stable state and most main content is loaded. Might be useful before triggering an AgentQL query or any other interaction for slowly rendering pages. #### Usage ```python filename="agentql_example.py" page.wait_for_page_ready_state() ``` ```python filename="agentql_example.py" await page.wait_for_page_ready_state() ``` #### Arguments - `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for network reaching full idle state. If set to `False`, this method will only check for whether page has emitted `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`. #### Returns - NoneType (https://docs.python.org/3/library/constants.html#None) --- ### enable_stealth_mode Enables "stealth mode" with given configuration. To avoid being marked as a bot, parameters' values should match the real values used by your device. Use browser fingerprinting websites such as bot.sannysoft.com (https://bot.sannysoft.com/) and pixelscan.net (https://pixelscan.net/) for realistic examples. 
#### Usage ```python filename="agentql_example.py" page.enable_stealth_mode( webgl_vendor=your_browser_vendor, webgl_renderer=your_browser_renderer, nav_user_agent=navigator_user_agent, ) ``` ```python filename="agentql_example.py" await page.enable_stealth_mode( webgl_vendor=your_browser_vendor, webgl_renderer=your_browser_renderer, nav_user_agent=navigator_user_agent, ) ``` #### Arguments - `webgl_vendor` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) (optional) The vendor of the GPU used by WebGL to render graphics, such as `Apple Inc.`. After setting this parameter, your browser will emit this vendor information. - `webgl_renderer` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) (optional) Identifies the specific GPU model or graphics rendering engine used by WebGL, such as `Apple M3`. After setting this parameter, your browser will emit this renderer information. - `nav_user_agent` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) (optional) Identifies the browser, its version, and the operating system, such as `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36`. After setting this parameter, your browser will send this user agent information to the website. #### Returns - NoneType (https://docs.python.org/3/library/constants.html#None) --- ### `get_pagination_info` Returns pagination information and status of the current page. #### Usage ```python filename="pagination_example.py" pagination_info = page.get_pagination_info() ``` ```python filename="pagination_example.py" pagination_info = await page.get_pagination_info() ``` #### Arguments - `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional): Timeout value in seconds for the connection with backend API service for querying the pagination information. 
- `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for the network to reach a fully idle state before querying the page for pagination information. If set to `False`, this method only checks whether the page has emitted the `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`.
- `include_hidden` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to include hidden elements on the page when querying for pagination information. Defaults to `False`.
- `mode` ResponseMode (#responsemode) (optional): The mode of the query for retrieving the pagination information. It can be either `standard` or `fast`. Defaults to `fast` mode.

#### Returns

- PaginationInfo (paginationinfo) The `PaginationInfo` object provides access to pagination availability and the functionality to navigate to the next page.

---

### `get_last_query`

Returns the last query executed by the AgentQL SDK on this page.

#### Usage

```python filename="agentql_example.py"
last_query = page.get_last_query()
```

#### Returns

- str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The last query executed by the AgentQL SDK on this page.

---

### `get_last_response`

Returns the last response generated by the AgentQL SDK on this page.

#### Usage

```python filename="agentql_example.py"
last_response = page.get_last_response()
```

#### Returns

- dict (https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) The last response generated by the AgentQL SDK on this page.

---

### `get_last_accessibility_tree`

Returns the last accessibility tree generated by the AgentQL SDK on this page.

#### Usage

```python filename="agentql_example.py"
last_accessibility_tree = page.get_last_accessibility_tree()
```

#### Returns

- dict (https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) The last accessibility tree generated by the AgentQL SDK on this page.
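The three `get_last_*` methods above can be combined into a single debugging snapshot. The following is a minimal sketch, assuming a hypothetical helper named `collect_debug_snapshot` and a `StubPage` stand-in whose return values are invented for illustration; in a real script you would pass an AgentQL `Page` instead of the stub.

```python
import json
from typing import Any, Dict


def collect_debug_snapshot(page: Any) -> Dict[str, Any]:
    """Gather the last query, response, and accessibility tree from a page.

    Hypothetical helper: it only relies on the three get_last_* methods
    documented above and works with any object that provides them.
    """
    return {
        "last_query": page.get_last_query(),
        "last_response": page.get_last_response(),
        "last_accessibility_tree": page.get_last_accessibility_tree(),
    }


class StubPage:
    """Stand-in for an AgentQL Page; the returned values are made up."""

    def get_last_query(self) -> str:
        return "{ search_box }"

    def get_last_response(self) -> Dict[str, Any]:
        return {"search_box": {"role": "textbox"}}

    def get_last_accessibility_tree(self) -> Dict[str, Any]:
        return {"role": "document", "children": []}


snapshot = collect_debug_snapshot(StubPage())
# default=str keeps the dump from failing on values that are not JSON-serializable.
print(json.dumps(snapshot, indent=2, default=str))
```

Writing the snapshot to a file after a failed query makes it easier to compare what was asked with what the SDK saw on the page.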
---

## Types

### `ResponseMode`

The `ResponseMode` type specifies the mode of querying for the `query_elements()` (#query_elements), `query_data()` (#query_data), and `get_by_prompt()` (#get_by_prompt) methods. It accepts one of the following two values:

- `standard` Executes the query in Standard Mode. Use this mode when your queries are complex or extensive data retrieval is necessary.
- `fast` Executes the query more quickly, potentially at the cost of response accuracy. This mode is useful in situations where speed is prioritized and the query is straightforward.

###### `AQLResponseProxy` class

Source: https://docs.agentql.com/python-sdk/api-references/aqlresponse

The AgentQL `Page`'s `query_elements()` method (/python-sdk/api-references/agentql-page#queryelements) returns an instance of the `AQLResponseProxy` class. `AQLResponseProxy` is **not the actual data** but a metadata structure that allows intuitive access to web elements using dot notation. Users can convert it into raw data as a structured dictionary through its `to_data()` method.

To access desired web elements, users can directly use the names defined in queries as attributes of the response object. Each attribute returns the desired element as a Playwright Locator (https://playwright.dev/python/docs/api/class-locator) object, and users can interact with these elements, for example by clicking or typing, through the **Playwright Locator API**.

The following example queries for web elements through the `query_elements()` method, interacts with these elements through `AQLResponseProxy` objects, and converts `AQLResponseProxy` objects into raw data.
```python filename="agentql_example.py"
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    search_box
    header {
        search_btn
    }
}
"""

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://duckduckgo.com")

    # Get AQLResponseProxy, which contains desired web elements
    aql_response = page.query_elements(QUERY)  # [!code highlight]

    # Access the elements with dot notation and interact with them as Playwright Locator objects
    aql_response.search_box.type("AgentQL")  # [!code highlight]

    # To access a nested element in the query, simply chain attributes together with dot notation
    aql_response.header.search_btn.click()  # [!code highlight]

    # Convert response into raw data as structured dictionary with to_data() method
    raw_data_in_dict = aql_response.to_data()  # [!code highlight]
    print(raw_data_in_dict)

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
```

```python filename="agentql_example.py"
import asyncio

import agentql
from playwright.async_api import async_playwright

QUERY = """
{
    search_box
    header {
        search_btn
    }
}
"""


async def main():
    async with async_playwright() as playwright, await playwright.chromium.launch(headless=False) as browser:
        page = await agentql.wrap_async(browser.new_page())
        await page.goto("https://duckduckgo.com")

        # Get AQLResponseProxy, which contains desired web elements
        aql_response = await page.query_elements(QUERY)  # [!code highlight]

        # Access the elements with dot notation and interact with them as Playwright Locator objects
        await aql_response.search_box.type("AgentQL")  # [!code highlight]

        # To access a nested element in the query, simply chain attributes together with dot notation
        await aql_response.header.search_btn.click()  # [!code highlight]

        # Convert response into raw data as structured dictionary with to_data() method
        raw_data_in_dict = await aql_response.to_data()  # [!code highlight]
        print(raw_data_in_dict)

        # Used only for demo purposes. It allows you to see the effect of the script.
        await page.wait_for_timeout(10000)


asyncio.run(main())
```

---

## Methods

### to_data

Converts the response data into a structured dictionary based on the query tree.

#### Usage

```python filename="agentql_example.py"
aql_response = page.query_elements(QUERY)
aql_response.to_data()
```

```python filename="agentql_example.py"
aql_response = await page.query_elements(QUERY)
await aql_response.to_data()
```

#### Returns

- dict (https://docs.python.org/3/library/stdtypes.html#dict) A structured Python dictionary in the following format.

```python filename="agentql_example.py"
{
    "query_field": "text content of the corresponding web element"
}
```

---

### \_\_getattr\_\_

If this method is called on an innermost node of the query, it returns the desired web element as a Playwright Locator (https://playwright.dev/python/docs/api/class-locator) object. See the Playwright API reference (https://playwright.dev/python/docs/api/class-locator) for the methods available on the `Locator` class.

If called on a container node of the query, it returns another `AQLResponseProxy` object, which you can access further to reach a `Playwright Locator` object.
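The leaf-versus-container behavior described above can be pictured with a small stand-alone sketch. This is a toy model, not the SDK's implementation: `DotProxy` is a hypothetical class that wraps a plain nested dictionary, and its leaves are plain strings, whereas the real `AQLResponseProxy` returns Playwright Locator objects.

```python
import json


class DotProxy:
    """Toy stand-in for a response proxy; NOT the SDK's implementation."""

    def __init__(self, node):
        self._node = node

    def _wrap(self, value):
        # Containers (dicts and lists) stay wrapped; leaves are returned as-is.
        if isinstance(value, (dict, list)):
            return DotProxy(value)
        return value

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, i.e. for query fields.
        node = self.__dict__.get("_node")
        if isinstance(node, dict) and name in node:
            return self._wrap(node[name])
        raise AttributeError(name)

    def __getitem__(self, index):
        # Indexing works on list nodes and wraps nested containers.
        return self._wrap(self._node[index])

    def __len__(self):
        return len(self._node)

    def __str__(self):
        return json.dumps(self._node)


response = DotProxy({
    "header": {"search_btn": "Search"},
    "search_results": ["first", "second", "third"],
})

print(response.header.search_btn)    # leaf reached through a container node
print(response.search_results[1])    # indexing into a list node
print(len(response.search_results))  # number of items in a list node
print(str(response.header))          # JSON string form of a container node
```

The same four access patterns (attribute, index, length, string) correspond to the `__getattr__`, `__getitem__`, `__len__`, and `__str__` methods documented in this section.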
#### Usage

```python filename="agentql_example.py"
QUERY = """
{
    search_btn
    search_results[]
}
"""

aql_response = page.query_elements(QUERY)

# This invokes the Playwright Locator object's click() method
aql_response.search_btn.click()

# This iterates through the search results with AQLResponseProxy
for search_result in aql_response.search_results:
    print(search_result)
```

```python filename="agentql_example.py"
QUERY = """
{
    search_btn
    search_results[]
}
"""

aql_response = await page.query_elements(QUERY)

# This invokes the Playwright Locator object's click() method
await aql_response.search_btn.click()

# This iterates through the search results with AQLResponseProxy
for search_result in aql_response.search_results:
    print(search_result)
```

#### Arguments

- `name` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The name of the attribute to retrieve.

#### Returns

- AQLResponseProxy (aqlresponse) | Playwright Locator (https://playwright.dev/python/docs/api/class-locator)

---

### \_\_getitem\_\_

Allows indexing into the response data if it's a list.

#### Usage

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = page.query_elements(QUERY)

# Get the second result in the list
second_result = aql_response.search_results[1]
```

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = await page.query_elements(QUERY)

# Get the second result in the list
second_result = aql_response.search_results[1]
```

#### Arguments

- `index` int (https://docs.python.org/3/library/stdtypes.html#index-13) The index of the item in the list to retrieve.

#### Returns

- AQLResponseProxy (aqlresponse) | Playwright Locator (https://playwright.dev/python/docs/api/class-locator) The corresponding `AQLResponseProxy` list item.

---

### \_\_len\_\_

Returns the number of items in the response data if it's a list.
#### Usage

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = page.query_elements(QUERY)

# Get the number of search results
result_count = len(aql_response.search_results)
```

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = await page.query_elements(QUERY)

# Get the number of search results
result_count = len(aql_response.search_results)
```

#### Returns

- int (https://docs.python.org/3/library/stdtypes.html#index-13) Number of items in the list.

---

### \_\_str\_\_

Returns a string representation of the response data in JSON format.

#### Usage

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = page.query_elements(QUERY)

# Convert results to a JSON string for display
result_in_json_string = str(aql_response.search_results)
print(result_in_json_string)
```

```python filename="agentql_example.py"
QUERY = """
{
    search_results[]
}
"""

aql_response = await page.query_elements(QUERY)

# Convert results to a JSON string for display
result_in_json_string = str(aql_response.search_results)
print(result_in_json_string)
```

#### Returns

- str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)

###### `agentql.tools` module

Source: https://docs.agentql.com/python-sdk/api-references/agentql-tools

The `agentql.tools` module provides utility methods to help with data extraction and web automation.

* Paginate (#paginate) Collect data from multiple pages
* Query Document (#query-document) Query documents (.pdf, .png, .jpg, .jpeg) using AgentQL.
## Paginate The following example demonstrates how to use the `paginate` method to collect data from multiple pages: ```python filename="agentql_pagination_example.py" import agentql from agentql.tools.sync_api import paginate # [!code highlight] from playwright.sync_api import sync_playwright with sync_playwright() as p, p.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://news.ycombinator.com/") # Define the query to extract the titles of the posts QUERY = """ { posts[] { title } } """ # Collect data from the first 3 pages using the query paginated_data = paginate(page, QUERY, 3) # [!code highlight] print(paginated_data) ``` ```python filename="agentql_pagination_example.py" import asyncio import agentql from agentql.tools.async_api import paginate # [!code highlight] from playwright.async_api import async_playwright async def main(): async with async_playwright() as p, await p.chromium.launch(headless=False) as browser: page = await agentql.wrap_async(browser.new_page()) await page.goto("https://news.ycombinator.com/") # Define the query to extract the titles of the posts QUERY = """ { posts[] { title } } """ # Collect data from the first 3 pages using the query paginated_data = await paginate(page, QUERY, 3) # [!code highlight] print(paginated_data) asyncio.run(main()) ``` ### Methods #### Paginate Collects data from multiple pages using an AgentQL query. Internally, the function first attempts to find the operable element to navigate to the next page and click it, then uses the provided query to extract the data from the page. The function then repeats this process for the specified number of pages. The `paginate` function returns data collected from each page into a single, aggregated list. If you wish to step through each page's data, use the `navigate_to_next_page` (/python-sdk/api-references/paginationinfo#navigatetonextpage) method instead. 
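For stepping through pages one at a time instead of receiving one aggregated list, the loop can be sketched as follows. This is a sketch under stated assumptions, not the implementation of `paginate`: `collect_pages`, `FakeSite`, and `FakePagination` are hypothetical names, and the fake classes only mimic the documented `query_data()`, `get_pagination_info()`, `has_next_page`, and `navigate_to_next_page()` members so the snippet runs offline.

```python
def collect_pages(page, query, max_pages):
    """Query up to `max_pages` pages, returning one result per page.

    Assumes `page` exposes `query_data()` and `get_pagination_info()`, and that
    the pagination object exposes `has_next_page` and `navigate_to_next_page()`.
    `collect_pages` is a hypothetical helper, not an SDK function.
    """
    results = []
    for page_number in range(max_pages):
        results.append(page.query_data(query))
        if page_number == max_pages - 1:
            break  # Collected enough pages; skip the extra pagination lookup.
        pagination_info = page.get_pagination_info()
        if not pagination_info.has_next_page:
            break  # Ran out of pages early.
        pagination_info.navigate_to_next_page()
    return results


class FakePagination:
    """Mimics the documented PaginationInfo members for the demo below."""

    def __init__(self, site):
        self._site = site

    @property
    def has_next_page(self):
        return self._site.index < len(self._site.pages) - 1

    def navigate_to_next_page(self):
        self._site.index += 1


class FakeSite:
    """In-memory stand-in for a paginated site, so the sketch runs offline."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def query_data(self, query):
        return self.pages[self.index]

    def get_pagination_info(self):
        return FakePagination(self)


site = FakeSite([{"posts": ["a"]}, {"posts": ["b"]}])
pages = collect_pages(site, "{ posts[] }", max_pages=3)
print(pages)
```

Keeping each page's result as a separate list entry makes it possible to stop early or to post-process pages individually, which the aggregated return value of `paginate` does not allow.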
--- ### Usage ```python filename="paginate_example.py" paginated_data = paginate(page, QUERY, 3) ``` ```python filename="paginate_example.py" paginated_data = await paginate(page, QUERY, 3) ``` ### Arguments - `page` AgentQL Page (agentql-page) The AgentQL Page object. - `query` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) An AgentQL query in String format. - `number_of_pages` int (https://docs.python.org/3/library/stdtypes.html#index-13) Number of pages to paginate over. - `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional): Timeout value in seconds for the connection with backend API service for querying the pagination element. - `wait_for_network_idle` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to wait for network reaching full idle state before querying the page for pagination element. If set to `False`, this method will only check for whether page has emitted `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Default is `True`. - `include_hidden` bool (https://docs.python.org/3/library/stdtypes.html) (optional) Whether to include hidden elements on the page when querying for pagination element. Defaults to `False`. - `mode` ResponseMode (#responsemode) (optional): The mode of the query for retrieving the pagination element. It can be either `standard` or `fast`. Defaults to `fast` mode. - `force_click` bool (https://docs.python.org/3/library/stdtypes.html) (optional): Whether to `force` click (https://playwright.dev/python/docs/input#forcing-the-click) on the pagination element. Defaults to `False`. ### Returns - [List[dict]](https://docs.python.org/3/library/stdtypes.html#list) List of dictionaries containing the data from each page. --- ## Query document The following example demonstrates how to use the `query_document` method to query a document (.pdf, .png, .jpg, .jpeg) using AgentQL. 
The `query_document` function consumes 1 API call per image (PNG, JPG, JPEG), and 1 API call for each page within a PDF.

```python filename="query_document_example.py"
from agentql.tools.sync_api import query_document

QUERY = """
{
    name
}
"""

file_path = "path/to/file.pdf"


def main():
    response = query_document(
        file_path,
        query=QUERY,
    )
    print(f"name: {response['name']}")


main()
```

```python filename="query_document_example.py"
import asyncio

from agentql.tools.async_api import query_document

QUERY = """
{
    name
}
"""

file_path = "path/to/file.pdf"


async def main():
    response = await query_document(
        file_path,
        query=QUERY,
    )
    print(f"name: {response['name']}")


asyncio.run(main())
```

### Methods

#### query_document

Queries a document (.pdf, .png, .jpg, .jpeg) using AgentQL.

---

### Usage

```python filename="query_document_example.py"
response = query_document(
    file_path,
    query=QUERY,
)
```

```python filename="query_document_example.py"
response = await query_document(
    file_path,
    query=QUERY,
)
```

### Arguments

- `file_path` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The path to the document to query.
- `query` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The query to execute on the document. Either `query` or `prompt` must be provided, but not both.
- `prompt` str (https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) The prompt to execute on the document. Either `query` or `prompt` must be provided, but not both.
- `timeout` int (https://docs.python.org/3/library/stdtypes.html#index-13) (optional): Timeout value in seconds for the connection with backend API service.
- `mode` ResponseMode (#responsemode) (optional): The mode of the query. It can be either `standard` or `fast`. Defaults to `fast` mode.

### Returns

- dict (https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) Data that matches the query.
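Because each image costs one API call and each PDF page costs one API call, the cost of a batch can be estimated before sending anything. The sketch below assumes the caller already knows each PDF's page count; `estimate_api_calls` is a hypothetical helper that does not contact AgentQL and is not part of the SDK.

```python
from pathlib import Path

# Image types listed in this section; each one counts as a single call.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def estimate_api_calls(documents):
    """Estimate API calls for a mapping of file path -> page count.

    Images always count as one call regardless of the mapped value; for PDFs
    the mapped value is the number of pages. Hypothetical helper for planning.
    """
    total = 0
    for path, page_count in documents.items():
        suffix = Path(path).suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            total += 1
        elif suffix == ".pdf":
            total += page_count
        else:
            raise ValueError(f"Unsupported document type: {path}")
    return total


calls = estimate_api_calls({"invoice.pdf": 4, "receipt.png": 1, "scan.JPG": 1})
print(calls)
```

Here the four-page PDF contributes four calls and each image contributes one, for a total of six.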
---

## Types

### `ResponseMode`

The `ResponseMode` type specifies the query mode for the `query_elements()` (#query_elements), `query_data()` (#query_data), and `get_by_prompt()` (#get_by_prompt) methods. It accepts one of the following two values:

- `standard` Executes the query in Standard Mode. Use this mode when your queries are complex or when extensive data retrieval is necessary.
- `fast` Executes the query more quickly, potentially at the cost of response accuracy. This mode is useful when speed is the priority and the query is straightforward.

###### `PaginationInfo` class

Source: https://docs.agentql.com/python-sdk/api-references/paginationinfo

AgentQL `Page`'s `get_pagination_info()` method (/python-sdk/api-references/agentql-page#getpaginationinfo) returns a `PaginationInfo` object. `PaginationInfo` provides access to pagination availability and the ability to navigate to the next page.

The following example queries for pagination information, checks if there is a next page, and navigates to the next page if possible.
```python filename="pagination_example.py"
import logging

import agentql
from playwright.sync_api import sync_playwright

log = logging.getLogger(__name__)

with sync_playwright() as p, p.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://scrapeme.live/shop/page/2/")

    log.debug("Navigating to next page...")
    pagination_info = page.get_pagination_info()  # [!code highlight]

    # Attempt to navigate to the next page
    if pagination_info.has_next_page:  # [!code highlight]
        pagination_info.navigate_to_next_page()  # [!code highlight]

    page.wait_for_timeout(1000)
```

```python filename="pagination_example.py"
import asyncio
import logging

import agentql
from playwright.async_api import async_playwright

log = logging.getLogger(__name__)


async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=False) as browser:
        page = await agentql.wrap_async(browser.new_page())
        await page.goto("https://scrapeme.live/shop/page/2/")

        log.debug("Navigating to next page...")
        pagination_info = await page.get_pagination_info()  # [!code highlight]

        # Attempt to navigate to the next page
        if pagination_info.has_next_page:  # [!code highlight]
            await pagination_info.navigate_to_next_page()  # [!code highlight]

        await page.wait_for_timeout(1000)


asyncio.run(main())
```

---

## Methods

### navigate_to_next_page

Navigates to the next page.

#### Usage

```python filename="pagination_example.py"
pagination_info = page.get_pagination_info()
pagination_info.navigate_to_next_page()
```

```python filename="pagination_example.py"
pagination_info = await page.get_pagination_info()
await pagination_info.navigate_to_next_page()
```

#### Returns

- NoneType (https://docs.python.org/3/library/constants.html#None)

Always check whether there is a next page before navigating to it by using the `has_next_page` property.

---

## Properties

### has_next_page

Detects if there is a next page.
#### Usage ```python filename="pagination_example.py" pagination_info = page.get_pagination_info() if pagination_info.has_next_page: pagination_info.navigate_to_next_page() ``` ```python filename="pagination_example.py" pagination_info = await page.get_pagination_info() if pagination_info.has_next_page: await pagination_info.navigate_to_next_page() ``` #### Returns - bool (https://docs.python.org/3/library/stdtypes.html#bool) ### JavaScript SDK Source: https://docs.agentql.com/javascript-sdk AgentQL's JavaScript SDK allows for automation as well as data extraction with a JavaScript integration with Playwright. ## Overview AgentQL's JavaScript SDK allows for automation as well as data extraction with a JavaScript integration with Playwright. ## References Check out the Release Notes (/release-notes) for updates and new features. ## Related content #### Installation Source: https://docs.agentql.com/javascript-sdk/installation ## Prerequisites - Node.js 18 or higher ## Installation options There are two ways to install AgentQL SDK: - AgentQL CLI installation (#option-1-agentql-cli-installation) — Get set up fast by installing the AgentQL library and then using AgentQL CLI to download dependencies. - Manual installation (#option-2-manual-installation) — For a more customized setup, manually install the AgentQL SDK. ### Option 1: AgentQL CLI Installation From your terminal, run the following command to install the AgentQL library: #### 1. Install AgentQL library and CLI ```bash npm install agentql npm install -g agentql-cli ``` #### 2. Install dependencies and set API key The following AgentQL CLI command will install Playwright dependencies and an example script. ```bash agentql init ``` #### 3. Set your AgentQL API Key Set the `AGENTQL_API_KEY` environment variable with your API key (https://dev.agentql.com/). 
To set the environment variable temporarily for your terminal session, in your terminal run ```bash export AGENTQL_API_KEY=your-api-key ``` #### Powershell If you are using **Powershell** as your terminal, you can set the environment variable with the following command ```bash $env:AGENTQL_API_KEY="your-api-key" ``` #### Command Prompt If you are using **Command Prompt** as your terminal, you can set the environment variable with the following command ```bash set AGENTQL_API_KEY=your-api-key ``` ### Option 2: Manual Installation From your terminal, run the following command to install the AgentQL library: #### 1. Install AgentQL library ```bash npm install agentql ``` #### 2. Install Playwright The default version of AgentQL JavaScript SDK uses Playwright as a web driver, so Playwright dependencies need to be installed. ```bash npx playwright install chromium ``` #### 3. Set your AgentQL API Key Set the `AGENTQL_API_KEY` environment variable with your API key (https://dev.agentql.com/). To set the environment variable temporarily for your terminal session, in your terminal run ```bash export AGENTQL_API_KEY=your-api-key ``` #### Powershell If you are using **Powershell** as your terminal, you can set the environment variable with the following command ```bash $env:AGENTQL_API_KEY="your-api-key" ``` #### Command Prompt If you are using **Command Prompt** as your terminal, you can set the environment variable with the following command ```bash set AGENTQL_API_KEY=your-api-key ``` ## Run Your First AgentQL Script Now you are ready to run your first AgentQL script! Continue to First Steps (/getting-started/first-steps) to get started. #### API Reference Source: https://docs.agentql.com/javascript-sdk/api-references AgentQL's JavaScript SDK API references for data extraction and web automation. 
The AgentQL JavaScript SDK ships with a range of modules and classes to help you automate interactions with and parse, extract, and scrape data from web pages quickly and at scale.

* `agentql` module (/javascript-sdk/api-references/agentql) provides utility methods to set configurations, including the API key, and to convert Playwright's `Page` to AgentQL's `Page` (/javascript-sdk/api-references/agentql-page), which gives access to AgentQL's querying API.
* AgentQL `Page` (/javascript-sdk/api-references/agentql-page) is a wrapper around Playwright's `Page` that provides access to AgentQL's querying API.
* `AQLResponseProxy` (/javascript-sdk/api-references/aqlresponse) is returned by the AgentQL's `Page`'s `queryElements()` method (/javascript-sdk/api-references/agentql-page#queryelements). It's not the actual data but a metadata structure that allows intuitive access to web elements using dot notation.

Check out the Release Notes (/release-notes) for updates and new features.

## Related content

###### `agentql` module

Source: https://docs.agentql.com/javascript-sdk/api-references/agentql

The `agentql` module provides utility methods to set configurations, including the API key, and to convert Playwright's `Page` to AgentQL's `Page` (/javascript-sdk/api-references/agentql-page), which gives access to AgentQL's querying API.

The following example demonstrates how to set the API key and wrap a Playwright `Page` object to use AgentQL's querying capabilities:

```javascript filename="agentql_example.js"
const { wrap, configure } = require('agentql'); // [!code highlight]
const { chromium } = require('playwright');

(async () => {
  // Configure the API key
  configure({ apiKey: process.env.AGENTQL_API_KEY });

  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage()); // Wraps the Playwright Page to access AgentQL's features. // [!code highlight]
  await page.goto('https://duckduckgo.com');

  const QUERY = `
  {
    search_box
    search_btn
  }
  `;
  const response = await page.queryElements(QUERY);
  await response.search_box.type('AgentQL');
  await response.search_btn.click();

  // Used only for demo purposes. It allows you to see the effect of the script.
  await page.waitForTimeout(10000);
  await browser.close();
})();
```

---

## Methods

### wrap

Wraps a Playwright `Page` instance to add AgentQL's querying methods. See AgentQL Page (agentql-page) reference for API details.
#### Usage ```typescript filename="agentql_example.ts" const agentqlPage = await wrap(await browser.newPage()); ``` #### Arguments - `page` Playwright's Page (https://playwright.dev/docs/api/class-page) #### Returns - Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<AgentQL Page (agentql-page)> - The wrapped Page object with AgentQL extensions. --- ### configure Configures the AgentQL service. #### Usage ```javascript filename="agentql_example.js" configure({ apiKey: 'YOUR_API_KEY' }); ``` #### Arguments - `options` (object) - The configuration options. Supports the following options: - `apiKey` (string) - The API key for the AgentQL service. #### Returns - void (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/undefined) ###### AgentQL `Page` Source: https://docs.agentql.com/javascript-sdk/api-references/agentql-page AgentQL `Page` is a wrapper around Playwright's `Page` that provides access to AgentQL's querying API. The following example creates a Playwright's page, navigates it to a URL, and queries for WebElements using AgentQL: ```javascript filename="agentql_example.js" const { chromium } = require('playwright'); const { wrap, configure } = require('agentql'); const QUERY = ` { search_box search_btn } `; (async () => { // Configure the API key configure({ apiKey: process.env.AGENTQL_API_KEY }); const browser = await chromium.launch({ headless: false }); const page = await wrap(await browser.newPage()); // Wrapped to access AgentQL's query API's await page.goto('https://duckduckgo.com'); // [!code highlight] const response = await page.queryElements(QUERY); // [!code highlight] await response.search_box.type('AgentQL'); await response.search_btn.click(); // Used only for demo purposes. It allows you to see the effect of the script. 
  await page.waitForTimeout(10000);
  await browser.close();
})();
```

---

## Methods

### getByPrompt

Returns a single web element located by a natural language prompt (as opposed to an AgentQL query).

#### Usage

```javascript filename="agentql_example.js"
const searchBox = await page.getByPrompt('Search input field');
```

#### Arguments

- `prompt` `string` The natural language description of the element to locate.
- `options` `object` (optional) Optional parameters for the query.
  - `timeout` `number` (optional) Timeout value in milliseconds for the connection with backend API service. Defaults to 60 seconds.
  - `waitForNetworkIdle` `boolean` (optional) Whether to wait for the network to reach a fully idle state before querying the page. If set to `false`, this method only checks whether the page has emitted the `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Defaults to `true`.
  - `includeHidden` `boolean` (optional) Whether to include hidden elements on the page. Defaults to `false`.
  - `mode` `ResponseMode` (optional) The mode of the query. It can be either `'standard'` or `'fast'`. Defaults to `fast` mode.

#### Returns

- Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<Locator (https://playwright.dev/docs/api/class-locator) | null (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/null)> Playwright Locator for the found element or `null` if no matching elements were found.

---

### getLastQuery

Returns the last query executed by the AgentQL SDK on this page.
#### Usage ```javascript filename="agentql_example.js" const lastQuery = await page.getLastQuery(); ``` #### Returns - Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<string (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String)> | null (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/null) The last query executed by the AgentQL SDK on this page. --- ### getLastResponse Returns the last response generated by the AgentQL SDK on this page. #### Usage ```javascript filename="agentql_example.js" const lastResponse = await page.getLastResponse(); ``` #### Returns - Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<Record<string, any> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object)> | null (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/null) The last response generated by the AgentQL SDK on this page. --- ### getLastAccessibilityTree Returns the last accessibility tree generated by the AgentQL SDK on this page. #### Usage ```javascript filename="agentql_example.js" const lastAccessibilityTree = await page.getLastAccessibilityTree(); ``` #### Returns - Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<string (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String)> | null (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/null) The last accessibility tree generated by the AgentQL SDK on this page. --- ### queryElements Queries the web page for multiple web elements that match the AgentQL query. 
#### Usage

```javascript filename="agentql_example.js"
const agentqlResponse = await page.queryElements(`
{
  search_box
  search_btn
}
`);
console.log(await agentqlResponse.toData());
```

#### Arguments

- `query` `string` An AgentQL query in String format.
- `options` `object` (optional) Optional parameters for the query.
  - `timeout` `number` (optional) Timeout value in milliseconds for the connection with the backend API service. Defaults to 60 seconds.
  - `waitForNetworkIdle` `boolean` (optional) Whether to wait for the network to reach a fully idle state before querying the page. If set to `false`, this method only checks whether the page has emitted the `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Defaults to `true`.
  - `includeHidden` `boolean` (optional) Whether to include hidden elements on the page. Defaults to `false`.
  - `mode` `ResponseMode` (optional) The mode of the query. It can be either `'standard'` or `'fast'`. Defaults to `fast` mode.

#### Returns

- Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<AQLResponseProxy (aqlresponse)> The AgentQL response object with elements that match the query. The response provides access to the requested elements via its fields.

---

### queryData

Queries the web page for data that matches the AgentQL query, such as blocks of text or numbers.

#### Usage

```javascript filename="agentql_example.js"
const retrievedData = await page.queryData(`
{
  products[] {
    name
    price(integer)
  }
}
`);
console.log(retrievedData);
```

#### Arguments

- `query` `string` An AgentQL query in String format.
- `options` `object` (optional) Optional parameters for the query.
  - `timeout` `number` (optional) Timeout value in milliseconds for the connection with backend API service. Defaults to 900 seconds.
  - `waitForNetworkIdle` `boolean` (optional) Whether to wait for the network to reach a fully idle state before querying the page. If set to `false`, this method only checks whether the page has emitted the `load` event (https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event). Defaults to `true`.
  - `includeHidden` `boolean` (optional) Whether to include hidden elements on the page. Defaults to `true`.
  - `mode` `ResponseMode` (optional) The mode of the query. It can be either `'standard'` or `'fast'`. Defaults to `fast` mode.

#### Returns

- Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<Record<string, any> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object)> Data that matches the query.

---

### waitForPageReadyState

Waits for the page to reach the "Page Ready" state, meaning the page has entered a relatively stable state and most of the main content has loaded. This can be useful before triggering an AgentQL query or any other interaction on slowly rendering pages.

#### Usage

```javascript filename="agentql_example.js"
await page.waitForPageReadyState();
```

#### Arguments

- `options` `object` (optional) Optional parameters for the query.
  - `waitForNetworkIdle` `boolean` (optional) Whether to wait for the network to reach a fully idle state. Defaults to `true`.

#### Returns

- Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<void (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/undefined)>

---

## Types

### `ResponseMode`

The `ResponseMode` type specifies the query mode for the `queryElements()`, `queryData()`, and `getByPrompt()` methods. It accepts one of the following two values:

- `standard` Executes the query in Standard Mode. Use this mode when your queries are complex or when extensive data retrieval is necessary.
- `fast` Executes the query more quickly, potentially at the cost of response accuracy. This mode is useful when speed is the priority and the query is straightforward.
###### `AQLResponseProxy` class Source: https://docs.agentql.com/javascript-sdk/api-references/aqlresponse The `AQLResponseProxy` class is returned by the AgentQL's `Page`'s `queryElements()` method (/javascript-sdk/api-references/agentql-page#queryelements). It's not the actual data but a metadata structure that allows intuitive access to web elements using dot notation. You can convert this class into raw data as a structured dictionary using its `toData()` method. To access desired web elements, users can directly use the names defined in queries as attributes of the response object. It returns desired elements as Playwright Locator (https://playwright.dev/docs/api/class-locator), and users can interact with these elements, such as click or type, through the **Playwright Locator API**. The following example queries for web elements through the `queryElements()` method, interacts with these elements through `AQLResponseProxy` objects, and converts `AQLResponseProxy` objects into raw data. 
## Example Usage

```javascript
const { configure, wrap } = require('agentql');
const { chromium } = require('playwright');

const QUERY = `
{
    search_box
    header {
        search_btn
    }
}
`;

(async () => {
  // Configure the API key
  configure({ apiKey: process.env.AGENTQL_API_KEY });

  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());
  await page.goto('https://duckduckgo.com');

  // Get AQLResponseProxy, which contains desired web elements
  const aqlResponse = await page.queryElements(QUERY);

  // Access the elements with dot notation and interact with them as Playwright Locator objects
  await aqlResponse.search_box.type('AgentQL');

  // To access a nested element in query, simply chain attributes together with dot notation
  await aqlResponse.header.search_btn.click();

  // Convert response into raw data as structured map with toData() method
  const rawDataInDict = await aqlResponse.toData();
  console.log(rawDataInDict);

  // Used only for demo purposes. It allows you to see the effect of the script.
  await page.waitForTimeout(10000);

  await browser.close();
})();
```

## Methods

### toData

Converts the response data into a structured map based on the query tree.

#### Usage

```javascript
const aqlResponse = await page.queryElements(QUERY);
const data = await aqlResponse.toData();
```

#### Returns

- Promise (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<Record<string, any> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object)>: A structured JavaScript object in the following format:

```javascript
{
  "query_field": "text content of the corresponding web element"
}
```

---

### getAttribute

This method is used to access attributes of the response object.
If called on an innermost node of the query, it returns the desired web element as a Playwright Locator (https://playwright.dev/docs/api/class-locator). Please check the Playwright Locator API (https://playwright.dev/docs/api/class-locator) for available methods. If called on a container node of the query, it returns another `AQLResponseProxy` object.

#### Usage

```javascript
const QUERY = `
{
    search_btn
    search_results[]
}
`;

const aqlResponse = await page.queryElements(QUERY);

// This invokes Playwright Locator object's click() method
await aqlResponse.search_btn.click();

// This iterates through search results with AQLResponseProxy
for (const searchResult of aqlResponse.search_results) {
  console.log(searchResult);
}
```

#### Arguments

- `name` String (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String): The name of the attribute to access.

#### Returns

- AQLResponseProxy (aqlresponse) | Playwright Locator (https://playwright.dev/docs/api/class-locator) The corresponding AQLResponseProxy or Playwright Locator object.

---

### getItem

Allows indexing into the response data if it's an array.

#### Usage

```javascript
const QUERY = `
{
    search_results[]
}
`;

const aqlResponse = await page.queryElements(QUERY);

// Get the second result in the list
const secondResult = aqlResponse.search_results[1];
```

#### Returns

- AQLResponseProxy (aqlresponse) | Playwright Locator (https://playwright.dev/docs/api/class-locator) The corresponding AQLResponseProxy or Playwright Locator object at the specified index.

---

### getLength

Returns the number of items in the response data if it's an array.
#### Usage

```javascript
const QUERY = `
{
    search_results[]
}
`;

const aqlResponse = await page.queryElements(QUERY);

// Get the number of search results
const resultCount = aqlResponse.search_results.length;
```

#### Returns

- number (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number) The number of items in the array.

# Integrations

Source: https://docs.agentql.com/integrations

AgentQL integrates with a wide range of tools and services to help you extract data from web pages.

## Overview

AgentQL connects your favorite agentic frameworks, low-code platforms, and automation tools to live content on the web.

## Integrations

## AgentStack

Source: https://docs.agentql.com/integrations/agentstack

AgentQL integrates with AgentStack, allowing you to scaffold agent projects with AgentQL and other tools. AgentStack is a developer tool that makes building agents efficient and fast with just a few CLI commands.

## Set up AgentQL with AgentStack

Before you start, install AgentStack and get your AgentQL API key (https://dev.agentql.com/api-keys).

1. After installing AgentStack, start a new project by running the following in your terminal:

   ```bash
   agentstack init
   ```

2. To create a new agent, you can either select a template or, if you'd like to start with an empty project, run the following:

   ```bash
   agentstack generate agent
   ```

   In your newly added agent, there is an `agent.yaml` file where you can provide descriptions for each agent's role, goal, and backstory to configure its behavior. Learn more about Agents in AgentStack's documentation.

3. Create a new task by running the following:

   ```bash
   agentstack generate task
   ```

   Similar to AgentStack's agents, there is also a `task.yaml` file where you fill out each task's description, expected output, and the agent that performs the task. Learn more about Tasks in AgentStack's documentation. You can create multiple tasks and agents by running the previous commands multiple times.

4. Add AgentQL as a tool for your agent project by running the following:

   ```bash
   agentstack tools add agentql
   ```

5. In the `.env` file in your project, add your AgentQL API key as an environment variable `AGENTQL_API_KEY=`.

Once that's done you can start creating your first agent! You can also use AgentStack's examples to get started.

## Usage

Once you finish building your project, run it with the following command:

```bash
agentstack run
```

When your project is running, the agent decides which available tools to use to perform each of its respective tasks.

### Support

If you have any questions about using AgentQL with AgentStack, you can join AgentStack's Discord here.

## Examples

- Research Assistant
- Sentiment Analyzer

## Dify

Source: https://docs.agentql.com/integrations/dify

Simplify AI deployment with Dify. Build and scale AI applications with no-code flexibility that can retrieve and manipulate data from the web with AgentQL.

AgentQL integrates with Dify, allowing you to flexibly build AI applications. Dify helps you create production-ready AI solutions, from agents to complex AI workflows, with low-code and user-friendly interfaces.

## Set up AgentQL with Dify

1. Sign up for a free Dify account; from there you can jump right in.
2. Go to the Dify dashboard.
3. At the top of the dashboard, open the Tools tab.
4. Search for "AgentQL".
5. Click on the AgentQL tool.
6. Click on "To authorize", provide your AgentQL API key to use the AgentQL tool, and click "Save". Generate your AgentQL API key here (https://dev.agentql.com/api-keys).
## Usage

The AgentQL tool is available for use in both **workflows** and **agents.**

### Using AgentQL in an Agent

For agentic usage, you'll add AgentQL's Extract Web Data tool to your agent app like so:

1. Go to the Dify dashboard.
2. Click "Create from Blank" or "Create from Template" to create a new app and choose an "agent". A workspace will open up.
3. Under Tools, click "+ Add".
4. Search for "AgentQL".
5. Select the "Extract Web Data" tool under AgentQL.

Next you'll prompt the agent to extract data from a web page with either an AgentQL query (https://docs.agentql.com/agentql-query) or a Natural Language prompt.

### Using AgentQL in a Workflow

For workflow usage, you'll add AgentQL's Extract Web Data tool to your pipeline like so:

1. Go to the Dify dashboard.
2. Click "Create from Blank" or "Create from Template" to create a new app and choose a chatflow or workflow. A workspace will open up.
3. On the bottom left, click the "Add Block" button.
4. Under **Tools,** search for "AgentQL".
5. Select the "Extract Web Data" tool under AgentQL.

Next you'll configure input variables in the tool's UI, and run the pipeline to extract web page data.

### Support

If you have any questions about using AgentQL with Dify, you can get involved in the Dify community to ask questions.

## Examples

- Price Deal Finder: In the chat input, name a product that you would want to find and compare prices of. You can start off with the example below:

  ```
  Nintendo Switch - OLed Model - w/ White Joy-Con
  ```

- Research Assistant: In the chat input, ask any question you have about any research topic. You can start off with the example below:

  ```
  What are some interesting facts about black holes to research on?
  ```

- News Aggregator: In the chat input, provide a prompt of what you want to scrape and URL link for the website to scrape.
  You can start off with the example below:

  ```
  Scrape all job postings from the following page: https://www.ycombinator.com/jobs
  Include columns for: Job Title | Company | Location | Job URL | Employment Type (Full-time, Part-time, Contract, etc.) | Remote Eligibility (Yes/No)
  ```

## LangChain

Source: https://docs.agentql.com/integrations/langchain

AgentQL integrates with LangChain, allowing you to both extract data and take actions on the web. LangChain helps to perform AI-powered research, workflow automation, and hands-free online interactions.

## Set up AgentQL with LangChain

1. Run the following command in your terminal to install LangChain:

   ```bash
   pip install langchain
   ```

2. Install a language model for your app to get started with:

   ```bash
   pip install -qU "langchain[openai]"
   ```

3. Create a new file `main.py` that instantiates your LLM, containing the following:

   ```python filename="main.py"
   import getpass
   import os

   from langchain.chat_models import init_chat_model

   # Prompt for the OpenAI key only if it isn't already set in the environment
   if not os.environ.get("OPENAI_API_KEY"):
       os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

   model = init_chat_model("gpt-4o-mini", model_provider="openai")
   ```

4. Install AgentQL's tools:

   ```bash
   pip install -U langchain-agentql
   ```

5. Configure the `AGENTQL_API_KEY` environment variable by running the command below in the terminal. You can get an AgentQL API key here (https://dev.agentql.com/api-keys).

   ```bash
   export AGENTQL_API_KEY=
   ```

6. In the `main.py` file, import AgentQL's tools:

   ```python filename="main.py"
   from langchain_agentql.tools import ExtractWebDataTool, ExtractWebDataBrowserTool, GetWebElementBrowserTool
   ```

You can now start building your first app on LangChain with AgentQL!
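
Before running `main.py`, it can help to confirm that the `AGENTQL_API_KEY` variable from the setup steps is actually visible to Python. The following is a minimal sketch that uses only the standard library; `require_env` is a hypothetical helper name chosen for illustration and is not part of `langchain-agentql`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of any AgentQL package.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running main.py")
    return value


if __name__ == "__main__":
    try:
        require_env("AGENTQL_API_KEY")
        print("AGENTQL_API_KEY is set")
    except RuntimeError as error:
        print(error)
```

Failing early like this tends to give a clearer message than an authentication error raised later from inside a tool call.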
## Usage

AgentQL provides the following three tools:

- ExtractWebDataTool
- ExtractWebDataBrowserTool
- GetWebElementBrowserTool

AgentQL also provides `AgentQLBrowserToolkit`, which bundles `ExtractWebDataBrowserTool` and `GetWebElementBrowserTool` together. These two tools require a Playwright browser, but `ExtractWebDataTool` doesn't. `ExtractWebDataTool` calls a REST API.

### How to use AgentQL's `AgentQLBrowserToolkit`

1. Instantiate your Playwright browser instance:

   ```python filename="main.py"
   from langchain_agentql.utils import create_async_playwright_browser

   async_browser = await create_async_playwright_browser()
   ```

2. Choose the Playwright tools you would like to use:

   ```python filename="main.py"
   from langchain_community.tools.playwright import NavigateTool, ClickTool

   # Reuse the browser instance created in the previous step
   playwright_toolkit = [
       NavigateTool(async_browser=async_browser),
       ClickTool(async_browser=async_browser, visible_only=False),
   ]
   playwright_toolkit
   ```

You can learn more about the AgentQL tools available and their usage in the AgentQL documentation on LangChain's site.

## Support

If you have any questions about using AgentQL with LangChain, you can join the LangChain community.

## Examples

- Job Scraper
- Price Deal Finder
- Recipe Bot

## Langflow

Source: https://docs.agentql.com/integrations/langflow

Design intelligent workflows visually with Langflow. Integrate AgentQL’s data extraction capabilities to enrich your workflows with contextual, real-time insights.

AgentQL integrates with Langflow, allowing you to rapidly prototype agents with AgentQL and other tools. Langflow is a visual builder that lets you drag and drop components to design workflows for sophisticated AI applications.

## Set up AgentQL with Langflow

1. Follow the Langflow installation guide to get Langflow running locally.
2. Run Langflow locally. On the dashboard, click "+ New Flow" and choose a template or start from a blank flow.
3. On the left sidebar under Components, search for `AgentQL Query Data` and drag the component into the flow.
4. Generate your AgentQL API key on the dev portal (https://dev.agentql.com/api-keys), and add it to the "AgentQL API Key" field.

Once that's done you can start building your workflow with AgentQL!

## Usage

You can switch on the AgentQL component's "Tool Mode" to turn it into a tool that an Agent component can use to extract data from web pages at its own discretion. Paste the contents of the Introduction AgentQL Query (https://docs.agentql.com/agentql-query/query-intro) into the Agent component's Agent Instructions field. This teaches the Agent how to write its own queries!

Once you have completed your flow, use the "Playground" button to test it out.

### Support

If you have any questions about using AgentQL with Langflow, you can join Langflow's Discord channel.

## Examples

- News Aggregator
- Price Deal Finder
- Research Paper Assistant

## LlamaIndex

Source: https://docs.agentql.com/integrations/llamaindex

AgentQL integrates with LlamaIndex, allowing you to build RAG (Retrieval-Augmented Generation) models and AI assistants for AI-powered search and decision making. LlamaIndex is a leading framework for building LLM-powered agents over data with LLMs and workflows.

## Set up AgentQL with LlamaIndex

1. Start a new Python virtual environment and run the following command in your terminal to install LlamaIndex:

   ```bash
   pip install llama-index
   ```

2. Since LlamaIndex uses the OpenAI `gpt-3.5-turbo` model by default, you need to set up an `OPENAI_API_KEY` as an environment variable to use the model.

   ```bash
   export OPENAI_API_KEY=
   ```

3. Install AgentQL's tools:

   ```bash
   pip install llama-index-tools-agentql
   ```

4. Configure the `AGENTQL_API_KEY` environment variable.
   You can get an AgentQL API key here (https://dev.agentql.com/api-keys).

   ```bash
   export AGENTQL_API_KEY=
   ```

5. Create a new Python file `main.py` and import AgentQL's tools:

   ```python
   # for importing extract_web_data_with_rest_api
   from llama_index.tools.agentql import AgentQLRestAPIToolSpec

   # for importing extract_web_data_from_browser and get_web_element_from_browser
   from llama_index.tools.agentql import AgentQLBrowserToolSpec
   ```

You can now start building your first app on LlamaIndex with AgentQL!

## Usage

AgentQL provides the following three tools:

- extract_web_data_with_rest_api
- extract_web_data_from_browser
- get_web_element_from_browser

To use `extract_web_data_from_browser` and `get_web_element_from_browser`, you need a Playwright browser instance.

### How to use AgentQL's `AgentQLBrowserToolSpec`

1. Install Playwright on LlamaIndex:

   ```bash
   pip install llama-index-tools-playwright
   ```

2. Instantiate your Playwright browser instance:

   ```python
   from llama_index.tools.playwright.base import PlaywrightToolSpec

   async_browser = await PlaywrightToolSpec.create_async_playwright_browser()
   ```

3. Choose the Playwright tools you would like to use:

   ```python
   # The original snippet used `playwright_tool` without defining it.
   # Assumption: PlaywrightToolSpec.from_async_browser() builds the tool spec from the browser above.
   playwright_tool = PlaywrightToolSpec.from_async_browser(async_browser)
   playwright_tool_list = playwright_tool.to_tool_list()
   playwright_agent_tool_list = [
       tool
       for tool in playwright_tool_list
       if tool.metadata.name in ["click", "get_current_page", "navigate_to"]
   ]
   ```

4. Instantiate `AgentQLBrowserToolSpec`:

   ```python
   from llama_index.tools.agentql import AgentQLBrowserToolSpec

   agentql_browser_tool = AgentQLBrowserToolSpec(async_browser=async_browser)
   ```

You can learn more about the AgentQL tools available and their usage here.

## Support

If you have any questions about using AgentQL with LlamaIndex, you can join LlamaIndex's community.

## MCP

Source: https://docs.agentql.com/integrations/mcp

Use AgentQL with MCP to enable real-time, AI-powered web data extraction for your agents.
Seamlessly collect real-time web data for use with Claude, Cursor, Windsurf, and more.

AgentQL Model Context Protocol (MCP) server integrates AgentQL's data extraction capabilities with AI-powered automation, enabling seamless retrieval of structured data from web pages. It enhances AI agents by providing real-time, context-aware access to the web, supporting use cases like market monitoring, research, and automation workflows.

## Installation

To use AgentQL MCP Server to extract data from web pages, you need to install it via npm, get an API key, and configure it in your favorite app that supports MCP.

```bash
npm install -g agentql-mcp
```

## Set up AgentQL with MCP

### Configure Claude

1. Open Claude Desktop **Settings** via `⌘`+`,` (don't confuse with Claude Account Settings)
2. Go to **Developer** sidebar section
3. Click **Edit Config** and open `claude_desktop_config.json`.
4. Add `agentql` server inside `mcpServers` dictionary.
5. Restart the application.

```json title="claude_desktop_config.json"
{
  "mcpServers": {
    "agentql": {
      "command": "npx",
      "args": ["-y", "agentql-mcp"],
      "env": {
        "AGENTQL_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```

Read more about MCP configuration in Claude here (https://modelcontextprotocol.io/quickstart/user).

### Configure Cursor

1. Open **Cursor Settings**.
2. Navigate to **MCP > MCP Servers**.
3. Click **+ Add new MCP Server**.
4. Enter the following:
   - **Name:** `agentql`
   - **Type:** `command`
   - **Command:** `env AGENTQL_API_KEY=YOUR_API_KEY npx -y agentql-mcp`

Read more about MCP configuration in Cursor here (https://docs.cursor.com/context/model-context-protocol).

### Configure Windsurf

1. Open **Windsurf: MCP Configuration Panel**.
2. Click **Add custom server+**.
3. Alternatively, open `~/.codeium/windsurf/mcp_config.json` directly.
4. Add `agentql` server inside `mcpServers` dictionary.
```json title="mcp_config.json"
{
  "mcpServers": {
    "agentql": {
      "command": "npx",
      "args": ["-y", "agentql-mcp"],
      "env": {
        "AGENTQL_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```

Read more about MCP configuration in Windsurf here (https://docs.codeium.com/windsurf/mcp).

### Configure Goose

1. Open Goose Desktop's **Advanced Settings** via the **...** menu in the upper right corner of the interface.
2. Select **Add custom extension**.
3. Fill out the form:
   - ID: agentql
   - Name: AgentQL
   - Description: Extract data from web pages
   - Command: `npx -y agentql-mcp`
4. Under **Environment Variables**, add your AgentQL key using "AGENTQL_API_KEY" as the variable name and your key as its value.
5. **Add** the custom extension and you're ready to go!

Read more about using custom extensions with Goose here (https://block.github.io/goose/docs/getting-started/using-extensions/#discovering-extensions).

## Usage

Once configured, your AI agent can extract web data using AgentQL MCP tools. Use this prompt to test it out:

```text
Extract the list of videos from the page https://www.youtube.com/results?search_query=agentql, every video should have a title, an author name, a number of views and a url to the video. Make sure to exclude ads items. Format this as a markdown table.
```

> **Tip:** MCP is a new technology! If your agent complains about loading content from the web instead of using AgentQL, try adding "use tools" or "use agentql tool" in your prompt.

Here are some other examples:

- Get the social links from agentql.com
- Return the top international news articles from ground.news into a CSV

And what is, in our opinion, AgentQL’s “killer use case”, using Claude as a cookbook:

```text
Get this recipe for me: https://www.justonecookbook.com/pressure-cooker-japanese-curry/
```

## Development

AgentQL MCP server is open source and open to contribution on GitHub.

## Zapier

Source: https://docs.agentql.com/integrations/zapier

Use data from any site with a single AgentQL Action.
Collect and act on real-time data from the Web to update databases, send emails and more with your favorite integrations.

AgentQL integrates with Zapier, allowing you to automate workflows between AgentQL and other apps. Zapier lets you connect AgentQL to 8,000+ other web services. Automated connections called Zaps, set up in minutes with no coding, can automate your day-to-day tasks and build workflows between apps that otherwise wouldn't be possible.

## Set up AgentQL with Zapier

1. Log in to your Zapier account or create a new account.
2. Navigate to AgentQL Integrations in Zapier and click the "Connect AgentQL to 7,000+ apps" button, which redirects you to a new Zap page.
3. In the Setup step under Account, click "Connect AgentQL" and provide your AgentQL API key. Collect your AgentQL API key here (https://dev.agentql.com/api-keys).

Once that's done you can start creating an automation! Use a pre-made Zap or create your own with the Zap Editor.

## Usage

Each Zap has one app as the **Trigger**, where your information comes from and which causes one or more **Actions** in other apps, where your data gets sent automatically. Use AgentQL as an Action in Zapier to return structured data from web pages for use in other Zaps.

### Support

If you have any questions about using AgentQL with Zapier, you can open a ticket with Zapier Support.

## Examples

# Examples

Source: https://docs.agentql.com/examples

## Overview

Browse our open-source collection of useful example scripts and get started extracting data, scraping, and automating.
## Examples

- Python Examples (/examples/python-examples)
- JavaScript Examples (/examples/javascript-examples)

## Related Content

## Python Examples

Source: https://docs.agentql.com/examples/python-examples

## JavaScript Examples

Source: https://docs.agentql.com/examples/javascript-examples

# Concepts

Source: https://docs.agentql.com/concepts

## Overview

AgentQL consists of a parser and its query language that use AI-powered natural language selectors for web scraping and automation. This section provides a conceptual overview of the main components and how they work together.

## Conceptual Overviews

## Related content

## A query language for the web

Source: https://docs.agentql.com/concepts/query-language

AgentQL is an AI-powered query language and parser that uses natural language selectors for web scraping and automation. It offers resilient, self-healing, cross-site compatible queries and structured data output, so you can write your script once and execute it anywhere.

## Fetching data for web scraping

You can use this example query to locate products, their names, descriptions, and prices on any eCommerce page:

```AgentQL
{
    products[] {
        name
        description
        price(integer)
    }
}
```

You could think up additional fields like "color" or "size"; whatever you add, AgentQL will find.
Pass this query to `query_data` (/api-references/agentql-page#querydata) to return a JSON object containing the data you requested:

```python filename="example_script.py"
page.query_data("""
{
    products[] {
        name
        description
        price(integer)
    }
}
""")
```

Try it out with any shopfront in our playground (https://playground.agentql.com/).

Learn more about how to use AgentQL for data extraction in our scraping docs (/scraping).

## Fetching elements for workflow automation

You can use this example to locate a search box and its button on a web page.

```AgentQL
{
    search_input_field
    search_button
}
```

Using our SDK, you can pass an AgentQL query to `query_elements` (/api-references/agentql-page#queryelements) to return Playwright locators and perform actions on them through the browser:

```python filename="example_script.py"
# The query is passed as a string, so it must be quoted
page.query_elements("""
{
    search_input_field
    search_button
}
""")
```

Or return a single element using a natural language prompt describing the element you're looking for with AgentQL's `get_by_prompt` (/api-references/agentql-page#getbyprompt) method:

```python filename="example_script.py"
page.get_by_prompt("the search input field")
```

Learn more about how to use AgentQL for workflow automation in our automation docs (/automation).

## Advantages over traditional parsers

With AgentQL, developers can write concise, reusable queries that work across multiple sites with similar data structures. Its AI-powered selectors adapt to UI changes, significantly reducing maintenance overhead. For example, a single AgentQL query could extract listing data from both Redfin and Zillow, and continue to function even after site updates, A/B tests, and even redesigns. AgentQL pinpoints the data necessary, removing the need to build custom LLM pipelines to sift through HTML soup. This approach allows developers to focus on data utilization rather than constantly updating scraping logic, leading to more robust and efficient data pipelines.
* **Works on any page**: public or private, any site, any URL, even behind authentication
* **Self-healing**: in the face of dynamic content and changing page structures, AgentQL still returns the same results
* **Reusable code**: the same query works for scraping across multiple similar pages
* **Structured format you define**: shape the JSON response with your query

AgentQL lets you spend less time writing lines of code and updating broken selectors and more time extracting data and building automations.

## Features

### Semantic Selectors

AgentQL uses AI to build a semantic understanding of the context surrounding the web elements on a page. Elements are found based on their meaning and context, not just their position in the DOM, making them more:

* **Stable** even when website layouts change over time.
* **Reusable** across sites, standardizing outputs.
* **Intuitive** for both developers and non-technical users to write.

### Natural Language Queries

With AgentQL, you describe what you're looking for in plain English and can even pass entire prompts to explain in great detail what you’re looking for on a page. This makes queries more readable across teams and time and more maintainable.

### Controlled Output

Your query defines the structure of your output data, eliminating post-processing steps.

### Deterministic Results

AgentQL not only allows you to define an exact response structure, but also provides consistent and reliable results. You will get the same output for the same input, every time. This lets you confidently automate processes and tests.

## Primary Use Cases

AgentQL is useful for data scraping and extraction, web automation, and testing.
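
As a rough illustration of the data extraction use case, the sketch below shows the shape a response to the `products[]` query from earlier in this section could take, and how it can be consumed without reshaping. The product values are invented for illustration, and `total_price` is a hypothetical helper, not part of the AgentQL SDK.

```python
# Hypothetical example of data shaped by the query
# { products[] { name description price(integer) } }.
# The values below are made up for illustration; they are not real AgentQL output.
sample_response = {
    "products": [
        {"name": "Desk lamp", "description": "Adjustable LED lamp", "price": 35},
        {"name": "Notebook", "description": "A5 dotted notebook", "price": 8},
    ]
}


def total_price(response: dict) -> int:
    """Sum the integer prices of all products in a response of the shape above."""
    return sum(product["price"] for product in response["products"])


if __name__ == "__main__":
    # Field names in the data mirror the field names written in the query
    for product in sample_response["products"]:
        print(product["name"], product["price"])
    print("Total:", total_price(sample_response))
```

Because the keys mirror the query's field names, downstream code can index into the result directly instead of parsing HTML.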
### Data Scraping

AgentQL can extract structured data from websites:

* Gather pricing details from multiple storefronts
* Collect social media metrics
* Aggregate news and articles from multiple sources

### Web Automation

Streamline repetitive tasks and complex workflows:

* Automate form submissions
* Interact with websites programmatically
* Create powerful web-based bots and agents

### End-to-End Testing

Build more robust and maintainable test suites:

* Write tests that are resistant to UI changes
* Reduce test flakiness and maintenance overhead

## Conclusion

AgentQL can change how you interact with web content. If you’d like to dive in, check these out:

## Main Concepts

Source: https://docs.agentql.com/concepts/main-concepts

AgentQL is an AI-powered query language and a suite of tools designed to change how developers retrieve elements and data from websites. It uses a natural language query along with analysis of the page to precisely and efficiently return web elements and data. It consists of the following main components:

- **AgentQL Query Language** (#agentql-query-language) simplifies the process of locating web elements with plain language.
- **AgentQL Python and JavaScript SDKs** (#agentql-sdks) integrate AgentQL queries into automation scripts.
- **AgentQL Debugger (Chrome Extension)** (#agentql-debugger) enables interactive testing of AgentQL queries in real time.

## Core Components

### AgentQL Query Language

The AgentQL Query Language is the heart of the system. It uses a natural language-like syntax that allows developers to describe web elements and data in an intuitive, human-readable way:

```AgentQL
{
    search_input
}
```

In this query, `search_input` is the element AgentQL locates. You could return a different element by defining a term of your own, like `first_name_input`, or `search_button` or `search_btn`.

This approach abstracts away the complexity of DOM traversal.
It's flexible and adaptable to changes in webpage structure and resilient to website updates that would break traditional selectors such as XPath, DOM, or CSS selectors. It can return both elements for interaction and data for extraction.

### AgentQL SDKs

The AgentQL Python and JavaScript SDKs are the primary interface for integrating AgentQL into automation scripts, web scrapers, and testing frameworks like Playwright. Key features include:

* Simple API for executing queries and handling responses
* Support for both synchronous and asynchronous operations
* First-class integration with the popular web automation tool Playwright
* Comprehensive error handling and debugging capabilities

### AgentQL Debugger

Using the AgentQL Debugger Chrome Extension, developers can efficiently iterate on their queries, ensuring they work correctly before integrating them into larger automation scripts. It lets you write queries and see the elements AgentQL finds in real time.

* Write and test queries in real time against live webpages
* Visualize matched elements directly on the page
* Iterate quickly on query refinements
* Understand how AgentQL interprets different page structures

Please refer to the AgentQL Debugger (/debugger/installation) documentation for detailed instructions on how to install and use the extension.

## Methods of interaction

AgentQL provides two ways of interacting with web content:

### Element Queries

AgentQL’s `query_elements()` (/python-sdk/api-references/agentql-page#queryelements) method in Python, or `queryElements()` (/javascript-sdk/api-references/agentql-page#queryelements) in JavaScript, returns web elements that can be manipulated programmatically. It's useful for:

* Interacting with a page's elements and navigation
* Extracting properties of specific elements

AgentQL’s `get_by_prompt()` (/python-sdk/api-references/agentql-page#getbyprompt) method in Python, or `getByPrompt()` (/javascript-sdk/api-references/agentql-page#getbyprompt) in JavaScript, returns a single web element using a natural language prompt (as opposed to a full query).

### Data Queries

AgentQL’s `query_data()` (/python-sdk/api-references/agentql-page#querydata) method in Python, or `queryData()` (/javascript-sdk/api-references/agentql-page#querydata) in JavaScript, returns structured information from web pages. It's useful for:

* Scraping product information, prices, or reviews
* Collecting article content or metadata
* Extracting tabular data from complex layouts

## Conclusion

Understanding these main concepts (AgentQL Query, AgentQL Debugger, and AgentQL SDK) is crucial for leveraging the full power of AgentQL in your web automation tasks. Each component plays a distinct role in simplifying and enhancing the development process.

## Under the Hood

Source: https://docs.agentql.com/concepts/under-the-hood

AgentQL is a query language designed to make web automation and scraping more intuitive by allowing users to interact with web elements through natural language queries. But what happens between running your query and the web page? This document explains the technology that powers AgentQL.

## Key Use Cases

There are two main use cases for AgentQL:

* **Data extraction** (Scraping): Extracting data from web pages using natural language instructions.
* **Web elements lookup**: Leveraging natural language queries to locate web page elements (e.g., buttons, forms).
  This can be very useful for web automation and E2E testing suites.

Each use case has its own set of challenges and requirements, which AgentQL's underlying technology addresses.

## Input sources

Currently AgentQL works with the following input sources:

* The page’s **HTML** provides the structural layout of the web page as well as the actual web page content. The additional context of the page’s hierarchy helps match the content to the query.
* The page’s **accessibility tree** (https://developer.mozilla.org/en-US/docs/Glossary/Accessibility_tree) provides a semantic understanding of the page, closer to how a human would use the page. This aids in the identification of elements based on their roles and labels.

AgentQL uses both a page’s HTML structure and the accessibility tree to understand its content and the relationships between its elements. These inputs provide a comprehensive view of the web page, allowing AgentQL to accurately interpret and respond to user queries.

Even though AgentQL currently doesn't support other input sources, we are actively working on expanding its capabilities to include additional data inputs (e.g., visual data).

## Working with the input

### Input pre-processing

The first step in processing web content is simplification. This involves removing unnecessary noise and complexity from the input, such as HTML metadata, scripts, and hierarchy layers. The result is a clean and concise representation of the web page.

### Pipeline Selection Based on Use Case

AgentQL dynamically selects the appropriate processing pipeline based on the user's target use case: scraping vs automation. Each pipeline is fine-tuned to handle the specific challenges associated with its respective task.

#### Data scraping pipeline

* Optimized for locating actual data on the web page.
* Prioritizes accuracy and completeness of the extracted data (sacrificing speed if necessary).
#### Web automation pipeline

* Optimized for locating interactive elements on the web page (e.g., buttons, forms).
* Focuses on reliability and execution speed.
* Assumes a 1-to-1 mapping between AgentQL query terms and the web elements returned.

## Leveraging Large Language Models (LLMs)

AgentQL uses several public LLMs, including GPT-4, Llama, and Gemini, as well as our proprietary model, to generate initial results. AgentQL's infrastructure decides which LLM to use depending on the complexity of the task and the specific requirements of the use case.

### LLM Selection Criteria

* **Use Case**: Different LLMs are better suited for different tasks, such as complex scraping versus straightforward web elements targeting.
* **Complexity**: More complex queries may require more advanced models.
* **Performance**: Models are chosen based on their performance and suitability for the task at hand.

The selected LLM generates an initial result that is contextually relevant and aligned with the user's intent.

### Grounding and Validation

To ensure the accuracy and reliability of the output, the initial result generated by the LLM undergoes a grounding and validation process.

#### Grounding

The result is cross-referenced with the original input and context to ensure alignment.

#### Validation

The output is validated against technical requirements, such as correct element selection and accurate data extraction.

## Conclusion

AgentQL's ability to process natural language queries and deliver accurate results comes from a process that combines input simplification, task-specific pipelines, LLMs, and validation.

# Guides
Source: https://docs.agentql.com/guides

Learn how to use AgentQL to automate interactions with web pages as well as parse, extract, and scrape data from web pages.

## Overview

This section primarily includes guides for scraping (/scraping) as well as automation (/automation).
Additionally, it covers how to avoid bot detection (/avoiding-bot-detection), improve speed (/speed), and increase accuracy (/accuracy) when using AgentQL. Lastly, for those unfamiliar with Playwright, we have sections on using Playwright's browser with AgentQL (/browser), logging into sites (/logging-into-sites), and navigating pagination (/navigating-pagination).

## Guides

- Web Scraping and Data Extraction (/scraping)
- Automation with `query_elements` and `get_by_prompt` (/automation)
- Avoiding bot detection (/avoiding-bot-detection)
- Improving speed (/speed)
- Accuracy (/accuracy)
- Using the browser with AgentQL (/browser)
- Logging into sites (/logging-into-sites)
- Navigating pagination (/navigating-pagination)

## Web Scraping and Data Extraction
Source: https://docs.agentql.com/scraping

You can use AgentQL's SDKs and REST API endpoint to retrieve data from any web page.

## Automation with `query_elements` and `get_by_prompt`
Source: https://docs.agentql.com/automation

AgentQL's SDK is integrated with Playwright, which allows you to programmatically interact with websites. This section shows how to use the `query_elements` and `get_by_prompt` methods to return elements, fill out forms, and close cookie dialogs.

## Avoiding bot detection
Source: https://docs.agentql.com/avoiding-bot-detection

Techniques to avoid being detected by anti-bot systems while using AgentQL to automate workflows.

## Overview

Many websites have anti-bot systems in place to prevent automated access.
These systems can detect and block bots by analyzing the bot's behavior, such as the speed of requests, the user agent, and the IP address. To avoid detection by these systems, you can use various techniques to make your bot appear more like a human user.

## Improving speed
Source: https://docs.agentql.com/speed

How to improve parsing speed when using AgentQL to query elements and data.

## Overview

AgentQL is "fast by default." However, there are some techniques you can use to go even faster.

## Accuracy
Source: https://docs.agentql.com/accuracy

How to improve parsing accuracy when using AgentQL to query elements and data.

## Overview

AgentQL was built to be the most accurate parser on the market. However, there are some cases where you may need to improve the accuracy of your query.

## Using the browser with AgentQL
Source: https://docs.agentql.com/browser

## Overview

AgentQL's SDKs use Playwright under the hood to fetch data and elements from web pages and interact with the page in a natural manner. To learn more about Playwright, visit its official documentation.

## Logging into sites
Source: https://docs.agentql.com/logging-into-sites

## Overview

You can pass and cache user credentials to log into websites with AgentQL. This section shows how to store and use these credentials with AgentQL's automation tooling to access websites.

## Navigating pagination
Source: https://docs.agentql.com/navigating-pagination

## Overview

Pagination is a common pattern in websites where content is divided across multiple pages. This section guides you through how to handle pagination with AgentQL.
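
As a minimal sketch of the general idea (not AgentQL's own API), handling pagination usually comes down to a loop that collects the items on the current page and then advances until there is no next page. The helper `collect_pages` and its callables `get_items` and `go_to_next` are hypothetical names introduced for illustration; in a real script they might wrap a `query_data` call and a click on a next-page element. The in-memory `FAKE_PAGES` list only stands in for a website.

```python
from typing import Callable, List


def collect_pages(
    get_items: Callable[[], List[dict]],
    go_to_next: Callable[[], bool],
    max_pages: int = 10,
) -> List[dict]:
    """Collect items page by page until there is no next page or max_pages is hit.

    get_items  -- hypothetical callable returning the items on the current page
    go_to_next -- hypothetical callable that advances one page and returns
                  False when there is no further page
    """
    items: List[dict] = []
    for _ in range(max_pages):
        items.extend(get_items())
        if not go_to_next():
            break
    return items


# Stand-in for a paginated site: three "pages" of results held in memory.
FAKE_PAGES = [[{"title": "a"}, {"title": "b"}], [{"title": "c"}], [{"title": "d"}]]
state = {"index": 0}


def fake_get_items() -> List[dict]:
    """Return the items of the page the fake site is currently on."""
    return FAKE_PAGES[state["index"]]


def fake_go_to_next() -> bool:
    """Move to the next fake page, or report that none is left."""
    if state["index"] + 1 >= len(FAKE_PAGES):
        return False
    state["index"] += 1
    return True


all_items = collect_pages(fake_get_items, fake_go_to_next)
print([item["title"] for item in all_items])
```

The `max_pages` cap is a deliberate safety limit so that a site with an ever-present "next" control can't keep the loop running indefinitely.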
## Guides

- How to navigate infinite scrolling pages (/navigating-pagination/infinite-scroll)
- How to collect data across numerically paginated web pages (/navigating-pagination/collect-data-from-paginated-pages)
- How to step through paginated pages (/navigating-pagination/step-through-paginated-pages)

## Deploying AgentQL scripts
Source: https://docs.agentql.com/deploying

How to deploy AgentQL scripts to cloud services.

## Overview

After you've written your AgentQL script, you may want to deploy it to cloud services like AWS, GCP, or Azure.

## Guides

- How to deploy AgentQL script (/deploying/how-to-deploy-agentql-script)

### Web Scraping and Data Extraction
Source: https://docs.agentql.com/scraping

You can use AgentQL's SDKs and REST API endpoint to retrieve data from any web page.

#### Scraping data with AgentQL's REST API
Source: https://docs.agentql.com/scraping/scraping-data-api

AgentQL’s REST API enables powerful, flexible data retrieval from webpages in a structured format, ready for integration into your workflow.

## Overview

This guide shows you how to use the REST API to scrape data from a webpage, customize parameters for enhanced scraping capabilities, and retrieve structured data in JSON format with AgentQL queries. (You can also query data from raw HTML (/scraping/getting-data-from-html-api).)

## Defining the REST API request structure

The following fields outline the high-level structure of a data scraping request:

- `url`: The URL of the webpage you want to retrieve data from.
- `html`: Alternative to `url`, which you can use to query data from an HTML file (/scraping/getting-data-from-html-api).
- `query`: An AgentQL query (/agentql-query/query-intro) that defines the data to extract and the format for the retrieved output.
- `params`: (Optional) Additional settings for enhanced data retrieval, such as enabling screenshots or scrolling. See the API Reference (/rest-api/api-reference#request-body) for more details about params.

## Constructing the API request

To perform a basic data scraping request, start by defining the `url` of the desired webpage and the `query` to specify the data you want to retrieve in the request body.

1. Example REST API Request

Below is an example request body structure:

```json filename="request_body"
{
  "url": "https://scrapeme.live/?s=fish&post_type=product",
  "query": "{ products[] { product_name product_price(integer) } }"
}
```

2. Setting Request Headers

Before making the API request, include the necessary headers for authentication and content type. These headers authorize the request and specify the data format to send.

- `X-API-Key`: this header should have your AgentQL API key for authentication.
- `Content-Type`: set it to `application/json` to indicate that the request body is in JSON format, allowing the server to interpret the data correctly.

3. Making the API Request

Using your preferred HTTP client (like curl, Postman, or an HTTP library in Python or your preferred language), you can make a POST request to the AgentQL REST API endpoint.

```bash
curl -X POST "https://api.agentql.com/v1/query-data" \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scrapeme.live/?s=fish&post_type=product",
    "query": "{ products[] { product_name product_price(integer) } }"
  }'
```

Make sure to replace `$AGENTQL_API_KEY` with your actual API key.

4. Reviewing the API Response

If the request is successful, the API returns a JSON response with the extracted data.

**Example Response**

```json filename="response"
{
  "data": {
    "products": [
      {
        "product_name": "Qwilfish",
        "product_price": 77
      },
      {
        "product_name": "Huntail",
        "product_price": 52
      },
      ...
    ]
  },
  "metadata": {
    "request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3"
  }
}
```

You can read more about the response structure and metadata fields in the API Reference (/rest-api/api-reference#response).

## Debugging with Screenshots

If you are not receiving the expected data, you can use screenshots to validate that the page is in the expected state by setting the `is_screenshot_enabled` parameter to `true` in the request body.

```bash
curl -X POST "https://api.agentql.com/v1/query-data" \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scrapeme.live/?s=fish&post_type=product",
    "query": "{ products[] { product_name product_price(integer) } }",
    "params": {
      "is_screenshot_enabled": true
    }
  }'
```

With screenshots enabled, the API returns a Base64-encoded string in the `screenshot` field of the response. This lets you see the page content that was scraped.

```json filename="response"
{
  "data": {
    "products": [
      {
        "product_name": "Qwilfish",
        "product_price": 77
      },
      {
        "product_name": "Huntail",
        "product_price": 52
      },
      ...
    ]
  },
  "metadata": {
    "request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3",
    "screenshot": "iVBORw0KGgoAAAANSUhEUgAABQAAAALQCAIAAABAH0o..."
  }
}
```

You can convert the Base64 string returned in the `screenshot` field to an image and view it using free online tools like Base64.guru (https://base64.guru/converter/decode/image).

To get more familiar with AgentQL's REST API and other `params` options, check out the API Reference (/rest-api/api-reference).

#### Querying data from HTML with AgentQL's REST API
Source: https://docs.agentql.com/scraping/getting-data-from-html-api

AgentQL’s REST API enables powerful, flexible data retrieval from HTML files in a structured format, ready for integration into your workflow.
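
For orientation, here is a minimal Python sketch of how a request that queries raw HTML could be assembled with the standard library. The endpoint and header names are the ones used in the curl examples in these guides; the sample HTML, the placeholder key, and the helper name `build_html_query_request` are illustrative assumptions. The request is only built here, not sent.

```python
import json
import urllib.request

# Endpoint used by the curl examples in these guides.
API_URL = "https://api.agentql.com/v1/query-data"


def build_html_query_request(html: str, query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that queries raw HTML."""
    body = json.dumps({"query": query, "html": html}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Illustrative HTML document; any raw HTML string is passed the same way.
SAMPLE_HTML = (
    "<html><head><title>Simple Web Page</title></head>"
    "<body><h1>Main Page</h1></body></html>"
)

request = build_html_query_request(SAMPLE_HTML, "{ page_title }", api_key="YOUR_API_KEY")
payload = json.loads(request.data)

print(request.get_method(), request.full_url)
print(sorted(payload))  # the JSON body carries the "html" and "query" fields
```

Using `json.dumps` to serialize the body takes care of escaping quotes and newlines inside the HTML string, which is easy to get wrong when the JSON is written by hand. To actually send the request, pass it to `urllib.request.urlopen` with a real API key.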
## Overview

This guide shows you how to use the REST API to extract data from raw HTML and retrieve structured data in JSON format with AgentQL queries.

## Defining the REST API request structure

The following fields outline the high-level structure of a data querying request:

- `html`: The raw HTML to query data from.
- `query`: The AgentQL query to execute. Learn more about how to write an AgentQL query in the docs (https://docs.agentql.com/agentql-query).
- `params`: (Optional) Additional settings for enhanced data retrieval, such as enabling screenshots or scrolling. See the API Reference (/rest-api/api-reference#request-body) for more details about params.

The REST API also accepts `url`, which you can use to scrape a live web page (/scraping/scraping-data-api).

## Constructing the API request

To perform a basic data querying request, start by defining the `html` you want to query and the `query` that specifies the data you want to retrieve in the request body.

1. Example REST API Request

Below is an example request body structure:

```json filename="request_body"
{
  "query": "{ page_title }",
  "html": "\n\n\n Simple Web Page\n\n\n Main Page\n\n"
}
```

2. Setting Request Headers

Before making the API request, include the necessary headers for authentication and content type. These headers authorize the request and specify the data format to send.

- `X-API-Key`: this header should have your AgentQL API key for authentication.
- `Content-Type`: set it to `application/json` to indicate that the request body is in JSON format, allowing the server to interpret the data correctly.

3. Making the API Request

Using your preferred HTTP client (like curl, Postman, or an HTTP library in Python or your preferred language), you can make a POST request to the AgentQL REST API endpoint.
```bash
curl -X POST https://api.agentql.com/v1/query-data \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ page_title }",
    "html": "\n\n\n Simple Web Page\n\n\n Main Page\n\n"
  }'
```

Make sure to replace `$AGENTQL_API_KEY` with your actual API key.

4. Reviewing the API Response

If the request is successful, the API returns a JSON response with the extracted data.

**Example Response**

```json filename="response"
{
  "page_title": "Simple Web Page"
}
```

You can read more about the response structure and metadata fields in the API Reference (/rest-api/api-reference#response).

To get more familiar with AgentQL's REST API and other `params` options, check out the API Reference (/rest-api/api-reference).

#### Scraping data with `query_data`
Source: https://docs.agentql.com/scraping/scraping-data-sdk

Learn how to use the `query_data` method to scrape and extract structured data from a website using AgentQL's SDKs.

Use the `query_data` method to extract structured data from a web page, such as product details, user reviews, or other information. Unlike `query_elements` (/automation/elements) or `get_by_prompt` (/automation/single-element), `query_data` returns data rather than interactive elements.

## Overview

This guide shows you how to use `query_data` and work with the data output.

## Define the data query

First, define an AgentQL query that describes how to structure the data. For example, the following query scrapes a website for the `name` and `price` of all products within a product category.

```AgentQL
{
  product_category
  product[] {
    name
    price
  }
}
```

## Run the data query

Within your script, you can now pass your query into the `query_data` method.

```python filename="example.py"
products_response = page.query_data(PRODUCTS_QUERY)
```

## Understanding the data output

When you run the query, it returns a dictionary containing the retrieved data formatted according to the query schema.
Here's an example of what the query might return:

```json title="products_response"
{
  "product_category": "Coffee Beans",
  "product": [
    {
      "name": "Starbucks Coffee Beans",
      "price": "$16.99"
    },
    {
      "name": "Blue Bottle Coffee Beans",
      "price": "$17.99"
    }
  ]
}
```

## Accessing the data output

Finally, you can access any part of the data according to the schema in your script as you would any standard dictionary. The following snippet includes some common examples using the scenario from this guide:

```python filename="example.py"
# Access the product category
category = products_response['product_category']
print(f"Product Category: {category}")

# Access the list of products
products = products_response['product']

# Iterate through the products and print their details
for product in products:
    name = product['name']
    price = product['price']
    print(f"Product: {name}, Price: {price}")
```

## Conclusion

Remember that the `query_data` method is ideal for scraping and retrieving data, while `query_elements` is ideal for interacting with elements.

#### Extracting data from PDFs and images
Source: https://docs.agentql.com/scraping/pdfs-images-data-extraction

Learn how to use AgentQL to extract structured data from PDFs and image files.

AgentQL supports extracting data from PDFs and image files.

## Test this feature in Playground!

1. Go to AgentQL's Playground.
2. Click the "Documented (Experimental)" toggle.
3. Either click "Choose file" to upload your PDF, JPG, or PNG file _or_ drag and drop your file into the target preview area.
4. Add an AgentQL query (/agentql-query) to the query box (or use the "Suggest a Query" button to have AgentQL craft a query for you).
5. Click the "Fetch Data" button.

Check the results box for your extracted data, and please let the AgentQL team know your feedback. If you would like to access the feature via the SDK, please reach out to join Tiny Fish's Beta Access Program.
#### Scheduling scraping jobs
Source: https://docs.agentql.com/scraping/scheduling

AgentQL's Dev Portal enables you to schedule multiple scraping workflows, each with multiple scraping jobs on different websites, using AgentQL queries.

Because this feature is still experimental, we're limiting the number of scraping jobs you can schedule to:

* 2 workflows per user
* 5 URLs per workflow
* 10 runs per workflow

If you need more data extraction jobs, please reach out to us.

## Overview

This guide shows you how to use the Dev Portal to create a scraping workflow that scrapes Hacker News and Product Hunt discussions to get the latest product launches.

## Creating a scraping workflow

1. On the Dev Portal, navigate to the scheduling page.
2. Select the **Add New Workflow** button.
3. Add a name for your workflow, for example, "Startups News."
4. Add the URL(s) for the pages that you'd like to extract data from, for example "https://news.ycombinator.com/" to scrape Hacker News and/or "https://www.producthunt.com/discussions" to scrape new product launches on Product Hunt.
5. Add an AgentQL query (/agentql-query). For example, this one fetches the title, URL, and date posted of each post on https://www.producthunt.com/discussions:

   ```agentql
   {
     posts[] {
       title
       url
       date_posted
     }
   }
   ```

6. Select a time to run the query. You may customize the schedule to run at a different time of day, week, or month.
7. Toggle on **Save screenshot** to save a screenshot of the webpage at the time of the job. This can be useful to understand the context of the job and debug data extraction issues (for example, whether a login screen or a popup is in the way).
8. Use the **Submit** button to create the workflow.

## Editing and inspecting scraping workflows

You can inspect a workflow by visiting the scheduling page.
Here, you can access each workflow and see the AgentQL query used to scrape the data, the status of the scraping job, the scraped data, and the screenshot of the webpage at the point of scraping. If you don't see any workflows, you may need to create one first.

### Pause a scraping workflow

On the scheduling page, select a workflow you want to pause, and use the **Pause** button on the top right to pause the workflow.

### Edit a scraping workflow

To change a workflow's AgentQL query, the list of URLs to scrape, and/or its schedule:

1. Go to the scheduling page.
2. Select a workflow you want to edit.
3. Use the **Edit** button to open the workflow.
4. Make the necessary changes to the workflow.
5. Use the **Update** button to save the changes.

### Delete a scraping workflow

On the scheduling page, select a workflow you want to delete, and use the **Delete** button on the top right to delete the workflow. Confirm the deletion by selecting **Delete** again.

## Run a scraping job manually

On the scheduling page, select a workflow you want to run, and use the **Run Now** button to run the workflow immediately.

## Export scraped data to JSON

On the scheduling page, select a workflow you want to export data from:

1. Select the checkboxes of the jobs you wish to export. Each URL has a separate job.
2. Select **Export jobs** on the top left of the list of jobs.
3. Select the checkboxes of the fields you wish to export.
4. Use the **Export** button to download a JSON file containing the scraped data.

### Automation with `query_elements` and `get_by_prompt`
Source: https://docs.agentql.com/automation

## Overview

AgentQL's SDK is integrated with Playwright, which allows you to programmatically interact with websites.
This section shows how to use the `query_elements` and `get_by_prompt` methods to return elements, fill out forms, and close cookie dialogs.

#### Return a collection of elements with `query_elements`
Source: https://docs.agentql.com/automation/elements

How to use AgentQL's `query_elements` method to locate more than one element from a web page and perform automations with them.

You can use the `query_elements` method to locate more than one element from a web page. You can use these elements to automate workflows and interact with websites by simulating clicking on buttons, filling out form fields, and scrolling. Unlike `query_data` (scraping), `query_elements` returns one or more interactive Playwright locators (https://playwright.dev/python/docs/api/class-locator) rather than data.

## Overview

This guide shows you how to return one or more elements with `query_elements` and how to use them in your scripts to interact with web elements.

## Define a query

The first step is to define a query for all the elements you want AgentQL to return. In this example, the goal is to interact with the `add_to_cart_button`.

```AgentQL
{
  add_to_cart_button
}
```

## Fetch an element with `query_elements`

Next, pass the query to the `query_elements` method to fetch the desired elements.

```python filename="example.py"
response = page.query_elements(QUERY)
```

## Access an element returned by `query_elements`

`query_elements` returns a Python object you can use to interact with the elements on the page. You can access the elements as if they were fields of this response object.

```python filename="example.py"
response.add_to_cart_button.click()
```

## Fetch multiple elements with `query_elements`

This query returns all the products on the page and their "add to cart" buttons.
```AgentQL
{
  products[] {
    add_to_cart_button
  }
}
```

### Extract a single element from many

```python filename="example.py"
# Click first product's button on the page
response.products[0].add_to_cart_button.click()

# Click last product's button on the page
response.products[-1].add_to_cart_button.click()
```

## Conclusion

`query_elements` is the proper method to use when you want to interact with multiple elements on a page. If you only need a single element, use the `get_by_prompt` (single-element) method instead.

#### Return a single element with `get_by_prompt`
Source: https://docs.agentql.com/automation/single-element

How to use AgentQL's `get_by_prompt` method to locate a single element from a web page by passing a prompt describing the element you're looking for.

You can use the `get_by_prompt` method to locate a single element from a web page by passing a prompt describing the element you're looking for. You can use this element to automate workflows and interact with websites by simulating clicking on buttons, filling out form fields, and scrolling. Unlike `query_data` (scraping), `get_by_prompt` returns an interactive Playwright locator (https://playwright.dev/python/docs/api/class-locator) rather than data.

## Overview

This guide shows you how to return an element with `get_by_prompt` and how to use it in your scripts to interact with web elements.

## Pass a prompt to `get_by_prompt`

```python filename="example.py"
search_bar = page.get_by_prompt("the search bar")
```

## Access an element returned by `get_by_prompt`

`get_by_prompt` returns a Python object you can use to interact with the element on the page. You can call Playwright methods on it directly.

```python filename="example.py"
search_bar.fill("AgentQL")
```

## Conclusion

`get_by_prompt` is the proper method to use when you want to interact with a single element on a page.
If you need to access multiple elements or want to make one call to return multiple locators, use the `query_elements` (elements) method instead.

#### How to fill out and submit a form
Source: https://docs.agentql.com/automation/submit-form

Submitting web forms is a common scenario in automation since it allows you to efficiently transfer data and run multiple scenarios.

## Overview

In this guide, you will learn how to use Playwright methods in an AgentQL script to automate submitting a form.

## Common scenarios

Here are some common scenarios for filling out forms with Playwright and AgentQL.

### Fill out an input field

![Text Input Example](/images/docs/text-input.png)

Use Playwright's `fill()` (https://playwright.dev/python/docs/api/class-locator#locator-fill) to fill a text input field. It accepts the desired value as an argument. `fill()` is the recommended method for filling out forms, but some forms fire events on keypress. In that case, use `press_sequentially()` (https://playwright.dev/python/docs/api/class-locator#locator-press-sequentially). Don't use `type()`; it's deprecated.

```python filename="fill-input-text.py"
await response.first_name.fill("John")
await response.date_of_birth.press_sequentially("2010-10-10")
```

### Select an option from a dropdown or selection box

![Select Option](/images/docs/select-option.png)

Use Playwright's `select_option()` (https://playwright.dev/python/docs/api/class-locator#locator-select-option) method to select an option from a dropdown or select box. It accepts an option (or a set of options for a multiselect). See Playwright's docs for more.
(https://playwright.dev/python/docs/api/class-locator#locator-select-option)

```python filename="select-option.py"
await response.school.select_option(label="Washington University in St.Louis")
```

### Upload a file

![Upload File](/images/docs/file-upload.png)

Use Playwright's `FileChooser` (https://playwright.dev/docs/api/class-filechooser) to upload a file to a form.

```AgentQL
{
  resume_attach_btn
}
```

```python
file_path = "/path/to/your-file.pdf"

# `query` holds the AgentQL query shown above
response = await page.query_elements(query)

# Clicking the button opens the file chooser, which Playwright captures here
async with page.expect_file_chooser() as fc_info:
    await response.resume_attach_btn.click()

file_chooser = await fc_info.value
await file_chooser.set_files(file_path)
```

## Conclusion

Automating form submission allows you to unlock powerful workflows with AgentQL. Here is a complete script that includes examples of filling out the form and selecting options using this test form website (https://formsmarts.com/html-form-example).

```python filename="example-script.py"
import asyncio

import agentql
from playwright.async_api import async_playwright

# URL of the example form website
# You can replace it with any other form but the queries should be updated accordingly
URL = "https://formsmarts.com/html-form-example"


async def main():
    """Main function."""
    async with async_playwright() as playwright, await playwright.chromium.launch(
        headless=False
    ) as browser:
        # Create a new page in the browser and wrap it to get access to the AgentQL's querying API
        page = await agentql.wrap_async(browser.new_page())

        await page.goto(URL)  # open the target URL

        form_query = """
        {
            first_name
            last_name
            email
            subject_of_inquiry
            inquiry_text_box
            submit_btn
        }
        """
        response = await page.query_elements(form_query)

        await response.first_name.fill("John")
        await response.last_name.fill("Doe")
        await response.email.fill("johndoe@agentql.com")
        await response.subject_of_inquiry.select_option(label="Sales Inquiry")
        await response.inquiry_text_box.fill("I want to learn more about AgentQL")

        # Submit the form
        await response.submit_btn.click()

        # confirm form
        confirm_query = """
        {
            confirmation_btn
        }
        """
        response = await page.query_elements(confirm_query)
        await response.confirmation_btn.click()

        await page.wait_for_page_ready_state()
        await page.wait_for_timeout(3000)  # wait for 3 seconds
        print("Form submitted successfully!")


if __name__ == "__main__":
    asyncio.run(main())
```

#### How to close a modal or cookie dialog
Source: https://docs.agentql.com/automation/close-modals

While automating web interactions, it's common for a modal or cookie dialog to interrupt your script. Fortunately, you can close them with a script of your own.

## Overview

This guide shows you how to close modals and cookie dialogs using AgentQL.

## Locate the reject button

The key to closing a modal or cookie dialog is to identify the close/reject button associated with it. You can use the `query_elements()` method to locate the button like so:

```python filename="close_modals.py"
QUERY = """
{
    cookies_form {
        reject_btn
    }
}
"""

response = page.query_elements(QUERY)
```

## Click the reject button

Once you have located the reject button, you can click it to close the modal or cookie dialog.

```python filename="close_modals.py"
response.cookies_form.reject_btn.click()
```

## Put it all together

If you want to see how everything comes together in a single script, it's available in our GitHub examples repo here (https://github.com/tinyfish-io/agentql/tree/main/examples/python/close_cookie_dialog).

#### How to solve Playwright timeout errors when interacting with elements
Source: https://docs.agentql.com/automation/actionability-check

## What's an actionability check?

By default, Playwright automatically waits for an element to be actionable before performing the action. If the element stays un-actionable for too long, Playwright gives a timeout error.
Playwright performs this actionability check before each action (clicking, hovering, typing) to ensure that the target elements are actionable. These checks verify that the elements are:

- Stable: Not in animation.
- Visible: Has a non-empty bounding box and doesn't have a `visibility:hidden` computed style.
- Enabled: Doesn't have a `disabled` property.
- Receives Events: Not covered by other elements.

## When to turn off Playwright's actionability check

In most cases, you shouldn't turn off Playwright's actionability check because doing so makes your scripts less reliable. However, there are some cases where you might want to turn it off:

- You intentionally need to interact with an element that's normally hidden or partially covered.
- You want to interact with elements that are in constant animation.
- You need to click or type on elements that are partially off-screen or behind overlays.

## How to turn off actionability check

You can bypass actionability checks and force Playwright to perform the action immediately by setting the `force` option to `True` in Python (`true` in JavaScript) with Playwright actions, such as `click` or `hover`. For instance:

```python filename="example.py"
hidden_element.click(force=True)
```

```js filename="example.js"
await hiddenElement.click({ force: true });
```

### Avoiding bot detection

Source: https://docs.agentql.com/avoiding-bot-detection

Techniques to avoid being detected by anti-bot systems while using AgentQL to automate workflows.

## Overview

Many websites have anti-bot systems in place to prevent automated access. These systems can detect and block bots by analyzing the bot's behavior, such as the speed of requests, the user agent, and the IP address. To avoid detection by these systems, you can use various techniques to make your bot appear more like a human user.
#### Enabling proxies

Source: https://docs.agentql.com/avoiding-bot-detection/enable-proxies

## Overview

This guide shows you how to set up and use proxies with Playwright and AgentQL to avoid bot detection.

## What's a proxy

A proxy is a server that acts as an intermediary between your browser and the website you are trying to access. You can use different types of proxies such as residential proxies, data center proxies, and mobile proxies.

## When should you use a proxy

There are many scenarios where you might want to use a proxy:

- Bypass geo-restrictions
- Access blocked websites in your country
- Hide your IP address
- Bypass rate limits
- Bypass bot detection

## How to use a proxy with Playwright

You can set the proxy configuration when launching the browser with Playwright, which supports HTTP(S) and SOCKS5 proxies.

```python filename="main.py"
import agentql
from playwright.sync_api import sync_playwright

# Replace with your proxy details
YOUR_PROXY_SERVER = "http://your-proxy-server:port"
YOUR_PROXY_USERNAME = "your-proxy-username"
YOUR_PROXY_PASSWORD = "your-proxy-password"

with sync_playwright() as playwright:
    # Launch browser with proxy settings
    browser = playwright.chromium.launch(
        proxy={
            "server": YOUR_PROXY_SERVER,
            "username": YOUR_PROXY_USERNAME,
            "password": YOUR_PROXY_PASSWORD,
        }
    )

    # Wrap the page with AgentQL
    page = agentql.wrap(browser.new_page())

    page.goto("https://example.com")

    # Output the page title
    print(page.title())

    # Add your own code here
```

You'll need to replace `YOUR_PROXY_SERVER`, `YOUR_PROXY_USERNAME`, and `YOUR_PROXY_PASSWORD` with your actual proxy details.

The `proxy` argument of `playwright.chromium.launch()` sets the proxy configuration: the proxy server URL (including the port) and the credentials needed for authentication.

To learn more about proxies with Playwright, see Playwright Proxy (https://playwright.dev/docs/network#http-proxy).
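Proxy providers often hand out a single URL with the credentials embedded, while the `proxy` argument shown above takes the server and the credentials as separate fields. The following is a minimal sketch of splitting such a URL with the standard library only. The helper name `proxy_settings_from_url` and the example URLs are assumptions made for illustration; they aren't part of Playwright or AgentQL.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings_from_url(proxy_url: str) -> dict:
    """Split a proxy URL into the server/username/password fields (hypothetical helper)."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"Not a usable proxy URL: {proxy_url!r}")

    # Rebuild the server part without the credentials
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"

    settings = {"server": server}
    # Credentials may be percent-encoded inside a URL, so decode them
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


if __name__ == "__main__":
    # Placeholder URL; replace it with the one from your proxy provider
    print(proxy_settings_from_url("http://user:p%40ss@proxy.example.com:8080"))
```

The returned dictionary has the same shape as the `proxy` argument in the example above, so it could be passed to `playwright.chromium.launch(proxy=...)` directly.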
## Conclusion

In this guide, you learned how to set up and use proxies with Playwright while integrating it with AgentQL. Using proxies is a common way to avoid bot detection.

#### Rotating browser headers

Source: https://docs.agentql.com/avoiding-bot-detection/rotating-headers

## Overview

This guide explains what browser headers are, why they're essential in web scraping or automation, and how to rotate them programmatically using Playwright with AgentQL.

## What are Browser Headers?

Browser headers are pieces of information your browser sends to the web server when making a request. They contain important details such as the type of browser, operating system, accepted content types, and more. Some common headers include:

- **User-Agent**: Identifies the browser and operating system.
- **Referer**: Indicates the URL of the referring page. The header name keeps its historical spelling, `Referer`.
- **Accept-Language**: Specifies the languages the browser prefers.
- **Location**: Not a request header as such. The user's apparent location comes from browser settings such as timezone and geolocation, which should stay consistent with the headers.
- **DNT (Do Not Track)**: Signals whether the user wants to opt out of tracking.

## Why rotate browser headers?

Rotating browser headers is a crucial technique in web scraping and automation to prevent websites from detecting and blocking your requests. Web servers often identify repeated requests with the same headers as suspicious and might:

- Block your IP address.
- Trigger CAPTCHAs.
- Serve incorrect or incomplete data.

By rotating headers, you simulate requests from different users, making it harder for the website to detect that your requests are automated. This can help you:

- Avoid detection by anti-bot systems.
- Bypass rate limits.
- Prevent IP bans.

## How to rotate browser headers with Playwright

You can programmatically rotate browser headers using Playwright integrated with AgentQL.
Below is an example of how to do this:

```python filename="main.py"
import asyncio
import logging
import random

import agentql
from playwright.async_api import Geolocation, ProxySettings, async_playwright

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

BROWSER_IGNORED_ARGS = [
    "--enable-automation",
    "--disable-extensions",
]
BROWSER_ARGS = [
    "--disable-xss-auditor",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-blink-features=AutomationControlled",
    "--disable-features=IsolateOrigins,site-per-process",
    "--disable-infobars",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4.1 Safari/605.1.15",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:130.0) Gecko/20100101 Firefox/130.0",
]

LOCATIONS = [
    ("America/New_York", Geolocation(longitude=-74.006, latitude=40.7128)),  # New York, NY
    ("America/Chicago", Geolocation(longitude=-87.6298, latitude=41.8781)),  # Chicago, IL
    ("America/Los_Angeles", Geolocation(longitude=-118.2437, latitude=34.0522)),  # Los Angeles, CA
    ("America/Denver", Geolocation(longitude=-104.9903, latitude=39.7392)),  # Denver, CO
    ("America/Phoenix", Geolocation(longitude=-112.0740, latitude=33.4484)),  # Phoenix, AZ
    ("America/Anchorage", Geolocation(longitude=-149.9003, latitude=61.2181)),  # Anchorage, AK
    ("America/Detroit", Geolocation(longitude=-83.0458, latitude=42.3314)),  # Detroit, MI
    ("America/Indianapolis", Geolocation(longitude=-86.1581, latitude=39.7684)),  # Indianapolis, IN
    ("America/Boise", Geolocation(longitude=-116.2023, latitude=43.6150)),  # Boise, ID
    ("America/Juneau", Geolocation(longitude=-134.4197, latitude=58.3019)),  # Juneau, AK
]

REFERERS = ["https://www.google.com", "https://www.bing.com", "https://duckduckgo.com"]

ACCEPT_LANGUAGES = ["en-US,en;q=0.9", "en-GB,en;q=0.9", "fr-FR,fr;q=0.9"]


async def main():
    user_agent = random.choice(USER_AGENTS)
    header_dnt = random.choice(["0", "1"])
    location = random.choice(LOCATIONS)
    referer = random.choice(REFERERS)
    accept_language = random.choice(ACCEPT_LANGUAGES)

    async with async_playwright() as playwright, await playwright.chromium.launch(
        headless=False,
        args=BROWSER_ARGS,
        ignore_default_args=BROWSER_IGNORED_ARGS,
    ) as browser:
        context = await browser.new_context(
            locale="en-US,en,ru",
            timezone_id=location[0],
            extra_http_headers={
                "Accept-Language": accept_language,
                "Referer": referer,
                "DNT": header_dnt,
                "Connection": "keep-alive",
                "Accept-Encoding": "gzip, deflate, br",
            },
            geolocation=location[1],
            user_agent=user_agent,
            permissions=["notifications"],
            viewport={
                "width": 1920 + random.randint(-50, 50),
                "height": 1080 + random.randint(-50, 50),
            },
        )

        page = await agentql.wrap_async(context.new_page())

        await page.enable_stealth_mode(nav_user_agent=user_agent)

        await page.goto("https://bot.sannysoft.com/", referer=referer)
        await page.wait_for_timeout(30000)


if __name__ == "__main__":
    asyncio.run(main())
```

The script selects random values from predefined lists of user agents, locations, referers, DNT values, and accept languages. The viewport is also randomized to make the bot look more like a human. This mimics requests from different users, browsers, and locations, making it harder for the website to detect that your requests are automated.

## Conclusion

Rotating browser headers using AgentQL's Playwright integration can help you avoid detection and improve the stability and accuracy of your web scraping or automation tasks.

#### Enable "stealth mode" for a headless browser

Source: https://docs.agentql.com/avoiding-bot-detection/stealth-mode-for-headless-browser

## Overview

This section explains what Stealth Mode is, why you might need it, and how to enable it with AgentQL.

## Why do you need stealth mode?
Modern websites often deploy sophisticated bot detection systems that analyze browser behavior, properties, and interactions to distinguish human users from bots. Without Stealth Mode, several signals can reveal the use of automation, such as:

* **`navigator.webdriver` property**: Indicates whether a browser is controlled by automation.
* **Headless mode detection**: Certain differences in how headless browsers behave compared to regular browsers.
* **Missing or inconsistent browser APIs**: Bots often miss certain APIs or provide inconsistent values (for example, WebGL, media codecs).

These detection methods can result in websites blocking automated sessions, presenting CAPTCHAs, or even banning IP addresses. AgentQL Stealth Mode helps to bypass such measures by minimizing the traces of automation and simulating a real user's browser environment.

## Example usage

You can enable Stealth Mode in AgentQL by calling the `enable_stealth_mode` (/api-references/agentql-page#enablestealthmode) method of a page object.

```python filename="stealth_mode.py"
from playwright.sync_api import sync_playwright

import agentql

with sync_playwright() as playwright, playwright.chromium.launch(
    headless=False,
) as browser:
    page = agentql.wrap(browser.new_page())
    page.enable_stealth_mode()
    page.goto("https://bot.sannysoft.com/")
    page.wait_for_timeout(30000)
```

```python filename="stealth_mode.py"
import asyncio

from playwright.async_api import async_playwright

import agentql


async def main():
    async with async_playwright() as playwright, await playwright.chromium.launch(
        headless=False,
    ) as browser:
        page = await agentql.wrap_async(browser.new_page())
        await page.enable_stealth_mode()
        await page.goto("https://bot.sannysoft.com/")
        await page.wait_for_timeout(30000)


asyncio.run(main())
```

In this example, Stealth Mode is enabled on the Playwright page before navigating to the website. This way, the browser will simulate a real user environment, making it harder for the website to detect automation.
For more advanced usage, you may want to customize some of the default values used by Stealth Mode. You can do this by passing the desired options to the `enable_stealth_mode` method. For example, you can set the `nav_user_agent` to a specific value or customize `webgl` properties.

```python filename="stealth_mode.py"
page.enable_stealth_mode(
    webgl_vendor="Intel Inc.",
    webgl_renderer="Intel Iris OpenGL Engine",
    nav_user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
)
```

To get more realistic values, you could pull the user agent and WebGL vendor from a real browser and pass them to the `enable_stealth_mode` method. You can find those values on browser fingerprinting websites such as bot.sannysoft.com (https://bot.sannysoft.com/) and pixelscan.net (https://pixelscan.net/).

## Related content

To further lower the risk of being detected by an anti-bot system, check out Stealth Mode examples which demonstrate other techniques to apply in your scripts.

#### User-like behavior

Source: https://docs.agentql.com/avoiding-bot-detection/user-like-behavior

Learn how to make your AgentQL scripts look more human-like and avoid bot detection strategies.

## Overview

This guide shares some techniques to make AgentQL scripts appear more human-like and avoid bot detection strategies.

## Randomization

Humans have reliably imperfect interactions with user interfaces: they never move their cursors in straight lines, they click in random places on elements, and they scroll in unpredictable increments down the page. Humans are so good at being less than precise that many anti-bot systems measure mouse movements, click coordinates, and scroll lengths to determine how "bot-like" a session is. The solution is to add some randomization to these interactions.
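To make the "no straight lines" point concrete, here is a rough sketch of one way to generate a curved cursor path between two points. It bends the path with a quadratic Bezier curve whose control point is pushed sideways by a random amount. The function name `curved_mouse_path` and the `steps` and `bend` defaults are assumptions chosen for illustration, not part of AgentQL or Playwright.

```python
import random


def curved_mouse_path(start, end, steps=20, bend=0.25, rng=random):
    """Return steps + 1 points along a randomly bent curve from start to end (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")

    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0

    # Control point: the midpoint, pushed perpendicular to the straight line by a random amount
    offset = rng.uniform(-bend, bend)
    cx = (x0 + x1) / 2 - dy * offset
    cy = (y0 + y1) / 2 + dx * offset

    points = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier interpolation between start, control point, and end
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t**2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t**2 * y1
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in curved_mouse_path((0, 0), (300, 200), steps=5):
        print(point)
```

Each point could then be passed to Playwright's `page.mouse.move(x, y)` with a short random pause in between, so the cursor arrives along a slightly different arc on every run.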
### Randomize mouse movements

Unlike scripted movements and interactions, humans don't move their cursor in a straight line. This is one way reCAPTCHA tells human from bot. Adding some random jitter to your interactions helps them appear more human-like. The following code moves the mouse to a random position on the page at a random interval:

```python filename="main.py"
def random_mouse_movement(page: Page):
    for _ in range(10):
        page.mouse.move(random.randint(0, 1000), random.randint(0, 1000))
        time.sleep(random.uniform(0.1, 0.5))
```

```javascript filename="main.js"
async function randomMouseMovement(page) {
  for (let i = 0; i < 10; i++) {
    await page.mouse.move(Math.floor(Math.random() * 1000), Math.floor(Math.random() * 1000));
    await new Promise((r) => setTimeout(r, Math.random() * 400 + 100));
  }
}
```

As with random movements, humans also don't click in the same place on an element's bounding box. You can randomize your clicks with the following code, which clicks at a random coordinate within an element's bounding box:

```python filename="main.py"
def random_click(page: Page, element: ElementHandle):
    box = element.bounding_box()
    # Pick a random point inside the element, away from its edges
    x = box["x"] + box["width"] * random.uniform(0.2, 0.8)
    y = box["y"] + box["height"] * random.uniform(0.2, 0.8)
    page.mouse.move(x, y)
    page.mouse.click(x, y)
```

```javascript filename="main.js"
async function randomClick(page, element) {
  const box = await element.boundingBox();
  // Pick a random point inside the element, away from its edges
  const x = box.x + box.width * (0.2 + Math.random() * 0.6);
  const y = box.y + box.height * (0.2 + Math.random() * 0.6);
  await page.mouse.move(x, y);
  await page.mouse.click(x, y);
}
```

Lastly, humans scroll inconsistently.
The following code randomizes the amount the page scrolls down:

```python filename="main.py"
def random_scroll(page: Page):
    # Scroll by a random distance, then pause for a random interval
    page.mouse.wheel(0, random.randint(600, 1400))
    time.sleep(random.uniform(0.1, 0.5))
```

```javascript filename="main.js"
async function randomScroll(page) {
  // Scroll by a random distance, then pause for a random interval
  await page.mouse.wheel(0, Math.floor(Math.random() * 800) + 600);
  await new Promise((r) => setTimeout(r, Math.random() * 400 + 100));
}
```

## Keypresses

Another thing humans do that automation software like Playwright doesn't do by default is press individual keys, which fires events in the browser that anti-bot software can listen for. Fortunately, Playwright has a solution for this in `press_sequentially`. This method focuses the element and then sends a `keydown`, `keypress`/input, and `keyup` event for each character in the text. Here's how it looks:

```python filename="main.py"
page.get_by_prompt("the search bar").press_sequentially("AgentQL")
```

```javascript filename="main.js"
const searchBar = await page.getByPrompt('the search bar');
await searchBar.pressSequentially('AgentQL');
```

## Example

With these randomized actions, your script looks more human-like. For example, the following script combines the methods you learned: it types "AgentQL" into the DuckDuckGo search bar one key at a time, clicks the search button, and then moves the mouse and scrolls the page.
```python filename="main.py"
import random
import time

from playwright.sync_api import ElementHandle, Page, sync_playwright

import agentql


def random_mouse_movement(page: Page):
    for _ in range(10):
        page.mouse.move(random.randint(0, 1000), random.randint(0, 1000))
        time.sleep(random.uniform(0.1, 0.5))


def random_click(page: Page, element: ElementHandle):
    box = element.bounding_box()
    # Pick a random point inside the element, away from its edges
    x = box["x"] + box["width"] * random.uniform(0.2, 0.8)
    y = box["y"] + box["height"] * random.uniform(0.2, 0.8)
    page.mouse.move(x, y)
    page.mouse.click(x, y)


def random_scroll(page: Page):
    # Scroll by a random distance, then pause for a random interval
    page.mouse.wheel(0, random.randint(600, 1400))
    time.sleep(random.uniform(0.1, 0.5))


with sync_playwright() as playwright:
    # Launch the browser
    browser = playwright.chromium.launch(headless=False)

    # Wrap the page with AgentQL
    page = agentql.wrap(browser.new_page())

    page.goto("https://duckduckgo.com/")

    # Type "AgentQL" into the search box keystroke by keystroke
    page.get_by_prompt("the search bar").press_sequentially("AgentQL")

    # Click the search button in a random manner
    random_click(page, page.get_by_prompt("the search button"))

    for _ in range(5):
        random_mouse_movement(page)
        random_scroll(page)
```

```javascript filename="main.js"
const { wrap, configure } = require('agentql');
const { chromium } = require('playwright');

async function randomMouseMovement(page) {
  for (let i = 0; i < 10; i++) {
    await page.mouse.move(Math.floor(Math.random() * 1000), Math.floor(Math.random() * 1000));
    await new Promise((r) => setTimeout(r, Math.random() * 400 + 100));
  }
}

async function randomClick(page, element) {
  const box = await element.boundingBox();
  // Pick a random point inside the element, away from its edges
  const x = box.x + box.width * (0.2 + Math.random() * 0.6);
  const y = box.y + box.height * (0.2 + Math.random() * 0.6);
  await page.mouse.move(x, y);
  await page.mouse.click(x, y);
}

async function randomScroll(page) {
  // Scroll by a random distance, then pause for a random interval
  await page.mouse.wheel(0, Math.floor(Math.random() * 800) + 600);
  await new Promise((r) => setTimeout(r, Math.random() * 400 + 100));
}

async function main() {
  // Configure the AgentQL API key
  configure({
    apiKey: process.env.AGENTQL_API_KEY, // This is the default and can be omitted.
  });

  // Launch the browser
  const browser = await chromium.launch({ headless: false });

  // Wrap the page with AgentQL
  const page = await wrap(await browser.newPage());
  await page.goto('https://duckduckgo.com/');

  // Type "AgentQL" into the search box keystroke by keystroke
  const searchBar = await page.getByPrompt('the search bar');
  await searchBar.pressSequentially('AgentQL');

  // Click the search button in a random manner
  await randomClick(page, await page.getByPrompt('the search button'));

  for (let i = 0; i < 5; i++) {
    await randomMouseMovement(page);
    await randomScroll(page);
  }

  await browser.close();
}

main();
```

## Conclusion

In this guide, you learned how to make your Playwright script look more human-like by randomizing your mouse movements, clicks, and scrolling.

### Improving speed

Source: https://docs.agentql.com/speed

How to improve parsing speed when using AgentQL to query elements and data.

## Overview

AgentQL is "fast by default." However, there are some techniques you can use to go even faster.

#### Enable Fast Mode

Source: https://docs.agentql.com/speed/fast-mode

All of AgentQL's query methods (/api-references/agentql-page) allow you to control how the data is retrieved by setting the operational mode to either 'standard' (/accuracy/standard-mode) or 'fast'. Fast Mode is enabled by default.

## Overview

This guide will show you how to change the mode for data queries.

## Example of changing to Fast Mode

All of AgentQL's query methods allow you to specify a mode for querying, offering a choice between two modes:

- **Standard Mode**: This mode ensures a thorough response that may take slightly longer.
- **Fast Mode**: This mode provides quicker responses. It sacrifices some depth of analysis for speed, making it useful for cases where performance is a priority. This is the default mode.
Here's how you can use `query_data` with Fast Mode enabled:

```python filename="fast_mode.py"
query = """
{
    blog_posts {
        title
        author
    }
}
"""


async def query_example():
    data = await page.query_data(
        query=query,
        mode="fast",  # Switch to 'standard' for a more comprehensive response
    )
    print(data)
```

## Advantages of using Fast Mode

Fast Mode is ideal for quick data polling from changing web pages or lightweight, repeated queries on web pages. Fast Mode meets the bar for most tasks our users have, but if you need more data depth, Standard Mode is an option for complex extractions.

## Conclusion

AgentQL's Fast Mode is a powerful tool for quick data extraction. It's a great choice for frequently changing web pages or when speed is a priority. If you need more detailed data, you can switch to Standard Mode for a more comprehensive response.

### Accuracy

Source: https://docs.agentql.com/accuracy

How to improve parsing accuracy when using AgentQL to query elements and data.

## Overview

AgentQL was built to be the most accurate parser on the market. However, there are some cases where you may need to improve the accuracy of your query.

#### Passing context to queries with prompts

Source: https://docs.agentql.com/accuracy/contextual-queries

## Overview

AgentQL supports contextual queries, enhancing the precision of your results. Context allows you to specify additional details about the data you want to extract in plain English. This guide demonstrates how to incorporate context into your queries and leverage it for improved accuracy.

## Adding context to queries

Add context to queries in parentheses `()` after the term. For example, if you want to scrape all the products on the page, but exclude sponsored results, you can add the context `(exclude sponsored results)` to the `products` term.
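When the context comes from a variable (for example, a user-supplied filter), it can help to assemble the query string in code. The sketch below only does string formatting that follows the `term(context)[]` shape described above. The helper names `term` and `build_query` are hypothetical and aren't part of the AgentQL SDK.

```python
def term(name: str, context: str = "", is_list: bool = False) -> str:
    """Format one query term, optionally with context and a list marker (hypothetical helper)."""
    text = name
    if context:
        text += f"({context})"
    if is_list:
        text += "[]"
    return text


def build_query(*terms: str) -> str:
    """Wrap already formatted terms in the curly braces of an AgentQL query."""
    lines = ["{"] + [f"    {t}" for t in terms] + ["}"]
    return "\n".join(lines)


QUERY = build_query(term("products", "exclude sponsored results", is_list=True))

if __name__ == "__main__":
    print(QUERY)
```

The resulting string could then be passed to a query method such as `query_data` in the same way as a hand-written query.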
### Example: Excluding sponsored results

**With Context:**

```AgentQL
{
    products(exclude sponsored results)[]
}
```

**Without Context:**

```AgentQL
{
    products[]
}
```

## Using context to select the correct element

Context can be particularly useful when you need to select the correct element from multiple similar elements on a page. Here is a real-world example (https://www.reddit.com/user/Legitimate-Adagio662/comments/1fyl1bn/scraping_help/) where context distinguishes between different URLs to scrape in a Reddit post.

```AgentQL
{
    url(page URL, not the link inside the post)
    post {
        username
        post_title
        upvotes
        number_comments
    }
}
```

In this example, the context "page URL, not the link inside the post" helps AgentQL focus on the correct browser URL instead of the link inside the post.

## Select specific HTML properties

You can also add context to select specific HTML properties. Occasionally AgentQL may return the wrong element. In this case, you can add context to specify the HTML properties you want. Here are some examples:

**Without Context:**

```AgentQL
{
    products[]
}
```

**With Context:**

```AgentQL
{
    products(must be a span tag)[]
}
```

**With Specific Class Context:**

```AgentQL
{
    products(must be a span tag with class="product-name")[]
}
```

## Conclusion

By incorporating contextual information into your queries, AgentQL offers a powerful way to refine and enhance your data extraction process. This approach not only improves accuracy but also provides flexibility in handling complex web structures and specific data requirements. As you become more familiar with contextual queries, you'll find them invaluable for tackling a wide range of web scraping challenges efficiently.

#### Single out elements by describing their surroundings

Source: https://docs.agentql.com/accuracy/describing-context

AgentQL allows you to pinpoint specific elements on a website by describing their context.
You can do this by using descriptions in parentheses. This is a powerful tool to single out elements on a webpage when there are multiple elements with similar content or attributes.

## Overview

This section shows different techniques you can use to help identify specific elements on a webpage.

## Using their position on the page

Use the element's position to describe its context effectively. For example, specify that an element appears in a particular section of the page or in relation to nearby elements.

```python filename="example_script.py"
QUERY = """
{
    sign_in_btn(This is located in the header)
}
"""
```

!LinkedIn Sign In Button (/images/docs/describing-context-p2.png)

## Using element's content or attributes

Describe elements based on their content or attributes. For example, if you are trying to single out the buy button for the MacBook Air on a page with multiple buy buttons, you can describe it as the button that's associated with the MacBook Air.

```python filename="example_script.py"
QUERY = """
{
    buy_btn(The button to buy Macbook Air)
}
"""
```

!Apple Page Buy Button (/images/docs/describing-context-p1.png)

## Best practices for singling out elements

- Be specific and unambiguous in your descriptions
- Use multiple context clues when necessary
- Ensure uniqueness of described context

## Conclusion

When there are many similar items on the page and you want just one, you can use the element's position, contents, and/or attributes to distinguish it from the surrounding content.

#### Get the highest resolution image

Source: https://docs.agentql.com/accuracy/highest-resolution-image

## Overview

Many modern websites that display images include an optimization where, depending on the client's rendering configuration, the client selects an image to display from multiple potential image candidates. This is usually implemented as a `srcset`.
When using AgentQL to fetch image URLs on a page, you may want to select which image to use, whether it's the best resolution or smallest size.

## AgentQL Context

Through AgentQL Context (/accuracy/contextual-queries), you can pass additional details about how to disambiguate between multiple candidate elements. By leveraging the context, you can hint at which candidate image to select from a set.

### Example

On this website (https://webkit.org/demos/srcset/), if you run the following query:

```AgentQL
{
    image_url
}
```

You get back `{"image_url": "https://webkit.org/demos/srcset/image-src.png"}`. However, inspecting the source, you see that this element can actually select from a number of different resolution images, depending on different properties from the client.

If you want to select the best quality image available, you can do this by leveraging the context as well, such as:

```AgentQL
{
    image_url(for the largest available image)
}
```

This now hints to AgentQL that, though there are multiple correct image URLs, you are looking for the largest image, and you correctly get back `{"image_url": "https://webkit.org/demos/srcset/image-4x.png"}`!

#### Type hinting for query terms

Source: https://docs.agentql.com/accuracy/type-hinting

## Overview

When querying for certain information on a page, the desired response type of a particular query term may be ambiguous, with multiple acceptable types depending on the expected usage or processing of the AgentQL result. Hinting at the type is useful both to control the return types of values that are present on the page and to indicate to AgentQL what the desired output should be if the requested data isn't located on the page.

## AgentQL Context

Through AgentQL Context (/accuracy/contextual-queries), you can pass additional details to hint the desired response type for a term, which results in a more stable query response in the correct format.
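A hint in the query context narrows the response format. As a defensive complement, a script can still coerce values on the client side before using them. The following is a minimal sketch under that assumption: `to_float_or_none` is a hypothetical helper, not part of the AgentQL SDK, and the sample inputs are made up for illustration.

```python
import re
from typing import Optional

# Matches the first integer or decimal number in a string
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_float_or_none(value) -> Optional[float]:
    """Coerce a loosely formatted value to a float, or None when no number is present."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    match = _NUMBER.search(str(value))
    return float(match.group()) if match else None


if __name__ == "__main__":
    for raw in ["3.5 Stars", "No rating", None, 5]:
        print(repr(raw), "->", to_float_or_none(raw))
```

With a type hint in the query, this kind of fallback should rarely need to do anything, but it keeps downstream code from failing on an unexpected string.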
### Example

Imagine a given page which lists various businesses, along with the business name, star rating, and address:

```text
[Business A]
3.5 stars
123 Example Ln.

[Business B]
No rating
456 Documentation St.

[Business C]
5 stars
789 Great Business Ct.
```

If you run a general AgentQL query such as the following:

```AgentQL
{
    businesses[] {
        business_name
        star_rating
        address
    }
}
```

The `star_rating` field for Business B, on any given run, could return `null`, `No rating`, or `0.0`, all of which AgentQL could interpret as correct. Similarly, for Business A, AgentQL could consider a rating of `3.5` or `3.5 Stars` as correct.

An example of how you might hint to AgentQL the desired output format and null handling case would be the following:

```AgentQL
{
    businesses[] {
        business_name
        star_rating (as a float, or null if not present)
        address
    }
}
```

By providing this context, it indicates that the desired format for processing results here would be to provide the `star_rating` value as a float, if present on the page, and `null` otherwise, leaving no room for ambiguity on what the acceptable response should look like.

#### Enable Standard Mode

Source: https://docs.agentql.com/accuracy/standard-mode

All of AgentQL's query methods (/api-references/agentql-page) allow you to control how the data is retrieved by changing its operational mode to either 'standard' (/accuracy/standard-mode) or 'fast' response times. Fast Mode is enabled by default, but you can switch to Standard Mode for a more comprehensive response.

## Overview

This guide will show you how to change the mode for data queries.

## Example of changing to Standard Mode

All of AgentQL's query methods allow you to specify a mode for querying, offering a choice between two modes:

- **Standard Mode**: This mode ensures a thorough response that may take slightly longer.
- **Fast Mode**: This mode provides quicker responses.
It sacrifices some depth of analysis for speed, making it useful for cases where performance is a priority. This is the default mode.

Here's how you can use `query_data` with Standard Mode enabled:

```python filename="standard_mode.py"
query = """
{
    blog_posts {
        title
        author
    }
}
"""


async def query_example():
    data = await page.query_data(
        query=query,
        mode="standard",  # Switch to 'fast' for more fundamental data extraction
    )
    print(data)
```

## Advantages of using Standard Mode

Standard Mode is ideal when accuracy and detailed extraction are critical or when interacting with complex websites. Fast Mode works for most use cases and is beneficial when you need a quick response and can tolerate some loss in data depth.

## Conclusion

AgentQL's Standard Mode is a powerful tool for complex data extraction. It's a great choice for complex web pages or when data depth is a priority. For most use cases, however, Fast Mode is sufficient.

### Using the browser with AgentQL

Source: https://docs.agentql.com/browser

## Overview

AgentQL's SDKs use Playwright under the hood to fetch data and elements from web pages and interact with the page in a natural manner. To learn more about Playwright, visit its official documentation.

#### How to open a headless browser

Source: https://docs.agentql.com/browser/headless-browser

Headless browsers are powerful tools for web automation, testing, and scraping. They let you run a browser without the need to spin up a visual interface. This can allow scripts to execute faster or in the background. AgentQL's SDKs leverage headless browsers to interact with web pages and execute queries without the need for a visible user interface.

## Overview

This guide shows you how and when to use a headless browser, and how to execute queries inside one with AgentQL.

## Why use a headless browser?

Headless browsers offer several advantages:

- **Speed**: They're faster than full browsers as they don't render visuals.
- **Resource efficiency**: They use less memory and CPU.
- **Automation**: Perfect for running tests or scripts without manual intervention.
- **Server-side operation**: Can run on machines without a GUI.

## AgentQL and Playwright

AgentQL's SDK uses Playwright, a powerful browser automation library, to handle headless browsing. Playwright supports multiple browser engines and provides a rich API for web automation.

## Running a query in a headless browser

Here's a basic example of how to use AgentQL with a headless browser:

```python
import agentql
from playwright.sync_api import sync_playwright

# Initialise the browser
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())

    page.goto("https://scrapeme.live/shop/")

    SEARCH_QUERY = """
    {
        products[] {
            product_name
            product_price(integer)
        }
    }
    """

    response = page.query_data(SEARCH_QUERY)

    print("RESPONSE:", response)

    # Close the browser
    browser.close()
```

This script does the following:

1. Initializes a headless browser
2. Navigates to a webpage
3. Creates an AgentQL query
4. Executes the query and retrieves the result
5. Closes the browser

#### Connect to an open tab in an existing browser

Source: https://docs.agentql.com/browser/access-open-tab

## Overview

This guide demonstrates how to connect to an open browser tab and execute AgentQL queries within it using WebSocket connections.

## Why access an open browser tab?

Connecting to open browser tabs offers several benefits:

- **Interactive development**: Test queries in real-time while viewing the browser.
- **Debugging**: Conveniently inspect and troubleshoot query behavior.
- **Session preservation**: Work with existing login states and cookies.
- **Manual preparation**: Allows manual setup of complex page states before automation.
## Connecting to an open tab

Close Google Chrome if it's open, then relaunch it with remote debugging enabled.

On macOS:

```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```

On Windows:

```bash
chrome.exe --remote-debugging-port=9222
```

The following example shows how to connect to an open browser tab. You'll need to replace `WEBSOCKET_URL` with the actual WebSocket URL that Chrome prints to the terminal when started with the preceding command.

```python filename="example_script.py"
import agentql
from playwright.sync_api import sync_playwright

# Connect to an open browser tab via WebSocket URL
WEBSOCKET_URL = "ws://localhost:9222/devtools/browser/..."  # Your actual WebSocket URL

URL = "https://scrapeme.live/shop"

# Query that locates the search box used below
SEARCH_QUERY = """
{
    search_products_box
}
"""

# Query that fetches the stock number
STOCK_QUERY = """
{
    number_in_stock
}
"""


def interact_with_new_page_in_local_browser():
    """This function demonstrates how to open and interact with a new page in your local browser."""
    with sync_playwright() as p:
        # Connect to the browser via Chrome DevTools Protocol
        browser = p.chromium.connect_over_cdp(WEBSOCKET_URL)

        # Create a new tab in the browser window and wrap it to get access to the AgentQL's querying API
        page = agentql.wrap(browser.contexts[0].new_page())

        page.goto(URL)

        # Use query_elements() method to locate the search product box from the page
        response = page.query_elements(SEARCH_QUERY)

        # Use Playwright's API to fill the search box and press Enter
        response.search_products_box.type("Charmander")
        page.keyboard.press("Enter")

        # Use query_data() method to fetch the stock number from the page
        response = page.query_data(STOCK_QUERY)

        print(response)


if __name__ == "__main__":
    interact_with_new_page_in_local_browser()
```

```js filename="example_script.js"
const { chromium } = require('playwright');
const { wrap, configure } = require('agentql');

configure({ apiKey: process.env.AGENTQL_API_KEY });

// Connect to an open browser tab via WebSocket URL
const WEBSOCKET_URL = 'ws://localhost:9222/devtools/browser/...'; // Your actual WebSocket URL

const URL = 'https://scrapeme.live/shop';

// Query that locates the search box used below
const SEARCH_QUERY = `
{
    search_products_box
}
`;

// Query that fetches the stock number
const STOCK_QUERY = `
{
    number_in_stock
}
`;

async function interactWithNewPageInLocalBrowser() {
  // Connect to the browser via Chrome DevTools Protocol
  const browser = await chromium.connectOverCDP(WEBSOCKET_URL);

  // Create a new tab in the browser window and wrap it to get access to the AgentQL's querying API
  const context = browser.contexts()[0];
  const page = wrap(await context.newPage());

  await page.goto(URL);

  // Use queryElements() method to locate the search product box from the page
  const response = await page.queryElements(SEARCH_QUERY);

  // Use Playwright's API to fill the search box and press Enter
  await response.search_products_box.type('Charmander');
  await page.keyboard.press('Enter');

  // Use queryData() method to fetch the stock number from the page
  const stockResponse = await page.queryData(STOCK_QUERY);

  console.log(stockResponse);

  await browser.close();
}

interactWithNewPageInLocalBrowser().catch(console.error);
```

## Related content

### Logging into sites
Source: https://docs.agentql.com/logging-into-sites

## Overview

You can pass and cache user credentials to log into websites with AgentQL. This section shows how to store and use these credentials with AgentQL's automation tooling to access websites.

## Guides

## Related content

#### Log into sites
Source: https://docs.agentql.com/logging-into-sites/log-into-sites

Automating the login process is a crucial step in a workflow, since it allows your script to access protected content and perform actions as an authenticated user.

## Overview

This guide shows how to log into a website with AgentQL. There are four primary steps to logging into a website: 1. Visit the URL 2. Query the form elements 3. Fill out the required form fields 4.
Submit the form ## Visit the page First, you need to navigate to the page with AgentQL and Playwright: ```python filename="cache_user_credentials.py" URL = "https://practicetestautomation.com/practice-test-login/" with sync_playwright() as p, p.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) # Wrapped to access AgentQL's query API page.goto(URL) ``` ## Query the form elements Next, you'll need to use the `query_elements` to find all of the required form elements that you'll need to interact with. ```python filename="cache_user_credentials.py" LOGIN_QUERY = """ { username_field password_field submit_btn } """ response = page.query_elements(LOGIN_QUERY) ``` ## Fill out required form fields Then, you'll utilize Playwright's `fill()` method to fill the required login credentials in the form. ```python filename="cache_user_credentials.py" response.username_field.fill("student") response.password_field.fill("Password123") ``` ## Submit form Finally, submit the login form using the `click()` method. ```python filename="cache_user_credentials.py" # 4. Click the submit button response.submit_btn.click() ``` ## Conclusion Here is a complete script that to perform a login action with AgentQL: ```python filename="cache_user_credentials.py" import agentql from playwright.sync_api import sync_playwright # Set the URL to the desired website URL = "https://practicetestautomation.com/practice-test-login/" LOGIN_QUERY = """ { username_field password_field submit_btn } """ with sync_playwright() as p, p.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) # Wrapped to access AgentQL's query API # 1. Navigate to the URL page.goto(URL) # 2. Get the username and password fields response = page.query_elements(LOGIN_QUERY) # 3. Fill the username and password fields response.username_field.fill("student") response.password_field.fill("Password123") # 4. 
Click the submit button response.submit_btn.click() # Used only for demo purposes. It allows you to see the effect of the script. page.wait_for_timeout(10000) ``` ```python filename="cache_user_credentials.py" import asyncio import agentql from playwright.async_api import async_playwright # Set the URL to the desired website URL = "https://practicetestautomation.com/practice-test-login/" LOGIN_QUERY = """ { username_field password_field submit_btn } """ async def main(): async with async_playwright() as p, await p.chromium.launch(headless=False) as browser: page = await agentql.wrap_async(browser.new_page()) # Wrapped to access AgentQL's query API # Navigate to the URL await page.goto(URL) # Get the username and password fields response = await page.query_elements(LOGIN_QUERY) # Fill the username and password fields await response.username_field.fill("student") await response.password_field.fill("Password123") # Click the submit button await response.submit_btn.click() # Used only for demo purposes. It allows you to see the effect of the script. await page.wait_for_timeout(10000) asyncio.run(main()) ``` Now that you know how to login to a website with AgentQL, learn how to cache and reuse those credentials in the next guide (caching-user-credentials). ## Related content #### Caching user credentials Source: https://docs.agentql.com/logging-into-sites/caching-user-credentials If you have an automation that requires logging into a site, you can save time by securely caching and passing your credentials. ## Overview This guide shows how to avoid logging into the website multiple times by caching user credentials with AgentQL. ### Cache user credentials After logging into a site (log-into-sites), you need to save all relevant authentication cookies and local storage with the `storage_state()` method. Here's an example of saving the current browser session to a local file called `session.json`. 
```python filename="cache_user_credentials.py"
browser.contexts[0].storage_state(path="session.json")
```

Here is the complete script that saves your current session with the website to a local file, so that future runs can reuse it to authenticate:

```python filename="cache_user_credentials.py"
import agentql
from playwright.sync_api import sync_playwright

URL = "WEBSITE_URL"
EMAIL = "YOUR_EMAIL"
PASSWORD = "YOUR_PASSWORD"

with sync_playwright() as p, p.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto(URL)

    log_in_query = """
    {
        sign_in_form {
            email_input
            password_input
            log_in_btn
        }
    }
    """
    response_credentials = page.query_elements(log_in_query)

    # Fill the email and password input fields
    response_credentials.sign_in_form.email_input.fill(EMAIL)
    response_credentials.sign_in_form.password_input.fill(PASSWORD)
    response_credentials.sign_in_form.log_in_btn.click()

    page.wait_for_page_ready_state()

    # Wait for session to be updated with latest credentials
    page.wait_for_timeout(5000)

    # Save the signed-in session
    browser.contexts[0].storage_state(path="session.json")
```

```python filename="cache_user_credentials.py"
import asyncio
import agentql
from playwright.async_api import async_playwright

URL = "WEBSITE_URL"
EMAIL = "YOUR_EMAIL"
PASSWORD = "YOUR_PASSWORD"

async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=False) as browser:
        page = await agentql.wrap_async(browser.new_page())
        await page.goto(URL)

        log_in_query = """
        {
            sign_in_form {
                email_input
                password_input
                log_in_btn
            }
        }
        """
        response_credentials = await page.query_elements(log_in_query)

        # Fill the email and password input fields
        await response_credentials.sign_in_form.email_input.fill(EMAIL)
        await response_credentials.sign_in_form.password_input.fill(PASSWORD)
        await response_credentials.sign_in_form.log_in_btn.click()

        await page.wait_for_page_ready_state()

        # Wait for session to be updated with latest credentials
        await page.wait_for_timeout(5000)

        # Save the signed-in session (storage_state is a coroutine in the async API)
        await browser.contexts[0].storage_state(path="session.json")

asyncio.run(main())
```

### Load user credentials

To load the saved session, create a new browser context with the `new_context()` method, passing the path of the session file from cache user credentials (#cache-user-credentials) as the `storage_state` argument.

```python filename="load_user_credentials.py"
browser.new_context(storage_state="session.json")
```

Here is a complete script that loads the saved session:

```python filename="load_user_credentials.py"
import agentql
from playwright.sync_api import sync_playwright

URL = "WEBSITE_URL"

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    # Load the saved signed-in session by creating a new browser context with the saved session
    context = browser.new_context(storage_state="session.json")

    page = agentql.wrap(context.new_page())

    page.goto(URL)

    page.wait_for_page_ready_state()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
```

```python filename="load_user_credentials.py"
import asyncio
import agentql
from playwright.async_api import async_playwright

URL = "WEBSITE_URL"

async def main():
    async with async_playwright() as p, await p.chromium.launch(headless=False) as browser:
        # Load the saved signed-in session by creating a new browser context with the saved session
        context = await browser.new_context(storage_state="session.json")

        page = await agentql.wrap_async(context.new_page())

        await page.goto(URL)

        await page.wait_for_page_ready_state()

        # Used only for demo purposes. It allows you to see the effect of the script.
        await page.wait_for_timeout(10000)

asyncio.run(main())
```

## Conclusion

Now that you understand how to save a session and load it in future sessions, you can enhance your workflows by caching and passing authentication credentials.
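Cached sessions eventually expire, so it can help to inspect the file before loading it. The following is a minimal sketch, assuming `session.json` follows Playwright's storage-state layout: a top-level `cookies` list whose entries carry an `expires` Unix timestamp, with `-1` marking session cookies. The helper name `session_is_usable` is hypothetical and isn't part of AgentQL or Playwright.

```python
import json
import time
from pathlib import Path


def session_is_usable(path, now=None):
    """Return True if the storage-state file exists and holds at least one unexpired cookie."""
    file = Path(path)
    if not file.is_file():
        return False
    try:
        state = json.loads(file.read_text())
    except json.JSONDecodeError:
        return False
    if not isinstance(state, dict):
        return False

    now = time.time() if now is None else now
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie, which has no fixed expiry time
        if expires == -1 or expires > now:
            return True
    return False
```

If the helper returns `False`, run the login script again to refresh `session.json` before passing it to `new_context()`. A `True` result only means the file looks current; the website may still have invalidated the session on its side.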
## Related content ### Navigating pagination Source: https://docs.agentql.com/navigating-pagination ## Overview Pagination is a common pattern in websites where content is divided across multiple pages. This section guides you through how you can handle pagination with AgentQL. ## Guides - How to navigate infinite scrolling pages (/navigating-pagination/infinite-scroll) - How to collect data across numerically paginated web pages (/navigating-pagination/collect-data-from-paginated-pages) - How to step through paginated pages (/navigating-pagination/step-through-paginated-pages) ## Related content #### How to navigate infinite scrolling pages Source: https://docs.agentql.com/navigating-pagination/infinite-scroll Modern websites often have content that's dynamically loaded as you scroll down the page. ## Overview This guide shows you how to handle loading this type of content for most pages, as well as some challenges of loading this type of content. ## Get infinite scroll page in ready state First, start with a script that loads the page and wait for it to be in a ready state. ```python filename="infinite_scroll.py" with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://infinite-scroll.com/demo/full-page/") page.wait_for_page_ready_state() ``` If you run a data query against this page directly: ```python filename="infinite_scroll.py" QUERY = """ { page_title post_headers[] } """ response = page.query_data(QUERY) ``` The query then returns the following: ```json { "page_title": "Full page demo", "post_headers": ["1a - Infinite Scroll full page demo", "1b - RGB Schemes logo in Computer Arts"] } ``` This indicates your browser has only loaded the first page of this site. You'll need to leverage the Playwright SDK's ability to send input events to the browser page to load more content on this page. 
## Trigger content load on the page There are a few options for scrolling down the page, but the simplest one is to: 1. Use a key press input (https://playwright.dev/docs/api/class-keyboard#keyboard-press) for `End`, which takes you to the bottom of the page that's currently loaded. 2. Give the content time to load by leveraging `wait_for_page_ready_state()`. ```python filename="infinite_scroll.py" page.keyboard.press("End") page.wait_for_page_ready_state() ``` If you run the same query, you'll receive a different response. ```json { "page_title": "Full page demo", "post_headers": [ "1a - Infinite Scroll full page demo", "1b - RGB Schemes logo in Computer Arts", "2a - RGB Schemes logo", "2b - Masonry gets horizontalOrder", "2c - Every vector 2016" ] } ``` You've successfully loaded one additional "page" of content on this site, but what if you need to load additional "pages" of content? ## Load multiple pages of content with looping In order to load multiple pages of content, you can leverage the pagination logic inside of a loop. The following example shows how you can load the three additional pages of content: ```python filename="infinite_scroll.py" num_extra_pages_to_load = 3 for _ in range(num_extra_pages_to_load): page.keyboard.press("End") page.wait_for_page_ready_state() ``` If you look at the response, you'll see it's much more comprehensive than before. ```json { "page_title": "Infinite Scroll · Full page demo", "post_headers": [ "1a - Infinite Scroll full page demo", "1b - RGB Schemes logo in Computer Arts", "2a - RGB Schemes logo", "2b - Masonry gets horizontalOrder", "2c - Every vector 2016", "3a - Logo Pizza delivered", "3b - Some CodePens", "3c - 365daysofmusic.com", "3d - Holograms", "4a - Huebee: 1-click color picker", "4b - Word is Flickity is good" ] } ``` ## Putting it all together If you want to take a look at the final version of this example, it's available in AgentQL's GitHub examples repo (https://github.com/tinyfish-io/fish-tank). 
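The looping steps above can also be sketched as a small helper that stops early once a scroll produces no new items. This is a minimal sketch rather than the version in the examples repo: the helper name `load_more_pages` and its `list_key` parameter are hypothetical, and it assumes `page` is an AgentQL-wrapped Playwright page, using only the `keyboard.press`, `wait_for_page_ready_state`, and `query_data` calls shown in this guide.

```python
def load_more_pages(page, query, list_key, max_extra_pages=3):
    """Scroll to the bottom up to max_extra_pages times, stopping when nothing new loads."""
    response = page.query_data(query)
    for _ in range(max_extra_pages):
        seen = len(response[list_key])
        page.keyboard.press("End")        # jump to the bottom of the loaded content
        page.wait_for_page_ready_state()  # give the new content time to load
        response = page.query_data(query)
        if len(response[list_key]) <= seen:
            break  # no new items appeared, so treat this as the end of the feed
    return response
```

For the demo page in this guide, a call might look like `load_more_pages(page, QUERY, "post_headers")`. Note that this sketch runs one query per scroll, so it trades extra AgentQL calls for the ability to stop early.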
## Conclusion

Pagination on the web can be tricky, since websites implement it in different ways. As a result, while the `End` key press works on many sites, other sites may require a combination of Playwright mouse move (https://playwright.dev/docs/api/class-mouse#mouse-move) and mouse wheel (https://playwright.dev/docs/api/class-mouse#mouse-wheel) to emulate hovering over different scrolling containers and scrolling. Here is a basic example of using the mouse wheel to scroll down the page:

```python filename="infinite_scroll.py"
from playwright.sync_api import Page


def mouse_wheel_scroll(page: Page):
    viewport_height, total_height, scroll_height = page.evaluate(
        "() => [window.innerHeight, document.body.scrollHeight, window.scrollY]"
    )
    # Scroll one viewport at a time until the initially measured page height is covered
    while scroll_height < total_height:
        scroll_height += viewport_height
        page.mouse.wheel(delta_x=0, delta_y=viewport_height)
        # Short pause between scrolls (the duration is an assumption; tune it per site)
        page.wait_for_timeout(100)
```

As the amount of content on a particular page grows, AgentQL queries can slow down significantly, so it's generally a good idea to set a cap on the number of additional pages to load. The right number here depends on the exact website and the data that you're looking for.

## Related content

#### How to collect data across numerically paginated web pages
Source: https://docs.agentql.com/navigating-pagination/collect-data-from-paginated-pages

Some sites split content up across multiple pages. When working with paginated websites that use numerical navigation (listing each page as a number), you can use the `paginate` (/python-sdk/api-references/agentql-tools#paginate) function from the AgentQL SDK to collect data from all pages.

The `paginate` function only supports web pages that use numerical pagination or provide links/buttons to navigate to the next page. It doesn't support other forms of pagination, like alphabetically paginated web pages.

## Overview

This guide shows how to use the `paginate` function to collect data from all pages of a paginated website and save the data to a JSON file.

## Writing the query

For this guide, the goal is to query all post titles from the first 3 pages of the Hacker News feed.
First, you need to write a query that returns the post titles on each page: ```AgentQL { posts[] { title } } ``` ## Using the pagination function Next, you can use the `paginate` function to automatically scrape through specified number of pages and retrieve the aggregated data. In this example, the `paginate` function takes the following arguments: - `page`: An AgentQL Page object of the webpage you want to scrape. - `query`: An AgentQL query in String format that specifies the data to extract on each page. - `number_of_pages`: Number of pages to paginate over. ```python filename="pagination_function.py" paginated_data = paginate(page, QUERY, 3) ``` Internally, the `paginate` function first attempts to find the operable element to navigate to the next page and clicks it, then uses the provided query to extract the data from the page. The function then repeats this process for the specified number of pages. Finally, here's the complete script to save the paginated data into a JSON file: ```python filename="hackernews_pagination.py" with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://news.ycombinator.com/") QUERY = """ { posts[] { title } } """ paginated_data = paginate(page, QUERY, 3) with open("./hackernews_paginated_data.json", "w") as f: json.dump(paginated_data, f, indent=4) log.debug("Paginated data has been saved to hackernews_paginated_data.json") ``` If you want to take a look at the final version of this example, it's available in AgentQL's GitHub examples repo (https://github.com/tinyfish-io/fish-tank). ## Related content #### How to step through paginated pages Source: https://docs.agentql.com/navigating-pagination/step-through-paginated-pages When working with paginated web pages, you may want to collect data from each page individually and aggregate it yourself. 
Use the `navigate_to_next_page` (/python-sdk/api-references/paginationinfo#navigatetonextpage) method on the `PaginationInfo` (/python-sdk/api-references/paginationinfo) object returned by the `get_pagination_info` (/python-sdk/api-references/agentql-page#getpaginationinfo) method. ## Overview This guide shows how to use the `navigate_to_next_page` method to step through paginated web pages and collect data till reaching a fixed number of data. ## Writing the query For this guide, the goal is to query the information of the first 50 books showed up on a online bookstore. First, you need to write a query that extracts the book names, prices, and ratings. ```AgentQL { books[] { name price rating } } ``` ## Stepping through paginated pages To acquire the first 50 books, you need to step through each paginated page, collect, and aggregate the data while keeping track of the total count of books collected. Here's how you could step through the pages: ```python filename="step_through_paginated_pages.py" with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser: page = agentql.wrap(browser.new_page()) page.goto("https://books.toscrape.com/") # get the pagination info from the current page pagination_info = page.get_pagination_info() # attempt to navigate to next page if pagination_info.has_next_page: pagination_info.navigate_to_next_page() ``` The `get_pagination_info` method returns a `PaginationInfo` object, which contains the pagination information of the current page. The `PaginationInfo` object has a `has_next_page` property that indicates whether there is a next page. If there is a next page, you can call the `navigate_to_next_page` method to navigate to the next page. Internally, the `get_pagination_info` method attempts to identify the operable element for pagination. The `has_next_page` property returns `True` if it finds a clickable element. `navigate_to_next_page` attempts to click the identified element. 
## Create a loop

To collect the first 50 books, create a loop that keeps track of the total number of books collected and stops when reaching the target number.

```python filename="step_through_paginated_pages.py"
books = []

# Aggregate the first 50 book names, prices and ratings
# (QUERY is assumed to hold the books query shown earlier in this guide)
while len(books) < 50:
    response = page.query_data(QUERY)

    # Only take as many books as are still needed to reach 50
    if len(response["books"]) + len(books) > 50:
        books.extend(response["books"][:50 - len(books)])
    else:
        books.extend(response["books"])

    # get the pagination info from the current page
    pagination_info = page.get_pagination_info()

    # attempt to navigate to next page; stop if there isn't one
    if pagination_info.has_next_page:
        pagination_info.navigate_to_next_page()
    else:
        break
```

## Related content

### Deploying AgentQL scripts
Source: https://docs.agentql.com/deploying

How to deploy AgentQL scripts to cloud services.

## Overview

After you've written your AgentQL script, you may want to deploy it to cloud services like AWS, GCP, or Azure.

## Guides

- How to deploy AgentQL script (/deploying/how-to-deploy-agentql-script)

#### How to deploy AgentQL script
Source: https://docs.agentql.com/deploying/how-to-deploy-agentql-script

After you've created a working AgentQL script, you may want to deploy it to run on a regular basis. This guide walks you through the process of deploying an AgentQL script to cloud services like Amazon Web Services (https://aws.amazon.com/).

If your script only retrieves data from the webpage without any automation logic, you can use the AgentQL Scheduler (https://dev.agentql.com/scheduling) or REST API (https://docs.agentql.com/rest-api/api-reference) to run your job on a regular basis or on demand.

## Prepare the AgentQL script

First, prepare your AgentQL script by taking the following two steps.

1. Set `headless` to `True` when launching a Playwright browser instance in your script.
2. Read in the AgentQL API key as an environment variable and use `agentql.configure()` to set up the key. If your script contains the AgentQL API key in plain text, remove it to ensure security.
```python import os agentql.configure(api_key=os.getenv("AGENTQL_API_KEY")) ``` ```js const { configure } = require('agentql'); configure({ apiKey: process.env.AGENTQL_API_KEY }); ``` ## Create a Dockerfile Next, create a Dockerfile that you can use to build a Docker image and deploy it to a cloud service. Below is a basic Dockerfile you can customize to your needs. ```dockerfile filename="Dockerfile" FROM python:3.11-slim-bookworm # Set up the working directory ENV APP_HOME /main_app WORKDIR $APP_HOME # Copy the project files COPY main.py $APP_HOME/ # Install project dependencies RUN pip install agentql RUN pip install playwright && \ playwright install chromium && \ playwright install-deps chromium # Environment variables ENV PYTHONDONTWRITEBYTECODE=1 # Run the script CMD ["python", "main.py"] ``` ```dockerfile filename="Dockerfile" # Use Node.js base image FROM node:20-slim # Set work directory WORKDIR /app # Install project dependencies RUN npm install playwright agentql # Install system dependencies required by Chromium RUN npx playwright install-deps chromium # Install the Chromium browser for Playwright RUN npx playwright install chromium # Copy your main AgentQL script into the container COPY main.js . # Run your script CMD ["node", "main.js"] ``` This Dockerfile assumes that the filename is `main.py` or `main.js`. If your script has a different name, you'll need to adjust the Dockerfile accordingly. The Dockerfile must include `playwright install-deps chromium`. This is necessary to install the Playwright browser dependencies. ## Deploy to cloud service Once you've created a Dockerfile, you can deploy it to a cloud service. Typically, you'll need to: 1. Build the Docker image using the Dockerfile. 2. Push the Docker image to a container registry like Docker Hub (https://hub.docker.com/) or AWS Elastic Container Registry (https://aws.amazon.com/ecr/). 3. 
Deploy the Docker container to run on a cloud service like AWS EC2 instance (https://aws.amazon.com/ec2/). Be aware of some pitfalls when deploying to cloud services. Below are common issues you may encounter when deploying AgentQL scripts and Playwright browser instances: - Make sure the CPU architecture of your Docker image aligns with the architecture of your cloud service. For example, if you're deploying to an EC2 instance that uses the `arm64` architecture, you'll need to build the Docker image using the `arm64` architecture. - Set your `AGENTQL_API_KEY` as an environment variable in the cloud service. - Playwright browser instances can require significant resources. Monitor memory and CPU usage, and adjust your instance size as necessary. ## Schedule service to run on a regular basis If you want to schedule your service to run on a regular basis or at a specific time, there are several ways to achieve this: 1. Use a scheduler: Many cloud providers offer built-in schedulers (such as AWS EventBridge (https://aws.amazon.com/eventbridge/scheduler/) or GCP Scheduler (https://cloud.google.com/scheduler/docs/overview)) to run tasks at specified intervals. 2. Set up a cron job: You can directly set up a cron job (https://medium.com/@poojakhaire000/scheduling-cron-job-using-linux-command-on-google-cloud-4c28a9c3ebfe) in your cloud instances to run your script at a specific time. ## Set up a REST API endpoint Alternatively, set up a REST API endpoint to run your script. This approach is useful if you want to run your script on demand or integrate it with another service. 1. Create a small web app that listens for incoming HTTP requests. You can use frameworks like FastAPI (https://fastapi.tiangolo.com/) (Python) or Express (https://expressjs.com/) (Node.js). 2. Handle endpoint requests by calling your AgentQL script within the route handler. 3. 
Modify your Dockerfile accordingly to meet the requirements for web apps, such as exposing the port and installing the necessary dependencies.
4. Trigger the endpoint to run the script from any HTTP client or another service. This can be helpful when you need immediate, ad-hoc runs instead of waiting for scheduled tasks.

Make sure to secure your endpoint! Consider adding authentication or only allowing private network access if you only use the endpoint internally.

## Retrieve results

After your script finishes running, you’ll likely want to inspect or process the results. The approach you choose depends on your data’s nature and how you plan to use it. Below are a few common ways to retrieve and manage your script’s output:

### Log output

Print your results to the console to capture them in your container logs (such as AWS CloudWatch (https://aws.amazon.com/cloudwatch/)). This method is useful for quick checks and debugging when the output is small.

### File storage

Write output to the cloud service's file storage (such as AWS S3 (https://aws.amazon.com/s3/)). This method is useful when you generate reports or CSV files and need to save them long term.

### Databases

For structured data, you can insert records into a database (such as PostgreSQL (https://www.postgresql.org/) or MongoDB (https://www.mongodb.com/)). This enables efficient integration with other services. With this approach, you may need to update your script to use a database client and include the necessary database credentials as environment variables in your cloud service.

### Notifications

Send an email, Slack message, or other real-time notification to share run results with your team or set up alerts. To do so, you may need to use a service like AWS SES (https://aws.amazon.com/ses/) or Slack Webhook (https://api.slack.com/messaging/webhooks) in your script.

# Tools
Source: https://docs.agentql.com/tools

AgentQL has a suite of tools for testing and debugging your queries.
## Overview

AgentQL has a suite of tools for working with AgentQL's query language that you can use to test and debug your queries.

## Tools

- AgentQL CLI Command Reference (/cli_reference) for working with the SDKs.
- AgentQL Debugger (/debugger-extension) browser extension for optimizing queries in real time on any web page.
- Playground (https://playground.agentql.com/) lets you create and test AgentQL queries in your browser for quick and painless data extraction.

## Related content

## AgentQL CLI Command Reference
Source: https://docs.agentql.com/cli-reference

The AgentQL CLI (Command Line Interface) is a command-line tool designed to assist you in using the AgentQL SDK for both Python and JavaScript. It can help you set up your development environment.

# Installation

## Prerequisites

Node.js 18 or higher

### Install AgentQL CLI

From your terminal, run the following command to install the AgentQL CLI:

```bash
npm install -g agentql-cli
```

## Available Commands

### `agentql`

Show a list of available commands along with their descriptions.

### `agentql init`

Set up the AgentQL development environment by installing the required dependencies for AgentQL. When using Python, make sure you have activated your virtual environment so that the necessary dependencies are installed in the correct directory. It also provides an option to download an example script into the current directory.

### `agentql new-script`

Creates a template script in the current directory. Users can choose between asynchronous and synchronous scripts to download.

#### Options

| Option       | Description                                                                                                   |
| ------------ | ------------------------------------------------------------------------------------------------------------- |
| `-h, --help` | Show all available options.                                                                                   |
| `-t, --type` | Specify whether to download synchronous or asynchronous templates. Available options are `sync` or `async`. |

## AgentQL Debugger
Source: https://docs.agentql.com/debugger-extension

AgentQL's Chrome Extension lets you query data and elements from websites directly from your browser. Use it to test and perfect queries before using them with the AgentQL SDK (sdk-installation) to automate tasks, scrape data, or write tests on the web.

## Prerequisites

- Your AgentQL API key.

## Installation

1. Install the AgentQL Chrome Extension from the Chrome Web Store.
   !Add Chrome Extension (/images/docs/chrome-extension-p3.png)
2. Press **Ctrl+Shift+I** (or **Cmd+Opt+I** on Mac) to bring up Chrome's DevTools panel.
3. If you don't see "AgentQL" in the top bar of the DevTools panel, click on the overflow menu button (») and select "AgentQL" from the list. The AgentQL tab will only appear when inspecting websites with URLs.
   !Selecting AgentQL tab (/images/docs/chrome-extension-install.webp)
4. Enter your API key (get one here if you don't have one).
   !AgentQL tab (/images/docs/chrome-extension-p1.png)
5. You're ready to query!
   !Query demo (/images/docs/demo-1.webp)

## Try it out

To get a feel for how the AgentQL query language works, paste the following query into the extension:

```AgentQL
{
    tutorial_images[]
}
```

#### Fetch Web Element

Click "Fetch Web Elements" to locate the precise elements on the webpage. Hover over the returned element to see it highlighted on the page. (Can't see it? Try clicking the eye icon to scroll it into view!) You can also click on the `` to see the element highlighted in the DevTools' "Elements" panel.

#### Fetch Data

Click "Fetch Data" to retrieve URLs for all images included in this tutorial, except for the logo at the top of the page. AgentQL is clever enough to know that the logo is not part of the tutorial content.

What other clever things can you get AgentQL to do?
Try "Fetch Web Elements" and "Fetch Data" for the following query to get the next page in the documentation:

```AgentQL
{
  next_page_link
}
```

Now learn how to use your query with the AgentQL SDK (sdk-installation).

## Troubleshooting

If you cannot find the extension in your DevTools, try the following:

- Make sure you're on a website with a URL.
- Try reopening your DevTools panel.

## Playground

Source: https://playground.agentql.com/

# Support

Source: https://docs.agentql.com/support

If you encounter any issues or have questions, reach out to the AgentQL community support channel.

## Where to get support

### Primary support channel: Discord

The AgentQL Discord server is the quickest way to get support. Here's how to access it:

1. Join the AgentQL Discord server (https://discord.gg/agentql).
2. Navigate to the **#support** channel.
3. Post your question or describe your issue.

AgentQL team members actively monitor this channel and will assist you as quickly as possible.

### Email support

If you prefer a more traditional route, you can contact support via email at support@agentql.com.

## Guidelines for effective support requests

To get the best possible support as quickly as possible, please follow these guidelines when posting in the **#support** channel:

1. **Be specific**: Describe the issue you're facing or the question you have. Include any error messages you're seeing.
2. **Provide context**: Share relevant details about your environment, such as:
   - AgentQL version
   - Python version
   - Operating system
   - Browser (if applicable)
3. **Include a minimal reproducible example**: If possible, please provide a small code snippet that demonstrates the issue. This helps others understand and diagnose the problem more quickly.
4. **Share your AgentQL query**: Include the full query in your support request.
5. **Describe expected vs. actual behavior**: Explain what you expected to happen and what actually occurred.
6. **Use code formatting**: When sharing code or error messages, use Discord's code formatting (triple backticks) to make it easier to read.

## Example of a good support request

I'm having trouble extracting product prices from an e-commerce site. Here's my query:

```AgentQL
{
  products[] {
    name
    price
  }
}
```

I'm using AgentQL version 0.5.0 with Python 3.9 on Windows 10. When I run this query, I get all the product names correctly, but the prices are coming back as None. I expected to see the actual price values. Any ideas what might be causing this?

# FAQ

Source: https://docs.agentql.com/faq

### **Q.** How do I debug my script?

**Answer**

In the Python SDK, enable debug mode by setting the logging level to `DEBUG` in your script:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

In the JavaScript SDK, enable debug mode by setting `DEBUG=True` in the environment when running the script:

```bash
DEBUG=True node example_script.js
```

Turning on debug mode outputs relevant debug information to the console.

### **Q.** How do I specify the format of data returned by AgentQL?

**Answer**

Since version `0.4.7`, AgentQL supports providing context in a query in the following way:

```AgentQL
{
  products[] {
    price(Format the output as a numerical value only, no dollar sign)
    name(Format the output with the following prefix: amazon_)
  }
}
```

By providing formatting instructions as context in the query, you can instruct AgentQL to return data in the expected format.

### **Q.** While writing an AgentQL script, how do I ensure the web page has loaded entirely?

**Answer**

It's currently not possible to programmatically determine when a page has finished loading, because a script may bring new information onto the page, as in the case of infinite scrolling or lazy-loaded images. You can use the Playwright API (https://playwright.dev/python/docs/api/class-mouse#mouse-wheel) to scroll the page up and down to load more content.
```python
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    video_comments[] {
        user_name
        rating
        comment
    }
}
"""

with sync_playwright() as playwright, playwright.chromium.launch() as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4")

    for _ in range(3):
        # Scroll down 3 times, each by 1000 pixels
        page.mouse.wheel(delta_x=0, delta_y=1000)
        page.wait_for_page_ready_state()  # Give it the opportunity to load more content

    page.mouse.wheel(delta_x=0, delta_y=-3000)  # Scroll up 3000 pixels, back to top

    response = page.query_data(QUERY)
    print(f"Video comments: \n{response}")
```

```js filename="example_script.js"
const { wrap } = require("agentql");
const { chromium } = require("playwright");

const QUERY = `
{
    comments[]
}
`;

async function main() {
  const browser = await chromium.launch({headless: false});
  const page = await wrap(await browser.newPage());
  await page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4");

  // NOTE: the source listing is cut off after `for (let i = 0; i`. The rest of
  // this function is an assumed reconstruction that mirrors the Python example.
  for (let i = 0; i < 3; i++) {
    // Scroll down 3 times, each by 1000 pixels
    await page.mouse.wheel(0, 1000);
    await page.waitForPageReadyState(); // Give it the opportunity to load more content
  }

  await page.mouse.wheel(0, -3000); // Scroll up 3000 pixels, back to top

  const response = await page.queryData(QUERY);
  console.log(`Comments: \n${JSON.stringify(response, null, 2)}`);

  await browser.close();
}

main();
```

### **Q.** How do I make sure that my script waits long enough for all the elements to appear on the page?

**Answer**

The AgentQL SDK provides the `wait_for_page_ready_state()` API in Python (`waitForPageReadyState()` in JavaScript) to determine page readiness. This method waits for the page to be in a ready state and for all the necessary network requests to complete. However, there are some cases where the page might not be ready even after the method returns, due to the dynamic nature of the page. In such cases, you can use the `page.wait_for_timeout` function (`page.waitForTimeout` in JavaScript) to add an additional delay to your script.

```python
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    related_videos[] {
        video_name
        video_rating
    }
}
"""

with sync_playwright() as playwright, playwright.chromium.launch() as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4")

    page.wait_for_page_ready_state()  # Wait for the page to fully load
    page.wait_for_timeout(5000)  # Give additional time if needed

    response = page.query_data(QUERY)
    print(f"Related videos: \n{response}")
```

```js filename="example_script.js"
const { wrap } = require("agentql");
const { chromium } = require("playwright");

const QUERY = `
{
    related_videos[] {
        video_name
        video_rating
    }
}
`;

async function main() {
  const browser = await chromium.launch({headless: false});
  const page = await wrap(await browser.newPage());
  await page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4");

  await page.waitForPageReadyState(); // Wait for the page to fully load
  await page.waitForTimeout(5000); // Give additional time if needed

  const response = await page.queryData(QUERY);
  console.log(`Related videos: \n${JSON.stringify(response, null, 2)}`);

  await browser.close();
}

main();
```
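
A fixed delay can be either too short or needlessly long. As an alternative, here is a minimal sketch that repeats a query a bounded number of times until the result looks complete. The `poll_until` helper, its parameters, and the stand-in responses are hypothetical and not part of the AgentQL SDK; the sketch also assumes the query result is a dictionary keyed by the query's field names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_ready: Callable[[T], bool],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() up to `attempts` times until is_ready(result) is true.

    Returns the first ready result, or None if no attempt was ready.
    `sleep` is injectable so the helper can be tested without real delays.
    """
    for attempt in range(attempts):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < attempts - 1:
            sleep(delay_s)  # pause before the next attempt
    return None


# Stand-in for a page that needs two extra polls before data shows up.
# In a real script, fetch might be `lambda: page.query_data(QUERY)`
# (hypothetical wiring, not executed here).
_responses = iter([
    {"related_videos": []},
    {"related_videos": []},
    {"related_videos": [{"video_name": "Example", "video_rating": None}]},
])

data = poll_until(
    fetch=lambda: next(_responses),
    is_ready=lambda d: bool(d.get("related_videos")),
    delay_s=0.0,
)
print(data)
```

Bounding the number of attempts keeps the script from hanging on pages that never produce the expected data, and the caller decides what counts as "ready" for each query.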