Frequently Asked Questions

Q. How do I debug my script?

Ans: Since version 0.5.0, the AgentQL Python SDK has included a built-in debug mode, implemented as a context manager that you wrap around your script. When the script crashes or finishes running, debug mode saves the following to the folder specified during setup or in the AGENTQL_DEBUG_PATH environment variable (the default path is YOUR_HOME_PATH/.agentql/debug):

  1. A log of each action taken by the AgentQL SDK.
  2. Error information (if the script crashes).
  3. A screenshot of each page on which a query action is performed.
  4. The accessibility tree of the last page before the crash or the end of the script.
  5. Meta information (OS, Python version, AgentQL version).
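
To find the saved files after a run, it can help to work out which folder debug mode wrote to. The following is a minimal sketch, assuming that AGENTQL_DEBUG_PATH, when set, replaces the default ~/.agentql/debug location. It does not cover a path configured during setup. The helper names `resolve_debug_dir` and `list_debug_files` are hypothetical and not part of the AgentQL SDK.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_debug_dir() -> Path:
    """Return the folder where debug files are expected to be saved.

    Assumption: AGENTQL_DEBUG_PATH, when set, takes the place of the
    default ~/.agentql/debug location. A path chosen during setup is
    not considered here.
    """
    override = os.environ.get("AGENTQL_DEBUG_PATH")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".agentql" / "debug"


def list_debug_files(debug_dir: Path) -> list[str]:
    """List saved debug files as relative paths, or [] if none exist yet."""
    if not debug_dir.is_dir():
        return []
    return sorted(
        str(p.relative_to(debug_dir)) for p in debug_dir.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    directory = resolve_debug_dir()
    print(f"Debug directory: {directory}")
    for name in list_debug_files(directory):
        print(f"  {name}")
```

Running this after a crashed script should show whichever logs, screenshots, and other files debug mode saved, without assuming their exact file names.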

The following script is an example of how you could use this debug mode in Python:

import logging
import agentql
from agentql.sync_api import DebugManager
from playwright.sync_api import sync_playwright

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

QUERY = """
{
    search_box
    search_btn
    about_link
}
"""

# The following context manager will enable debug mode for the script.
with DebugManager.debug_mode():
    with sync_playwright() as playwright, playwright.chromium.launch() as browser:
        page = agentql.wrap(browser.new_page())
        page.goto("https://www.google.com")

        log.debug("Analyzing...")
        response = page.query_elements(QUERY)

        log.debug("Inputting text...")
        # Buggy code that will crash the script: the query defines search_box, not search.
        # On the crash, the debug manager saves debug files to the designated directory (~/.agentql/debug by default).
        response.search.type("tinyfish")

        log.debug('Clicking "Search" button...')
        response.search_btn.click()

In the JavaScript SDK, debug mode is enabled by setting the DEBUG=True environment variable when running the script:

terminal
DEBUG=True node example_script.js

This will output relevant debug information to the console.

Q. How do I specify the format of data returned by AgentQL?

Ans: Since version 0.4.7, AgentQL has supported adding context to a query term in parentheses, as follows:

{
    products[] {
        price(Format the output as a numerical value only, no dollar sign)
        name(Format the output with the following prefix: amazon_)
    }
}

As you can see, by providing formatting instructions as context in the query, you can instruct AgentQL to return data in the expected format.
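
Because the formatting instructions are written in natural language, a light check on the returned data can catch entries that deviate from the requested format. The following is a minimal sketch in Python. The `check_product` helper and the sample response are hypothetical: the sample only mirrors the shape of the query above and is not real AgentQL output. The check also assumes that a price may arrive either as a number or as a numeric string.

```python
from __future__ import annotations

from typing import Any


def check_product(product: dict[str, Any], prefix: str = "amazon_") -> list[str]:
    """Return a list of problems found in one product entry (empty if none)."""
    problems = []

    # The price should be convertible to a number (no currency symbol).
    price = product.get("price")
    try:
        float(price)
    except (TypeError, ValueError):
        problems.append(f"price is not numeric: {price!r}")

    # The name should carry the requested prefix.
    name = product.get("name")
    if not isinstance(name, str) or not name.startswith(prefix):
        problems.append(f"name lacks the {prefix!r} prefix: {name!r}")

    return problems


# Hypothetical response, for illustration only; not real AgentQL output.
sample = {
    "products": [
        {"price": "19.99", "name": "amazon_kettle"},
        {"price": "$5.00", "name": "toaster"},
    ]
}

for index, product in enumerate(sample["products"]):
    for problem in check_product(product):
        print(f"products[{index}]: {problem}")
```

If a check like this reports problems, rewording the instruction in the query is usually the first thing to try.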

Q. While writing an AgentQL script, how do I ensure the web page is loaded entirely?

Ans: Fully loading a web page is currently treated as application logic, because it is difficult to define objectively what it means for an entire page to be loaded. Some websites use lazy loading or are infinitely scrollable, and for those pages there is no clear point at which the whole page has loaded.

You can use the Playwright API to scroll the page up and down to load more content.

example_script.js
const { wrap } = require("agentql");
const { chromium } = require("playwright");

const QUERY = `
  {
    comments[]
  }
`;

async function main() {
    const browser = await chromium.launch({headless: false});
    const page = await wrap(await browser.newPage());
    await page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4");

    for (let i = 0; i < 3; i++) {
        // Scroll down 3 times, each by 1000 pixels
        await page.mouse.wheel(0, 1000);  // mouse.wheel(deltaX, deltaY) takes positional arguments
        await page.waitForPageReadyState();  // Give it the opportunity to load more content
    }

    await page.mouse.wheel(0, -3000);  // Scroll up 3000 pixels, back to top

    const response = await page.queryData(QUERY);
    console.log(`Video comments: \n${JSON.stringify(response, null, 2)}`);

    await browser.close();
}

main();
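
The fixed three-scroll loop above works when you know roughly how much content you need. For pages of unknown length, one option is to keep scrolling until the page height stops growing, with a cap so that infinitely scrollable pages still terminate. The following is a minimal sketch in Python. The `scroll_until_stable` function and the `FakePage` class are hypothetical and not part of AgentQL or Playwright. The loop takes two callables, so its logic does not depend on any particular browser API.

```python
from typing import Callable


def scroll_until_stable(
    scroll_down: Callable[[], None],
    get_height: Callable[[], int],
    max_rounds: int = 10,
) -> int:
    """Scroll until the reported page height stops growing.

    Returns the number of scrolls performed. max_rounds caps the loop so
    that infinitely scrollable pages still terminate.
    """
    previous = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_down()
        current = get_height()
        if current <= previous:
            return rounds  # No new content appeared after this scroll
        previous = current
    return max_rounds


class FakePage:
    """Stand-in for a browser page: grows by 1000 px per scroll up to a limit."""

    def __init__(self, limit: int) -> None:
        self.height = 1000
        self.limit = limit

    def scroll(self) -> None:
        self.height = min(self.height + 1000, self.limit)


page = FakePage(limit=3000)
rounds = scroll_until_stable(page.scroll, lambda: page.height)
print(f"Stopped after {rounds} scrolls at height {page.height}")
```

With Playwright's Python sync API, the callables could wrap `page.mouse.wheel(0, 1000)` and `page.evaluate("document.body.scrollHeight")`. On a real page, content loads asynchronously, so the scroll callable should also wait for the page to settle before the height is measured.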

Q. How do I make sure that my script waits long enough for all the elements to appear on the page?

Ans: The AgentQL SDK provides the waitForPageReadyState() API to determine page readiness. This method waits for the page to reach a ready state and for all necessary network requests to complete. However, because some pages are highly dynamic, the page might still not be ready after the method returns. In such cases, you can use the page.waitForTimeout function to add an additional delay to your script.

example_script.js
const { wrap } = require("agentql");
const { chromium } = require("playwright");

const QUERY = `
{
    related_videos[] {
        video_name
        video_rating
    }
}
`;

async function main() {
    const browser = await chromium.launch({headless: false});
    const page = await wrap(await browser.newPage());
    await page.goto("https://www.youtube.com/watch?v=1ZvYrAaJOH4");

    await page.waitForPageReadyState();  // Wait for the page to fully load

    await page.waitForTimeout(5000);  // Give additional time if needed

    const response = await page.queryData(QUERY);
    console.log(`Related videos: \n${JSON.stringify(response, null, 2)}`);

    await browser.close();
}

main();
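
A fixed delay is either longer than necessary or, on a slow connection, too short. As an alternative to the approach above, a script can poll for a condition until it holds or a timeout expires. The following is a minimal sketch in Python. The `wait_until` helper is hypothetical and not part of AgentQL or Playwright, and the condition used in the demo is only a stand-in for a real readiness check.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Poll condition() until it returns True or timeout seconds pass.

    Returns True if the condition was met, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo: a stand-in condition that becomes true after about 0.1 seconds.
start = time.monotonic()
ready = wait_until(lambda: time.monotonic() - start > 0.1, timeout=1.0, interval=0.02)
print(f"Condition met: {ready}")
```

In a real script, the condition could be a check that the data you need is present, for example that a query for `related_videos` returns a non-empty list. Each such check costs a query, so a longer polling interval may be appropriate.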