Release Notes
Version 1.15.0
New features
- Added inactivity_timeout_seconds parameter to create_browser_session() / createBrowserSession() for Tetra remote browsers. It controls the session inactivity timeout, after which the remote browser is automatically terminated.
Improvements
- Updated dependencies to their latest versions.
Version 1.14.1
Fixes
- Fixed a bug where AgentQL returns 500 on some pages with invalid unicode characters.
Version 1.14.0
New features
- 🛡️ Browser profiles for remote browsers - LIGHT and STEALTH modes for enhanced bot detection avoidance!
Learn more at https://docs.agentql.com/browser/remote-browser#browser-profiles
Version 1.13.0
New features
- 🚀 Introducing Tetra Browser - remote Chrome browser designed for AgentQL!
Learn to use it at https://docs.agentql.com/browser/remote-browser
Version 1.12.0
Breaking Changes
- Python SDK: Dropped support for Python 3.8. The minimum required Python version is now 3.9.
Fixes
- Various stability fixes and improvements
Version 1.11.2
Fixes
- Various stability fixes and improvements
Version 1.11.1
Fixes
- Fixed a bug where whitespace was removed from text elements if there were hyperlinks in the text.
Version 1.11.0
- (JS SDK) Bundled JS scripts, so there are fewer dependencies on Node.js packages. This should help when AgentQL is used in environments where Node.js is not available (e.g. edge environments).
Version 1.10.1
- Stability fixes and improvements
Version 1.10.0
- Introducing Query Document tool. You can now use AgentQL to extract data from documents!
Learn to use it at https://docs.agentql.com/python-sdk/api-references/agentql-tools#query-document
Version 1.9.2
- Added better error messages when API key is invalid.
Version 1.9.1
- Fixed a bug where accessibility tree generation fails if there are hidden text elements.
Version 1.9.0
New features
- (Python SDK) Added an experimental_query_elements_enabled argument to query_elements() and get_by_prompt() to improve accuracy.
Fixes
- (Python SDK) Fixed a bug where iframe accessibility tree could be None on some websites.
Version 1.8.1
Fixes
- Fixed a bug where text elements may lose context during page processing
Version 1.8.0
New features
- Debug information available on Page object.
JavaScript SDK: Users can now access the last query, response, and accessibility tree generated by the AgentQL SDK on this page using the getLastQuery(), getLastResponse(), and getLastAccessibilityTree() methods respectively.
Python SDK: Users can now access the last query, response, and accessibility tree generated by the AgentQL SDK on this page using the get_last_query(), get_last_response(), and get_last_accessibility_tree() methods respectively.
This information may be useful for debugging and troubleshooting.
Fixes
- (Python SDK) Fixed a bug when trying to wrap an already wrapped Playwright Page
Version 1.7.1
Fixes
- Updated endpoint for AgentQL query generation in Python SDK.
Version 1.7.0
New features for Python SDK
- Pagination!
AgentQL Python SDK now supports pagination under the agentql.tools module. With paginate(), users can automatically collect data from multiple pages using an AgentQL query. Additionally, users can use get_pagination_info() to step through the pagination process for further manipulation.
For more information, please refer to the API references for paginate() and get_pagination_info().
Tiny Fish is planning to add pagination support to JavaScript SDK soon.
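The idea behind collecting data across pages can be sketched without the SDK. The snippet below is only an illustration under stated assumptions: FAKE_SITE, fetch_page(), and collect_pages() are hypothetical stand-ins, not AgentQL functions, and the actual paginate() signature should be taken from the API reference.

```python
from typing import Optional

# Hypothetical stand-in for a paginated website: page number -> (items, has_next).
FAKE_SITE = {
    1: (["item-1", "item-2"], True),
    2: (["item-3", "item-4"], True),
    3: (["item-5"], False),
}


def fetch_page(page_number: int) -> tuple[list[str], bool]:
    """Hypothetical helper: return the items on a page and whether a next page exists."""
    return FAKE_SITE[page_number]


def collect_pages(max_pages: Optional[int] = None) -> list[str]:
    """Walk the pages in order, gathering items until the last page or max_pages."""
    collected: list[str] = []
    page_number = 1
    while True:
        items, has_next = fetch_page(page_number)
        collected.extend(items)
        if not has_next or (max_pages is not None and page_number >= max_pages):
            break
        page_number += 1
    return collected


all_items = collect_pages()
first_two_pages = collect_pages(max_pages=2)
```

Capping the number of pages, as max_pages does here, keeps a script from walking an unexpectedly long result set.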
Version 1.6.2
Improvements
- Optimized accessibility tree generation by combining processing steps.
Fixes
- Fixed a bug in accessibility tree generation caused by undefined element tag name.
Version 1.6.1
Fixes
- Fixed a bug in accessibility tree generation affecting specific websites.
Version 1.6.0
Python SDK
Breaking changes
- DebugManager and TrailLogger are removed from the Python SDK. Now, to debug your scripts, you can set the logging level to DEBUG in your script like this:
import logging
logging.basicConfig(level=logging.DEBUG)
Fixes
- Fixed the issue where AgentQL hangs when the page crashes.
Improvements
- Stealth mode library updated to version 1.1.0. Now users can pass the browser_type parameter to indicate the browser type they are using.
await page.enable_stealth_mode(nav_user_agent=user_agent, browser_type="chrome")
JavaScript SDK
Fixes
- Fixed the issue where AgentQL hangs when the page crashes.
- Fixed the issue where AgentQL throws an Unexpected number error when generating the accessibility tree.
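A note on the Python SDK logging change in version 1.6.0: the effect of raising a logger to DEBUG can be seen with the standard library alone. This is a minimal sketch; the logger name "example_sdk" is hypothetical and is not the name the AgentQL SDK uses.

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# "example_sdk" is a hypothetical logger name used only for this sketch.
logger = logging.getLogger("example_sdk")
logger.addHandler(handler)
logger.propagate = False  # keep the sketch independent of the root logger

# At WARNING level, debug messages are dropped.
logger.setLevel(logging.WARNING)
logger.debug("hidden at WARNING level")

# At DEBUG level, the same kind of message is emitted.
logger.setLevel(logging.DEBUG)
logger.debug("visible at DEBUG level")

captured = stream.getvalue()
```

In a real script, logging.basicConfig(level=logging.DEBUG) configures the root logger, so debug messages from libraries that log through the standard logging module become visible.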
Version 1.5.0
JavaScript SDK
Breaking Changes
- We have updated the following methods to accept an options object for optional parameters instead of using positional arguments:
getByPrompt(prompt, options)
queryElements(query, options)
queryData(query, options)
waitForPageReadyState(options)
For more information, please refer to the API reference.
Version 1.4.1
New Features
- JavaScript SDK!
AgentQL now supports JavaScript SDK! Check out the installation instructions and our launch week announcement post to learn more or our new JavaScript examples to get started.
Improvements
- Default query_elements() timeout increased to 300 seconds
- Default get_data_by_prompt_experimental() timeout increased to 75 seconds
Version 1.4.0
Breaking Changes
- "fast" mode is now the default mode for query_elements(), query_data(), and get_by_prompt() methods. Users can still use "standard" mode by setting the mode parameter to "standard":
response = page.query_data(QUERY, mode="standard")
Fixes
- Fixed the issue where page monitor is not initialized properly when page.goto() is not called.
Improvements
- Added support for non-ASCII characters in query descriptions.
Version 1.3.0
Breaking Changes
- include_aria_hidden parameter
For the query_elements(), query_data(), and get_by_prompt() methods, the include_aria_hidden parameter was renamed to include_hidden so that users can control whether hidden elements are included when fetching elements or data.
Version 1.2.0
New Features
- Commas are now supported in AgentQL queries. Users can now use commas to separate query terms in the query string. For example, the following query is now valid:
{
first_name, last_name, email
}
Improvements
- Improved the reliability of wait_for_page_ready_state() method by more thoroughly capturing page events.
Fixes
- Fixed DebugManager failing to finalize the logger and return all desired logs.
Version 1.1.0
Breaking Changes
- Session-based API is removed from the agentql package. For the new Page-based API, users can refer to this guide.
New Features
- Fast Mode
AgentQL now supports Fast Mode for query_elements(), query_data(), and get_by_prompt() methods. Users can specify the mode they would like to use with the mode parameter -- fast mode will decrease the response time but may lower the accuracy of the response.
For API reference, visit this page.
- agentql new-script command
Users can now use the agentql new-script command to quickly set up a template script. Currently, users can choose between sync and async scripts.
For API reference, visit this page.
- Request ID for troubleshooting
If there is a server-side error, AgentQL now returns a Request ID that corresponds to a specific request in the AgentQL backend server. This ID is printed to the console at the end of error messages. Including this ID when reaching out for support will greatly speed up assistance.
Improvements
- Improved accessibility tree generation by including child nodes of slot elements in the tree.
Version 1.0.1
Fixes
- Fixed invalid documentation links in error messages.
Version 1.0.0
AgentQL is officially launched with a new API!
Breaking Changes
- Session-based API is deprecated. It will be removed in version 1.1.0. For the new Page-based API, users can refer to this guide.
New Features
- wrap() and wrap_async()
The agentql module provides the above two utility methods to convert Playwright's Page to AgentQL's Page, which gives access to AgentQL's querying API.
For instructions on how to use them, visit this API reference page.
- get_by_prompt()
In addition to the query_elements() and query_data() methods, AgentQL now provides get_by_prompt() for users to fetch a single element from a web page using natural language.
For API reference, visit this page.
Version 0.5.3
Fixes
- Fixed accessibility tree generation creating duplicate IDs for web elements.
- Optimized accessibility tree generation by including nodes with aria-hidden=true attribute by default.
- Adjusted how API key is checked so that keys set through environment variable will take precedence over those set in config file.
Improvements
- Debug mode now generates request_id information for each query request. Users can share this information with Tiny Fish developers when asking for help with a specific query.
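The API key precedence described in the 0.5.3 fixes above (environment variable first, config file second) can be sketched as follows. This is an illustration, not the SDK's implementation: resolve_api_key() and the "api_key" config entry are hypothetical, while AGENTQL_API_KEY is the variable name given in the 0.3.0 notes.

```python
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str], config: Mapping[str, str]
) -> Optional[str]:
    """Return the key from the environment if set, otherwise from the config."""
    # AGENTQL_API_KEY is the environment variable named in the 0.3.0 notes;
    # the "api_key" config entry is a hypothetical layout for this sketch.
    return env.get("AGENTQL_API_KEY") or config.get("api_key")


# Both sources set: the environment variable wins.
from_env = resolve_api_key({"AGENTQL_API_KEY": "env-key"}, {"api_key": "file-key"})
# Only the config file has a key: it is used as the fallback.
from_file = resolve_api_key({}, {"api_key": "file-key"})
# Neither source has a key.
missing = resolve_api_key({}, {})
```

In a real script the first argument would typically be os.environ. Using `or` means an empty environment variable also falls through to the config value, which is a design choice of this sketch.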
Version 0.5.2
New Features
- Query Data
Previously, AgentQL adhered to a one-to-one relationship between query terms and web elements, which sometimes made it difficult to query a block of text or retrieve actual text value from responses. Now, users can achieve these tasks with the newly added session.query_data() method. The following example demonstrates how to use this endpoint:
import agentql
session = agentql.start_session("https://apply.workable.com/pony-dot-ai/j/56A463E1D3/")
QUERY = """
{
required_programming_skills (just the skill name)[]
base_salary_min (without the dollar sign, use _ as separator)
base_salary_max (with dollar sign)
}
"""
response = session.query_data(QUERY)
# Text of the query terms could be directly retrieved in the following way
print(f"Base salary min: {response.base_salary_min}")
print(f"Base salary max: {response.base_salary_max}")
for skill in response.required_programming_skills:
    print(f"Required programming skill: {skill}")
session.stop()
Fixes
- Improved accessibility tree generation logic by removing HTML elements with the code tag.
Version 0.5.1
Fixes
- Fixed the script getting stuck in an infinite loop when the starting character and the ending character of the query are the same, but they are not quotation marks
Version 0.5.0
Breaking changes
- Modules structure update. Playwright web drivers are now located in agentql.ext.playwright.sync_api and agentql.ext.playwright.async_api for synchronous and asynchronous versions respectively:
from agentql.ext.playwright.sync_api import PlaywrightWebDriver
from agentql.ext.playwright.async_api import PlaywrightWebDriver
New features
- Debug Mode
Users can now use AgentQL SDK's Debug Mode to debug their scripts. The following example demonstrates how to enable this mode:
from agentql.sync_api import DebugManager
with DebugManager.debug_mode():
    ...  # your script goes here
It will save meta information (like OS, Python version, AgentQL version), logs, error information, last accessibility tree used, and screenshots of every page queried to the debug folder. The default path is $HOME/.agentql/debug.
- Query Terms' Context
Previously, when describing a term in the query, users would need to do something like this:
{
second_button_from_the_top_next_to_login_button_only_if_hero_image_is_present
}
Now, AgentQL Query supports providing context for the query terms. Add it inside parentheses like this:
{
button(This is the second button from the top. It's next to login button and will only appear when hero image is present)
}
For more details, please check out our query introduction page.
- Search in Documentation Website
Users can now search for keywords in AgentQL Documentation Website.
Improvements
- Supported iterating over query's collection data items via for loop
- Improved typechecking for AgentQL response
- Added AgentQL config file path to the agentql init command's output
Fixes
- Fixed a crash in wait_for_page_ready_state() method when it was invoked before page redirection
- Fixed session.current_page not updating when opening new tab after clicking on a link
Version 0.4.7
Fixes:
- Refactored part of the internal logic of query syntax
Version 0.4.6
Improvements:
- Improved the error message for a better debugging experience
- Allowed History log to output logging information even when an error is raised
Fixes:
- Fixed scrolling not working on some websites
- Fixed accessibility tree not being captured correctly on some websites
Version 0.4.5
Improvements:
- Added AgentQL CLI which is a tool designed to assist you in using the AgentQL SDK. It can help you set up your development environment.
- Added Trail Logger which can log actions taken by AgentQL SDK and display them at the end of a session. This can be used for debugging your scripts. The Trail Logger can be enabled through enable_history_log parameter in start_session() method and the logs can be obtained through session.get_last_trail().
- Added Session#last_accessibility_tree property to get the last captured accessibility tree. It can be helpful for debugging purposes.
- Added Popup#page_url property to get the URL of the page where the popup occurred. It can be used when analyzing popup on different pages.
- Adjusted the error message for AttributeNotFoundError for better debugging information.
- Moved the import path for ProxySettings, Locator and Page class to agentql.ext.playwright.
Fixes:
- Fixed some web pages with empty iframes HTML element crashing the accessibility tree generation logic.
- Fixed wait_for_page_ready_state() not reliably waiting on some websites.
Version 0.4.4
Fixes:
- Addressed incorrect hidden elements detection logic
Version 0.4.3
Fixes:
- Fixed some web page elements being incorrectly marked as "hidden" and not included in the query result.
Version 0.4.2
Fixes:
- Fixed the page not being closed when the session was closed.
Version 0.4.1
Fixes:
- Fixed SDK crash to enable async SDK usage and multiple sync sessions.
- Fixed a potential resource leak issue during session creation failures.
Version 0.4.0
Breaking changes
- Major modules structure overhaul.
- Playwright Web Driver now starts in "headed" mode by default. To start it in "headless" mode, users need to pass headless=True to the PlaywrightWebDriver constructor.
Version 0.3.1
Fixes:
- Fixed SDK crash on Python versions < 3.10
Improvements:
- Added Session#last_query and Session#last_response methods to get the last query and response objects respectively. These can be helpful for debugging purposes.
Version 0.3.0
We've migrated our SDK from webql to agentql to be consistent with our new branding! This release introduces breaking changes. Please refer to "Breaking Changes" section for latest information.
Breaking changes
- As we have moved our SDK from webql to agentql, our Python library is now called agentql and you can import it with import agentql
- API key setup: instead of WEBQL_API_KEY, users now need to set AGENTQL_API_KEY.
We have also updated our docs to reflect those changes! The underlying APIs available and the way they can be leveraged are still the same.
Version 0.2.8
Hotfix release
- Fixed TypeError: AsyncClient.post() got an unexpected keyword argument 'allow_redirects'
Version 0.2.7
This release introduces some breaking changes. Please refer to "Breaking Changes" section for latest information.
Breaking changes
As we continue drawing a clearer line between Session and WebDriver, we removed several APIs which were previously present in Session class:
# Removed APIs
session.scroll_up()
session.scroll_down()
session.scroll_to_bottom()
session.load_user_session_state()
session.wait_for_page_ready_state()
session.get_user_session_state()
session.save_user_session_state()
All these methods are now available in WebDriver class, so you can use them in the following way:
session.driver.scroll_up()
session.driver.scroll_down()
session.driver.scroll_to_bottom()
session.driver.load_user_session_state()
session.driver.wait_for_page_ready_state()
session.driver.get_user_session_state()
session.driver.save_user_session_state()
Improvements
- Fixed possible crash in PlaywrightDriver related to unbound variable (#252)
- Allowed http redirects for AgentQL API calls (#257)
- Fixed resource leak: reuse existing browser context for iframes (#259)
- Fixed resource leak: dom update listener is never removed (#258)
- Moved to tf-playwright-stealth (#260)
- Relaxed dependency requirements (#261)
- Added environment variable to control API host (#262)
Version 0.2.6
Improvements
- Optimized the code by making the enable_stealth_mode() method synchronous in the asynchronous version of the SDK.
Version 0.2.5
Highlights
This release introduces public APIs for checking whether web driver is in headless mode and for retrieving web driver instance in Session class. Several bug fixes and code optimization are also included in this release.
New Features
- API to retrieve web driver instance from Session
Users can now interact with the web driver instance directly from Session class in the following way:
# This will scroll to the bottom of the page
session.driver.scroll_to_bottom()
# This will wait for page to enter a stable state
session.driver.wait_for_page_ready_state()
- API to retrieve headless setting
Users can now determine whether the browser is started in headless mode by invoking session.driver.is_headless().
Bug Fixes
- Fixed a bug where users could not chain methods on the response object.
Version 0.2.4
Highlights
This release introduces Stealth Mode to the SDK. Stealth mode reduces the likelihood of users being flagged as a bot on some websites.
New Features
- Stealth Mode
Users can enable stealth mode by invoking enable_stealth_mode() method in Web Driver class. Users can pass in their User Agent, webgl renderer, and webgl vendor information to maximize the effect of stealth mode.
Users can activate the Stealth Mode like this:
import webql as wql
from webql.sync_api.web import PlaywrightWebDriver
driver = PlaywrightWebDriver(headless=False)
# Enable the stealth mode and set the stealth mode configuration
driver.enable_stealth_mode(
    webgl_vendor=VENDOR_INFO,
    webgl_renderer=RENDERER_INFO,
    nav_user_agent=USER_AGENT_INFO,
)
Version 0.2.3
Highlights
This release improves the stability and reliability of SDK by introducing fixes to some known bugs.
Bug Fixes
- Fixed a bug where page interaction sometimes froze in headless mode.
- Fixed a bug for data postprocessing in async environment.
Version 0.2.2
Highlights
This release introduces a new API through which users can retrieve Page object from web driver. In addition, this release also includes several bug fixes and code optimization.
New Features
- New public API for getting Page object from web driver
A public API has been added to Session class for retrieving Page object. With the Page object, users can interact with web pages more freely, such as page refreshing and navigation.
For instance, to refresh the page, users can use the following script:
session = webql.start_session()
# This will reload the current web page
session.current_page.reload()
To navigate to a new website, users can use the following script:
session = webql.start_session()
# This will take the page to a new website
session.current_page.goto("new website link")
Bug Fixes
- Fixed a bug where None value in response data is not handled properly.
- Fixed a bug where to_data() method is not working properly in the asynchronous environment.
Version 0.2.1
Highlights
This release introduces a new feature where users can retrieve and load browser's authentication session to maintain login state.
New Features
- Get & Set User Authentication Session:
With this release, users can maintain the previous login state by initializing a session with the user authentication state.
To retrieve the authentication state from the current session, users can utilize Session class's get_user_auth_state():
import json
# Prior to this point, the script has already signed into a website
# This will retrieve the auth state for current session
user_auth_state = session.get_user_auth_state()
# The session info can be saved to local file system like this
with open(FILE_PATH, "w") as f:
    f.write(json.dumps(user_auth_state))
To load the authentication state while initializing the session, users can pass user_auth_state into start_session()'s user_auth_session parameter:
user_auth_session = None
# To load user_auth_session from local file, users can do something like this
with open(FILE_PATH, "r") as f:
    user_auth_session = json.loads(f.read())
session = webql.start_session(user_auth_session=user_auth_session)
For more detailed instructions on how to retrieve and load a user session, please refer to the following example in our example repository.
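The save and load steps described above amount to a JSON round trip, which can be tried without a browser session. This is a minimal sketch: the dictionary is a placeholder for whatever get_user_auth_state() returns, and its structure is an assumption made for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Placeholder for the value returned by get_user_auth_state(); the real
# structure is defined by the SDK and is an assumption here.
user_auth_state = {"cookies": [{"name": "sid", "value": "abc123"}], "origins": []}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "auth_state.json"

    # Save the state to disk as JSON.
    file_path.write_text(json.dumps(user_auth_state), encoding="utf-8")

    # Load it back; this is the value that would be passed to start_session().
    restored_state = json.loads(file_path.read_text(encoding="utf-8"))
```

Because the state can contain login cookies, the saved file is best treated like a credential and kept out of version control.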
Version 0.2.0
This release introduces some breaking changes. Please refer to "Breaking Changes" section for latest information.
Highlights
This release introduces the asynchronous version of the package. Now users can utilize AgentQL in an optimized fashion within their asynchronous environment.
New Features
- Asynchronous Support: With this release, users can start an asynchronous session using the following script:
import webql
async_session = await webql.start_async_session()
For more detailed instructions on how to use the async version, please refer to the following example in our example repository.
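Note that await is only valid inside a coroutine, so in a standalone script the call above needs to be wrapped in an async function and driven by an event loop. The sketch below shows that pattern with the standard library; FakeAsyncSession and the local start_async_session() are hypothetical stubs standing in for the SDK, not its actual implementation.

```python
import asyncio


class FakeAsyncSession:
    """Hypothetical stand-in for the object returned by start_async_session()."""

    def __init__(self) -> None:
        self.started = True


async def start_async_session() -> FakeAsyncSession:
    """Hypothetical stub that mimics an awaitable session factory."""
    await asyncio.sleep(0)  # yield control once, as real I/O would
    return FakeAsyncSession()


async def main() -> bool:
    # `await` is only valid inside a coroutine such as this one.
    async_session = await start_async_session()
    return async_session.started


# asyncio.run() creates the event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
```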
Breaking Changes
We have introduced some changes to our public API structure. Specifically, users need to choose between synchronous API and asynchronous API when importing web drivers and helper methods.
Now, PlaywrightWebDriver and close_all_popups_handler need to be imported in the following fashion:
- Synchronous
from webql.sync_api.web import PlaywrightWebDriver
from webql.sync_api import close_all_popups_handler
- Asynchronous
from webql.async_api.web import PlaywrightWebDriver
from webql.async_api import close_all_popups_handler
The following way of importing PlaywrightWebDriver and close_all_popups_handler is deprecated and no longer supported:
from webql.web import PlaywrightWebDriver
from webql import close_all_popups_handler