Introducing Playwright Smart Locator

· 4 min read
Pasha Dudka
Software Engineer @ Tiny Fish

Introducing Playwright Smart Locator: Simplifying Web Automation with Natural Language Queries

We are thrilled to announce the release of a new addition to our AgentQL SDK: Playwright Smart Locator. This AI-powered enhancement is designed to simplify how engineers interact with web elements in their Playwright automation scripts. With Smart Locator, you can now query web elements using natural language, making your automation scripts more intuitive and easier to maintain.

Note: Playwright Smart Locator is intended to be used as a drop-in replacement for Playwright's existing locator methods. It is available in the latest version of the AgentQL SDK.

The Challenge with Traditional Selectors

In the realm of web automation, precision is key. Traditional methods of locating web elements often rely on intricate and brittle selectors like XPath or CSS selectors. These methods, while powerful, can become cumbersome and error-prone, especially when dealing with complex web pages or dynamic content. A typical selector might look something like this:

search_box = page.locator("xpath=/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/button")

Such selectors are not only difficult to read but also prone to break when the structure of the webpage changes.

Enter Playwright Smart Locator

Our new Playwright Smart Locator addresses these challenges by enabling you to use natural language queries to locate web elements. This AI-driven approach simplifies the process, making your code more readable and resilient. Here’s how you can use it:

search_box = page.get_by_ai("Search input field")

Full example:

import time

from playwright.sync_api import sync_playwright

# Importing Page from the AgentQL SDK adds get_by_ai to Playwright's Page class.
from agentql.ext.playwright.sync_api import Page


with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page: Page = browser.new_page()
    page.goto("https://www.google.com/")

    # Describe each element in plain English instead of writing a selector.
    search_box = page.get_by_ai("Search input field")
    search_btn = page.get_by_ai("Search button which initiates the search")

    if search_box is None or search_btn is None:
        raise ValueError("Search box or search button not found")

    search_box.type("Tiny Fish")
    search_btn.click(force=True)

    # Keep the browser open briefly so the results are visible.
    time.sleep(5)
    browser.close()

With Smart Locator, you can describe the element you want to interact with in plain English, and our AI will handle the rest.

Key Benefits

  1. Intuitive and Readable Code: By replacing complex selectors with natural language queries, your automation scripts become more readable and maintainable. This is especially beneficial for teams, making it easier for new members to understand and contribute to the codebase.

  2. Increased Productivity: Spend less time crafting and debugging selectors. With Smart Locator, you can quickly identify web elements using simple descriptions, speeding up your automation development process.

  3. Resilience to Changes: Natural language queries are more resilient to changes in the webpage structure. If a page layout changes, there's a higher chance that your natural language description will still correctly identify the desired element.

  4. Seamless Integration: For engineers with existing Playwright scripts, integrating Smart Locator is a breeze. It’s a drop-in replacement for your current page.get_by_* and page.locator calls, ensuring a smooth transition.

Getting Started

To start using Playwright Smart Locator:

  1. Update your AgentQL SDK to the latest version.
  2. [Important] Replace the existing Page import with Page from the AgentQL SDK: from agentql.ext.playwright.sync_api import Page (or from agentql.ext.playwright.async_api import Page for async scripts).
  3. Replace your existing locator methods with page.get_by_ai and provide a natural language description of the element you wish to interact with. Here’s a quick example:

Before:

search_box = page.locator("xpath=/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/button")

After:

search_box = page.get_by_ai("Search input field")

Warning: Don't forget to import Page from the AgentQL SDK to use Smart Locator! At the time of import, the SDK automatically adds the get_by_ai method to Playwright's Page class.

Join the Future of Web Automation

We believe that Playwright Smart Locator will transform how you approach web automation, making it more accessible and efficient. We are excited to see how this feature will enhance your automation scripts and overall productivity.

Stay tuned for more updates and features as we continue to innovate and improve the AgentQL SDK. If you have any feedback or questions, feel free to reach out to our support team or join our community forum.

Happy automating!

Thank you for being a part of our journey to make web automation smarter and more efficient.

New Modules Structure in 0.4.0

· 3 min read
Pasha Dudka
Software Engineer @ Tiny Fish

As we start to roll out our SDK to more users, we have received a lot of feedback on the structure of our modules. We have taken this feedback to heart and have made some significant changes to the structure of our modules in the 0.4.0 release. This blog post will go over the changes we have made and how they will affect you.

Why the change?

The main reason for the change is to make the SDK more user-friendly and easier to use. We have received feedback that the current structure is confusing and hard to navigate. There are a lot of user-facing classes one needs to import and they all live in different modules. This makes it hard to know where to look for the functionality you need.

Consider a snippet from one of the real-life Python scripts using the AgentQL SDK:

from agentql.async_api.web import InteractiveItemTypeT, WebDriver
from agentql.async_api.web.playwright_driver import Locator, PlaywrightWebDriver
from agentql.common.errors import AttributeNotFoundError
from agentql.common.syntax.node import ContainerListNode
from agentql.async_api.popup import Popup
from agentql.common.api_constants import GET_WEBQL_ENDPOINT, SERVICE_URL
from agentql.common.syntax.parser import Parser
from agentql.async_api.response_proxy import WQLResponseProxy

Look at all the different modules that need to be imported! It's really hard to know where to look for the functionality you need and what to import.

What is changing?

To prepare for future SDK growth and to make the SDK more user-friendly, we have decided to restructure the modules in the SDK.

We had a few main goals for the new structure:

  • minimize the number of modules a user needs to import
  • isolate core functionality from concrete web driver implementations, to accommodate new web driver implementations

With this in mind, here is roughly what the new structure looks like:

agentql/                  # Main package. Core AQL Logic
├── __init__.py
├── errors.py
├── utils.py
├── ...

├── sync_api/
│   ├── __init__.py           # Sync-specific classes
│   ├── session.py
│   └── web_driver.py

├── async_api/
│   ├── __init__.py           # Async-specific classes
│   ├── session.py
│   └── web_driver.py

└── ext/
    ├── __init__.py           # Platform-specific extensions
    └── playwright/
        ├── __init__.py
        ├── playwright_sync.py
        └── playwright_async.py

So the code snippet above would look like this with the new structure:

from agentql import AttributeNotFoundError, Parser, ContainerListNode, GET_WEBQL_ENDPOINT, SERVICE_URL

from agentql.async_api import InteractiveItemTypeT, WebDriver, Popup, AQLResponseProxy

from agentql.ext.playwright import PlaywrightWebDriverAsync

from agentql.ext.playwright.playwright_driver_async import Locator

We hope that this new structure will make the SDK easier to use and make it easier to find the functionality you need.
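The goal of fewer imports is usually achieved in Python by re-exporting names from a package's __init__.py. The sketch below illustrates that general pattern only; it is not the AgentQL SDK's actual source. The package name demo_sdk is hypothetical, and it is built in memory so that the example runs without any files on disk.

```python
import sys
import types

# "demo_sdk" is a hypothetical package used only for illustration.
errors = types.ModuleType("demo_sdk.errors")


class AttributeNotFoundError(Exception):
    """Stand-in for an error class defined in a submodule."""


errors.AttributeNotFoundError = AttributeNotFoundError

package = types.ModuleType("demo_sdk")
package.__path__ = []  # an empty __path__ marks the module as a package
package.errors = errors

# The re-export. In a real package this would be a line such as
# `from .errors import AttributeNotFoundError` inside demo_sdk/__init__.py.
package.AttributeNotFoundError = errors.AttributeNotFoundError

sys.modules["demo_sdk"] = package
sys.modules["demo_sdk.errors"] = errors

# Both import paths now resolve to the same class.
from demo_sdk import AttributeNotFoundError as short_import
from demo_sdk.errors import AttributeNotFoundError as long_import
```

With a re-export in place, users only need to remember the top-level package, while the submodule path keeps working for code that already uses it.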

Best Practice of Creating Query

· 5 min read
Frank Feng
TinyFish Software Engineer

Introduction

As a core component of AgentQL, AgentQL Query allows users to retrieve the exact web page elements they need for interaction or data retrieval. Designed with flexibility in mind, AgentQL Query is a schema-less language: query elements are free-form and not strongly typed. However, there are some syntax requirements and best practices for creating an AgentQL Query. This post explains them so that you can write better queries.

AgentQL Query Syntax

The list below contains all syntax requirements for AgentQL Query:

  1. The query should be enclosed in curly braces.
  2. Each element should be on its own line (i.e. one element per line).
  3. Elements should not be separated by any punctuation.
  4. A container element should enclose its child elements in curly braces.
  5. To create a list element, follow the term with square brackets '[]'.
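
The rules above can be approximated with a small local check before a query is sent. This is a minimal sketch, assuming terms consist of letters, digits, and underscores; check_query_syntax is a hypothetical helper and not part of the AgentQL SDK, and the server remains the authority on what is valid.

```python
import re

# One term per line, optionally a list ("[]"), optionally opening a container.
# Assumption: terms contain only letters, digits, and underscores.
TERM_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\[\])?( \{)?$")


def check_query_syntax(query: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    lines = [line.strip() for line in query.strip().splitlines() if line.strip()]
    if not lines or lines[0] != "{" or lines[-1] != "}":
        return ["query must be enclosed in curly braces"]

    problems = []
    depth = 0
    for number, line in enumerate(lines, start=1):
        if line == "{" and number == 1:
            depth += 1
        elif line == "}":
            depth -= 1
            if depth == 0 and number != len(lines):
                problems.append(f"line {number}: query closed before its last line")
        elif TERM_LINE.match(line):
            if line.endswith("{"):
                depth += 1  # a container element opens a nested block
        else:
            problems.append(
                f"line {number}: expected a single element without punctuation, got {line!r}"
            )
    if depth != 0:
        problems.append("curly braces are not balanced")
    return problems
```

For example, check_query_syntax("{\nsearch_box, search_btn\n}") reports one problem, because two elements share a line and are separated by a comma.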

Here are some example queries that follow the syntax requirements:

{
    search_box
    search_btn
}

This is an example of a single-level query that tries to retrieve two elements (search box and search button) on the web page.

{
    header {
        sign_in_btn
    }
    footer {
        about_btn
    }
}

This is an example of a nested query that tries to get the sign-in button in the header section and the about button in the footer section. In this case, the header and footer elements serve as container elements that capture the hierarchical relationship of the desired elements.

{
    links[]
}

This is an example of a query with a list element. The query above tries to capture all the links on a web page, and the AgentQL server will return an array of links for it.

{
    products[] {
        price
        rating
        reviews[]
    }
}

This is another example of a query with a list element. Here, however, the query specifies the exact information wanted for every list item. In this case, the AgentQL server will return an array, and each array item will contain the price, the rating, and the reviews of one product. Note that list elements can be nested: reviews[] is also a list element and tries to capture all reviews for the product.
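
To make the relationship between a query and the shape of its result more concrete, the sketch below turns query text into a nested Python skeleton. It is an illustration under stated assumptions: query_shape is a hypothetical helper, not the SDK's parser, it expects a well-formed query, and the real response objects returned by AgentQL are not plain dictionaries.

```python
def query_shape(query: str) -> dict:
    """Turn query text into a nested skeleton that mirrors the expected result.

    Leaf elements become None, containers become dicts, and list elements
    become one-item lists showing the shape of each item.
    """
    lines = [line.strip() for line in query.strip().splitlines() if line.strip()]
    root: dict = {}
    stack = [root]  # containers currently being filled, innermost last

    for line in lines[1:-1]:  # skip the outer curly braces
        if line == "}":
            stack.pop()
            continue
        opens_container = line.endswith("{")
        term = line.rstrip("{").strip()
        is_list = term.endswith("[]")
        name = term[:-2] if is_list else term

        if opens_container:
            child: dict = {}
            stack[-1][name] = [child] if is_list else child
            stack.append(child)
        else:
            stack[-1][name] = [None] if is_list else None
    return root


shape = query_shape("""
{
    products[] {
        price
        rating
        reviews[]
    }
}
""")
```

Here shape is a dictionary with a single products key whose value is a list containing one item skeleton with price, rating, and a nested reviews list.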

AgentQL Query is designed to be flexible, but there are some recommended practices that may improve the response quality from AgentQL server:

  1. Use lower-case letters for all terms in the query.
  2. Use an underscore "_" to separate words within a term.
  3. Append “btn” to the term to indicate that the element is clickable.
  4. Append “box” to the term to indicate that the element accepts input.
  5. When there are multiple layers of elements, indent each child element one level deeper than its parent.
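
The naming recommendations above are mechanical enough to automate. The following is a minimal sketch; to_term and its kind parameter are hypothetical names chosen for illustration, not part of the AgentQL SDK.

```python
import re


def to_term(description: str, kind: str = "") -> str:
    """Build a query term from a plain description.

    kind="button" appends "_btn" and kind="input" appends "_box";
    any other value leaves the term without a suffix.
    """
    words = re.findall(r"[a-z0-9]+", description.lower())
    if not words:
        raise ValueError("description must contain at least one letter or digit")
    term = "_".join(words)

    suffix = {"button": "btn", "input": "box"}.get(kind)
    if suffix and not term.endswith("_" + suffix):
        term = f"{term}_{suffix}"
    return term
```

For instance, to_term("Sign in with Google", "button") produces sign_in_with_google_btn, and a term that already ends in the suffix is left unchanged.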

How to Find the Exact Element on the Page

When there are multiple elements with the same or similar names on the web page, the AgentQL server may need further hints in the query to find the exact element we are looking for. There are several things we can do in the query to help with this.

Use a better name for element

A more accurate or specific name is a powerful tool for getting better query results. For instance, using “sign_in_with_google_btn” instead of “sign_in” may help with targeting a specific button on a page.

Use hierarchy hints

Providing hierarchy hints can greatly reduce ambiguity. For example, if very similar buttons (for instance, “Sign In”) appear on the web page, one in the header and another in the sign-in form, you can specify that semantic information so it's clearer which button you mean.

Consider the following example web page and queries:

[Figure: LinkedIn web page and queries]

Different container elements (header and form) convey different hierarchical information to the AgentQL server, which then locates different sign-in buttons on this page.
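
As a rough text version of this idea, the sketch below renders two queries that differ only in their container element. The helper render_query is hypothetical, and the term sign_in_btn is an assumed name that follows the naming practices in this post; the exact queries from the original illustration are not reproduced here.

```python
def _render_body(tree: dict, depth: int) -> list[str]:
    """Render the elements of one nesting level, indenting children one level."""
    pad = "    " * depth
    lines = []
    for name, children in tree.items():
        if children:  # container element with nested children
            lines.append(f"{pad}{name} {{")
            lines.extend(_render_body(children, depth + 1))
            lines.append(f"{pad}}}")
        else:  # leaf element
            lines.append(f"{pad}{name}")
    return lines


def render_query(tree: dict) -> str:
    """Render a nested dict as query text enclosed in curly braces."""
    return "\n".join(["{", *_render_body(tree, 1), "}"])


# Same button name, different container: each query targets a different button.
header_query = render_query({"header": {"sign_in_btn": None}})
form_query = render_query({"form": {"sign_in_btn": None}})
```

Printing header_query and form_query shows two otherwise identical queries whose container names carry the hierarchy hint.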

Use surrounding elements hints

Another way to reduce ambiguity is to specify surrounding elements, so it's clearer where the element is located. For example, if you are trying to locate a “Sign in” button that sits between two other buttons, it may help to specify those two buttons as well (even if you don't plan to interact with them) to clarify the element's location.

{
    button_a
    sign_in_btn
    button_b
}

Examples

Here are some examples of working with real web pages.

Retrieving Phone Model Button

[Figure: Apple page model buttons]

In the example here, we are trying to retrieve the buttons to select different models for the new smart phones.

[Figure: Apple page model selection section]

Here we are trying to get the entire selection section on the page. The button information is still preserved in the response, and we can retrieve it by parsing the response.

Retrieving Amazon Product Information

[Figure: Amazon product page]

Here we are trying to get the price and name of each product on the Amazon page.

[Figure: Amazon product page]

If we want all the relevant information about this product, we could simply use the query above.