How AgentQL Works

AgentQL is a powerful query language designed to make web automation and scraping more intuitive by letting users interact with web elements through natural language queries. But what happens between running your query and receiving results from the web page? This document explains the technology that powers AgentQL.

Key Use Cases

There are two main use cases for AgentQL:

  • Data extraction (Scraping): Extracting data from web pages using natural language instructions.
  • Web element lookup: Using natural language queries to locate web page elements (e.g., buttons, forms). This is particularly useful for web automation and E2E testing suites.

Each use case has its own set of challenges and requirements, which AgentQL's underlying technology addresses.
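
The snippet below is a minimal sketch of both use cases using AgentQL's Python SDK together with Playwright. The URL and all query field names (products, name, price, search_input, search_button) are illustrative, not taken from any real page:

```python
import agentql
from playwright.sync_api import sync_playwright

# Illustrative queries in AgentQL's query syntax; the field names are made up.
PRODUCTS_QUERY = """
{
    products[] {
        name
        price
    }
}
"""

SEARCH_QUERY = """
{
    search_input
    search_button
}
"""

# Assumes an AGENTQL_API_KEY environment variable is set.
with sync_playwright() as playwright:
    browser = playwright.chromium.launch()
    page = agentql.wrap(browser.new_page())
    page.goto("https://example.com/shop")  # placeholder URL

    # Use case 1: data extraction -- returns structured data.
    products = page.query_data(PRODUCTS_QUERY)

    # Use case 2: element lookup -- returns interactable elements.
    response = page.query_elements(SEARCH_QUERY)
    response.search_input.fill("laptop")
    response.search_button.click()

    browser.close()
```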

Input sources

Currently, AgentQL works with the following input sources:

  • The page’s HTML provides the structural layout of the web page as well as the actual web page content. The additional context of the page’s hierarchy helps match the content to the query.
  • The page’s accessibility tree provides a semantic understanding of the page, closer to how a human would use the page. This aids in the identification of elements based on their roles and labels.

AgentQL uses both a page’s HTML structure and the accessibility tree to understand its content and the relationships between its elements. These inputs provide a comprehensive view of the web page, allowing AgentQL to accurately interpret and respond to user queries.
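
To make the two inputs concrete, the sketch below pulls rough equivalents of each using Playwright directly. AgentQL consumes richer, preprocessed versions of both, so this is only an approximation:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")

    # Input 1: the HTML, carrying both structure and actual page content.
    html = page.content()

    # Input 2: the accessibility tree, carrying roles and labels,
    # closer to how a person would perceive the page.
    ax_tree = page.accessibility.snapshot()

    print(html[:200])
    print(ax_tree)
    browser.close()
```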

Working with the input

Input pre-processing

The first step in processing web content is simplification. This involves removing unnecessary noise and complexity from the input, such as HTML metadata, scripts, and redundant hierarchy layers, to create a clean and concise representation of the web page.
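
The actual pre-processing step is internal to AgentQL, but a hypothetical pass along these lines, written here with BeautifulSoup, illustrates the idea:

```python
from bs4 import BeautifulSoup

def simplify_html(raw_html: str) -> str:
    """Hypothetical simplification pass; illustrates the concept only."""
    soup = BeautifulSoup(raw_html, "html.parser")

    # Drop nodes that carry no visible content (metadata, scripts, styles).
    for tag in soup(["script", "style", "meta", "link", "noscript"]):
        tag.decompose()

    # Unwrap attribute-less wrappers to flatten needless hierarchy layers.
    for tag in soup.find_all(["div", "span"]):
        if not tag.attrs:
            tag.unwrap()

    return str(soup)
```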

Pipeline Selection Based on Use Case

AgentQL dynamically selects the appropriate processing pipeline based on the user's target use case: scraping or automation. Each pipeline is tuned to the specific challenges of its task, delivering the best results accordingly (see the sketch after the two summaries below).

Data scraping pipeline

  • Optimized for locating actual data on the web page.
  • Prioritizes accuracy and completeness of the extracted data (sacrificing speed if necessary).

Web automation pipeline

  • Optimized for locating interactive elements on the web page (e.g., buttons, forms).
  • Focuses on reliability and execution speed.
  • Assumes a one-to-one mapping between AgentQL query terms and the web elements returned.
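
A hypothetical dispatcher mirroring these trade-offs might look like the following; the flags and their names are ours, not AgentQL's actual configuration:

```python
from enum import Enum, auto

class UseCase(Enum):
    SCRAPING = auto()
    AUTOMATION = auto()

def select_pipeline(use_case: UseCase) -> dict:
    """Hypothetical pipeline selection mirroring the trade-offs above."""
    if use_case is UseCase.SCRAPING:
        # Favor accuracy and completeness, even at the cost of latency.
        return {"prioritize": "completeness", "allow_slow_path": True}
    # Favor reliability and speed; expect one element per query term.
    return {"prioritize": "speed", "one_to_one_mapping": True}
```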

Leveraging Large Language Models (LLMs)

AgentQL utilizes several public LLMs, including GPT-4, Llama, and Gemini, as well as our proprietary model, to generate initial results. AgentQL's infrastructure decides which LLM to use based on the complexity of the task and the specific requirements of the use case.

LLM Selection Criteria

  • Use Case: Different LLMs are better suited to different tasks, such as complex scraping versus straightforward web element targeting.
  • Complexity: More complex queries may require more advanced models.
  • Performance: Models are chosen based on their performance and suitability for the task at hand.

The selected LLM generates an initial result that is contextually relevant and aligned with the user's intent.
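
Routing along those criteria could be sketched as follows; the model names, threshold, and branching are illustrative guesses, not AgentQL's actual rules:

```python
def select_llm(use_case: str, complexity: float) -> str:
    """Hypothetical LLM router; thresholds and model choices are invented."""
    if use_case == "scraping" and complexity > 0.7:
        return "gpt-4"        # complex extraction: most capable public model
    if use_case == "automation":
        return "proprietary"  # element targeting: fast, specialized model
    return "gemini"           # otherwise: balanced cost and performance
```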

Grounding and Validation

To ensure the accuracy and reliability of the output, the initial result generated by the LLM undergoes a rigorous grounding and validation process.

Grounding

The result is cross-referenced with the original input and context to confirm that every returned value is actually present on the page, rather than invented by the model.
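
In spirit, a grounding check resembles the toy function below, which assumes a flat dictionary of extracted scalar values:

```python
def is_grounded(extracted: dict, source_html: str) -> bool:
    """Toy grounding check: every extracted value must literally appear
    in the original input, so the LLM cannot have invented it."""
    return all(str(value) in source_html for value in extracted.values())
```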

Validation

The output is validated against technical requirements, such as correct element selection and accurate data extraction.
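
Validation can be pictured the same way; this toy check assumes the query asked for a known, flat set of fields:

```python
def is_valid(result: dict, query_fields: set[str]) -> bool:
    """Toy validation check: the result must contain exactly the fields
    the query asked for, with no missing or extra keys."""
    return set(result.keys()) == query_fields
```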

Conclusion

AgentQL's ability to process natural language queries and deliver accurate results is the result of a sophisticated process that combines input simplification, task-specific pipelines, advanced LLMs, and thorough validation.