Example Script for Collecting YouTube Comment Data
In this guide, you will learn how to use an AgentQL script to navigate to a YouTube video and collect comment data.
Instructions
Step 0: Create a New Python Script
In your project folder, create a new Python script and name it example_script.py.
Step 1: Import Required Libraries
Import the needed functions and classes from the playwright and agentql libraries, and import the logging library.
Step 0: Create a New JavaScript Script
In your project folder, create a new JavaScript script and name it example_script.js.
Step 1: Import Required Libraries
Import the needed functions and classes from the playwright and agentql libraries.
In the Python script, logging emits debug and informational messages, playwright provides the core browser interaction functionality, and agentql adds the main AgentQL functionality.
In the JavaScript script, playwright provides the core browser interaction functionality, and agentql adds the main AgentQL capabilities.
Step 2: Launch the Browser and Open the Website
const URL = "https://www.youtube.com/";
const browser = await chromium.launch({ headless: false });
const page = await wrap(await browser.newPage());
await page.goto(URL); // navigate to the target website
The Python version of this step works as follows:
- Set up logging: logging.basicConfig(level=logging.DEBUG) configures logging to show debug-level messages.
- Define URL: The URL "https://www.youtube.com/" is the target website for the script.
- Start a Playwright instance with sync_playwright().
- Launch the browser: playwright.chromium.launch(headless=False).
- Create a new page in the browser and wrap it to get access to AgentQL's querying API: agentql.wrap(browser.new_page()).
- Navigate to the website with page.goto(URL).
The JavaScript version of this step works as follows:
- Define URL: The URL "https://www.youtube.com/" is the target website for the script.
- Launch the browser: await chromium.launch({ headless: false }).
- Create a new page in the browser and wrap it to get access to AgentQL's querying API: await wrap(await browser.newPage()).
- Navigate to the website with page.goto(URL).
Step 3: Define AgentQL Queries
const SEARCH_QUERY = `
{
search_input
search_btn
}
`;
const VIDEO_QUERY = `
{
videos[] {
video_link
video_title
channel_name
}
}
`;
const VIDEO_CONTROL_QUERY = `
{
expand_description_btn
}
`;
const DESCRIPTION_QUERY = `
{
description_text
}
`;
const COMMENT_QUERY = `
{
comments[] {
channel_name
comment_text
}
}
`;
These queries tell AgentQL which elements on the page to locate and interact with, so it is important that they are functional and reliable.
Use the AgentQL Chrome Extension alongside the AgentQL SDK to test different query formats and keywords directly on the webpage!
Step 4: Try & Except (Try & Catch in JavaScript) Block
try {
// query logic
} catch (e) {
console.error(e);
}
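The block above logs errors but does not say what happens to the browser when a query fails. As a minimal sketch, assuming you want the browser closed on both the success and the failure path, a finally clause can handle the cleanup. The runWithCleanup helper and the fakeBrowser object are hypothetical names used only for illustration; they are not part of AgentQL or Playwright.

```javascript
// Hypothetical helper (not part of AgentQL or Playwright): runs `task`
// and closes the browser whether or not the task throws.
async function runWithCleanup(browser, task) {
  try {
    return await task(browser);
  } catch (e) {
    console.error(`Found Error: ${e}`);
    throw e; // re-throw so the caller still sees the failure
  } finally {
    await browser.close();
  }
}

// Stand-in browser object used only to demonstrate the control flow.
const fakeBrowser = {
  closed: false,
  async close() {
    this.closed = true;
  },
};

runWithCleanup(fakeBrowser, async () => {
  throw new Error("query failed");
}).catch(() => {
  console.log(`Browser closed after error: ${fakeBrowser.closed}`);
});
```

In a real script, the object returned by chromium.launch() would take the place of fakeBrowser, and the query logic from the following steps would go inside the task function.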
Step 5: Execute Search Query and Interact with Search Elements
const response = await page.queryElements(SEARCH_QUERY);
await response.search_input.type("machine learning", { delay: 75 });
await response.search_btn.click();
- Search Query: Here we pass the SEARCH_QUERY to locate the search elements on the YouTube page so the script can interact with them.
- Type and Click: The script types "machine learning" into the search input with a delay of 75 ms between keystrokes and then clicks search_btn.
You can find more information on type() and other available interaction APIs in the official Playwright documentation.
Use the AgentQL to_data() API when you want to convert the AgentQL response into a dict of data and work with it as data rather than as web elements!
Optional Step: Convert AgentQL response to a Python dict
The raw response is an AgentQL response object, which can be inconvenient to work with. You can use the to_data() API to convert the response to a Python dict.
You can find more information on fill(), type(), and other available interaction APIs in the official Playwright documentation.
Use the AgentQL toData() API when you want to convert the AgentQL response into a Map of data and work with it as data rather than as web elements!
Optional Step: Convert AgentQL response to a JavaScript Map
The raw response is an AgentQL response object, which can be inconvenient to work with. We recommend using the toData() API to convert the response to a JavaScript Map.
const SAMPLE_DATA_QUERY = `
{
videos[] {
title
date_posted
views
}
}
`;
The to_data() API converts the AgentQL response to a structured dict in which each response node is replaced with the text content of that node.
The toData() API converts the AgentQL response to a structured Map in which each response node is replaced with the text content of that node.
Sample result would be as follows:
{
"videos": [
{
"title": "This is a nice video!",
"date_posted": "1 month ago",
"views": "2.7K Views"
},
{
"title": "This maybe a better video!",
"date_posted": "3 months ago",
"views": "2.7 Million Views"
},
{
"title": "This is best video!",
"date_posted": "1 year ago",
"views": "37.5K Views"
}
]
}
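Because the converted result holds text contents, fields such as views arrive as strings like "2.7K Views" rather than numbers. The following is a minimal sketch of one way to normalize them, assuming the strings follow the patterns in the sample result above. The parseViews helper, the MULTIPLIERS table, and the view_count field are hypothetical and not part of AgentQL.

```javascript
// Hypothetical helper: turns text such as "2.7K Views" into a number.
// The suffix table is an assumption based on the sample strings above.
const MULTIPLIERS = new Map([
  ["k", 1e3],
  ["m", 1e6],
  ["million", 1e6],
  ["b", 1e9],
  ["billion", 1e9],
]);

function parseViews(text) {
  // Capture a leading number and the word that follows it, if any.
  const match = /^(\d[\d.,]*)\s*([a-z]*)/i.exec(String(text).trim());
  if (!match) return null;
  const value = Number(match[1].replace(/,/g, ""));
  if (Number.isNaN(value)) return null;
  // Unknown words (for example "views") fall back to a multiplier of 1.
  const multiplier = MULTIPLIERS.get(match[2].toLowerCase()) ?? 1;
  return Math.round(value * multiplier);
}

// Plain data shaped like the sample result above.
const sample = {
  videos: [
    { title: "This is a nice video!", views: "2.7K Views" },
    { title: "This maybe a better video!", views: "2.7 Million Views" },
  ],
};

const withCounts = sample.videos.map((video) => ({
  ...video,
  view_count: parseViews(video.views),
}));
console.log(withCounts);
```

Strings that do not start with a digit return null, so unexpected formats are easy to spot instead of silently becoming zero.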
Step 6: Execute Video Query and Interact with Video Elements
const response = await page.queryElements(VIDEO_QUERY);
// textContent() is an async method, so await it before logging
const videoTitle = await response.videos[0].video_title.textContent();
console.debug(`Clicking YouTube Video: ${videoTitle}`);
await response.videos[0].video_link.click(); // click the first YouTube video
- Video Query: The script runs the VIDEO_QUERY to interact with video elements on the search results page.
- Click on Video: It clicks on the first video link in the results.
- Logging: A debug message logs the title of the clicked video.
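Indexing videos[0] fails with a less helpful error if the search returns no results. As a minimal sketch, assuming the videos list exposes length and index access the way the step above uses it, a small guard can produce a clearer message. The firstOrThrow helper is hypothetical, and plain objects stand in for the queried elements here.

```javascript
// Hypothetical guard: returns the first item or throws a descriptive error.
function firstOrThrow(items, label) {
  if (!items || items.length === 0) {
    throw new Error(`No ${label} found on the page`);
  }
  return items[0];
}

// Plain objects stand in for the elements returned by VIDEO_QUERY.
const videos = [
  { video_title: "Intro to ML" },
  { video_title: "Deep Learning" },
];
console.log(firstOrThrow(videos, "videos").video_title);
```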
Step 7: Control Video Playback and Show Description
const response = await page.queryElements(VIDEO_CONTROL_QUERY);
await response.expand_description_btn.click();
- Video Control Query: Run the VIDEO_CONTROL_QUERY to locate the video controls, then click expand_description_btn to show the full description.
Step 8: Capture and Log Video Description
Instead of converting the intermediate AgentQL response to a dict via the to_data() API, you can call the page.query_data() method to query for structured data directly.
Instead of converting the intermediate AgentQL response to a Map via the toData() API, you can call the page.queryData() method to query for structured data directly.
const response_data = await page.queryData(DESCRIPTION_QUERY);
console.debug(
`Captured the following description: \n${response_data["description_text"]}`
);
- Description Query: Executes the DESCRIPTION_QUERY.
- Logging Description: Logs the captured description of the video.
Step 9: Scroll Down the Page to Load Comments
To load comments on a YouTube video page, we need to scroll down the page a few times.
for (let i = 0; i < 3; i++) {
await page.keyboard.press("PageDown");
await page.waitForPageReadyState();
}
In the Python script:
- Press PageDown Button: Presses the PageDown button to scroll down the page.
- Wait for Page Ready State: Waits for the comments to load with AgentQL's wait_for_page_ready_state() method.
In the JavaScript script:
- Press PageDown Button: Presses the PageDown button to scroll down the page.
- Wait for Page Ready State: Waits for the comments to load with AgentQL's waitForPageReadyState() method.
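The scroll loop uses a fixed count of three, which may be too few or too many for a given video. As a minimal sketch, the loop can be wrapped in a function that takes the count as a parameter. The scrollToLoad helper and the fakePage stand-in are hypothetical; only keyboard.press and waitForPageReadyState come from the step above.

```javascript
// Hypothetical helper: presses PageDown `times` times, waiting after each press.
async function scrollToLoad(page, times = 3) {
  for (let i = 0; i < times; i++) {
    await page.keyboard.press("PageDown");
    await page.waitForPageReadyState();
  }
}

// Stand-in page that records calls instead of driving a real browser.
const calls = [];
const fakePage = {
  keyboard: {
    press: async (key) => {
      calls.push(`press:${key}`);
    },
  },
  waitForPageReadyState: async () => {
    calls.push("wait");
  },
};

scrollToLoad(fakePage, 2).then(() => console.log(calls));
```

With a wrapped AgentQL page in place of fakePage, the call would be await scrollToLoad(page, 5) or whatever count suits the video.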
Step 10: Capture and Log Comments
const response = await page.queryData(COMMENT_QUERY);
console.debug(`Captured ${response.comments.length} comments!`);
- Execute Query: Pass our COMMENT_QUERY to page.queryData() to capture the comment section data.
- Count and Log: Here we simply log the number of comments captured.
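Once the comments are captured as plain data, they can be processed like any other JavaScript value. The sketch below assumes the result has the shape implied by COMMENT_QUERY (a comments list whose items have channel_name and comment_text). The summarizeComments helper and the sample comments are hypothetical and shown only for illustration.

```javascript
// Hypothetical helper: drops empty comments and counts comments per channel.
function summarizeComments(data) {
  const comments = (data.comments ?? []).filter(
    (c) => typeof c.comment_text === "string" && c.comment_text.trim() !== ""
  );
  const perChannel = new Map();
  for (const { channel_name } of comments) {
    perChannel.set(channel_name, (perChannel.get(channel_name) ?? 0) + 1);
  }
  return { total: comments.length, perChannel };
}

// Illustrative data shaped like the result of COMMENT_QUERY.
const captured = {
  comments: [
    { channel_name: "@alice", comment_text: "Great explanation!" },
    { channel_name: "@bob", comment_text: "   " },
    { channel_name: "@alice", comment_text: "Thanks for sharing." },
  ],
};

const summary = summarizeComments(captured);
console.log(
  `Kept ${summary.total} comments from ${summary.perChannel.size} channels`
);
```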
Step 11: Stop the Browser
await browser.close();
- Call browser.close() to release resources and close the browser.
Step 12: Run the Script
Open a terminal in your project's folder and run the script:
python3 example_script.py
node example_script.js
Putting it all together…
Here is the complete script, if you'd like to copy it directly:
const { chromium } = require('playwright');
const { wrap, configure } = require('agentql');
// Configure the AgentQL API key
configure({ apiKey: process.env.AGENTQL_API_KEY });
const URL = 'https://www.youtube.com/';
// AgentQL queries
const SEARCH_QUERY = `
{
search_input
search_btn
}
`;
const VIDEO_QUERY = `
{
videos[] {
video_link
video_title
channel_name
}
}
`;
const VIDEO_CONTROL_QUERY = `
{
play_or_pause_btn
expand_description_btn
}
`;
const DESCRIPTION_QUERY = `
{
description_text
}
`;
const COMMENT_QUERY = `
{
comments[] {
channel_name
comment_text
}
}
`;
async function main() {
const browser = await chromium.launch({ headless: false });
const context = await browser.newContext();
// Wrap the page to get access to the AgentQL's querying API
const page = await wrap(await context.newPage());
await page.goto(URL);
try {
// Search query
let response = await page.queryElements(SEARCH_QUERY);
await response.search_input.type('machine learning', { delay: 75 });
await response.search_btn.click();
// Video query
response = await page.queryElements(VIDEO_QUERY);
const videoTitle = await response.videos[0].video_title.textContent();
console.log(`Clicking YouTube Video: ${videoTitle}`);
await response.videos[0].video_link.click();
// Video control query
response = await page.queryElements(VIDEO_CONTROL_QUERY);
await response.expand_description_btn.click();
// Description query
const responseData = await page.queryData(DESCRIPTION_QUERY);
console.log(`Captured the following description:\n${responseData.description_text}`);
// Scroll down the page to load more comments
for (let i = 0; i < 3; i++) {
await page.keyboard.press('PageDown');
await page.waitForPageReadyState();
}
// Comment query
const commentsResponse = await page.queryData(COMMENT_QUERY);
console.log(`Captured ${commentsResponse.comments.length} comments!`);
} catch (e) {
console.error(`Found Error: ${e}`);
throw e;
}
// Used only for demo purposes. It allows you to see the effect of the script.
await page.waitForTimeout(10000);
await browser.close();
}
main();