Example Script for Collecting YouTube Comment Data

In this guide, you will learn how to use an AgentQL script to navigate to a YouTube video and collect comment data.

Prerequisites

Instructions

note

If you'd like to start with the full example script, you can find it at the end of the tutorial.

Step 0: Create a New JavaScript Script

In your project folder, create a new JavaScript script and name it example_script.js.

Step 1: Import Required Libraries

Import needed functions and classes from playwright and agentql libraries.

example_script.js
js
const { wrap, configure } = require("agentql");
const { chromium } = require("playwright");

playwright provides core browser interaction functionality and agentql adds the main AgentQL capabilities.

Step 2: Launch the Browser and Open the Website

example_script.js
js
const URL = "https://www.youtube.com/";

const browser = await chromium.launch({ headless: false });
const page = await wrap(await browser.newPage());
  • Define URL: The URL "https://www.youtube.com/" is the target website for the script.
  • Launch the browser: await chromium.launch({ headless: false }).
  • Create a new page in the browser and wrap it to get access to the AgentQL's querying API: await wrap(await browser.newPage()).
  • Navigate to the website with page.goto(URL).

Step 3: Define AgentQL Queries

example_script.js
js
const SEARCH_QUERY = `
{
    search_input
    search_btn
}
`;

const VIDEO_QUERY = `
{
    videos[] {
        video_link
        video_title
        channel_name
    }
}
`;

const VIDEO_CONTROL_QUERY = `
{
    expand_description_btn
}
`;

const DESCRIPTION_QUERY = `
{
    description_text
}
`;

const COMMENT_QUERY = `
{
    comments[] {
        channel_name
        comment_text
    }
}
`;

These queries provide the tool for communication and interaction with the right elements on the website. Ensuring you have functional and reliable queries is paramount!

tip

Use the AgentQL Chrome Extension in parallel with the AgentQL SDK to test different query formats and keywords directly with the webpage!

Step 4: Try & Except Block

example_script.js
js
try {

    // query logic

} catch (e) {
    console.error(e);
}

Step 5: Execute Search Query and Interact with Search Elements

example_script.js
js
const response = await page.queryElements(SEARCH_QUERY);
await response.search_input.fill("machine learning", delay=75);
await response.search_btn.click();
  • Search Query: Here we pass the SEARCH_QUERY to query specific elements on the page to interact with the search elements on YouTube page.
  • Type and Click: It types "machine learning" into the search input with a delay of 75ms between keystrokes and then clicks search_btn
tip

More information on the fill() and other available interaction APIs you can find in the official Playwright documentation.

tip

Use the AgentQL toData() API when you want to convert the AgentQL Response into map of data and work with it rather than treating it as web elements!

Optional Step: Convert AgentQL response to a JavaScript Map

The raw response is an AgentQL response object, which can be inconvenient to work with it. We recommend using the toData() API to convert the response to a JavaScript Map.

example_script.js
js
const SAMPLE_DATA_QUERY = `
{
    videos[] {
        title
        date_posted
        views
    }
}
`;

The toData() API converts the AgentQL response to a structured Map in which it replaces the response nodes with text contents of the nodes.

Sample result would be as follows:

Response Data
json
{
  "videos": [
    {
      "title": "This is a nice video!",
      "date_posted": "1 month ago",
      "views": "2.7K Views"
    },
    {
      "title": "This maybe a better video!",
      "date_posted": "3 months ago",
      "views": "2.7 Million Views"
    },
    {
      "title": "This is best video!",
      "date_posted": "1 year ago",
      "views": "37.5K Views"
    }
  ]
}

Step 6: Execute Video Query and Interact with Video Elements

example_script.js
js
const response = await page.queryElements(VIDEO_QUERY);
console.debug(
    `Clicking Youtube Video: ${response.videos[0].video_title.textContent}`
);
await response.videos[0].video_link.click(); // click the first youtube video
  • Video Query: The script runs the VIDEO_QUERY to interact with video elements on the search results page.
  • Click on Video: It clicks on the first video link in the results.
  • Logging: A debug message logs the title of the clicked video.

Step 7: Control Video Playback and Show Description

example_script.js
js
const response = await page.queryElements(VIDEO_CONTROL_QUERY);
await response.expand_description_btn.click();
  • Video Control Query: Run the VIDEO_CONTROL_QUERY to interact with video controls.

Step 8: Capture and Log Video Description

Instead of converting the intermediate AgentQL response to Map via toData() API, you can call page.queryData() method to query for structured data at the first place.

example_script.js
js
const response_data = await page.queryData(DESCRIPTION_QUERY);
console.debug(
    `Captured the following description: \n${response_data["description_text"]}`
);
  • Description Query: Executes the DESCRIPTION_QUERY.
  • Logging Description: Logs the captured description of the video.

Step 9: Scroll Down the Page to Load Comments

To load comments on a YouTube video page, we need to scroll down the page a few times.

example_script.js
js
for (let i = 0; i < 3; i++) {
    await page.keyboard.press("PageDown");
    await page.waitForPageReadyState();
}
  • Press PageDown Button: Presses the PageDown button to scroll down the page.
  • Wait for Page Ready State: Waits for the comments to load with AgentQL's waitForPageReadyState() method.

Step 10: Capture and Log Comments

example_script.js
js
const response = await page.queryData(COMMENT_QUERY);
console.debug(`Captured ${response.get("comments").length} comments!`);
  • Execute Query: Pass the session our COMMENT_QUERY to capture comment section data.
  • Count and Log: Here we simply log the number of comments captured.

Step 11: Stop the Browser

example_script.js
js
await browser.close();
  • Call browser.close() to releasing resources and close the browser.

Step 12: Run the Script

Open a terminal in your project's folder and run the script:

terminal
node example_script.js

Putting it all together…

Here is the complete script, if you'd like to copy it directly:

example_script.js
js
const { chromium } = require('playwright');
const { wrap, configure } = require('agentql');

// Configure the AgentQL API key
configure({ apiKey: process.env.AGENTQL_API_KEY });

const URL = 'https://www.youtube.com/';

// AgentQL queries
const SEARCH_QUERY = `
{
    search_input
    search_btn
}
`;

const VIDEO_QUERY = `
{
    videos[] {
        video_link
        video_title
        channel_name
    }
}
`;

const VIDEO_CONTROL_QUERY = `
{
    play_or_pause_btn
    expand_description_btn
}
`;

const DESCRIPTION_QUERY = `
{
    description_text
}
`;

const COMMENT_QUERY = `
{
    comments[] {
        channel_name
        comment_text
    }
}
`;

async function main() {
    const browser = await chromium.launch({ headless: false });
    const context = await browser.newContext();

    // Wrap the page to get access to the AgentQL's querying API
    const page = await wrap(await context.newPage());

    await page.goto(URL);

    try {
        // Search query
        let response = await page.queryElements(SEARCH_QUERY);
        await response.search_input.type('machine learning', { delay: 75 });
        await response.search_btn.click();

        // Video query
        response = await page.queryElements(VIDEO_QUERY);
        const videoTitle = await response.videos[0].video_title.textContent();
        console.log(`Clicking YouTube Video: ${videoTitle}`);
        await response.videos[0].video_link.click();

        // Video control query
        response = await page.queryElements(VIDEO_CONTROL_QUERY);
        await response.expand_description_btn.click();

        // Description query
        const responseData = await page.queryData(DESCRIPTION_QUERY);
        console.log(`Captured the following description:\n${responseData.description_text}`);

        // Scroll down the page to load more comments
        for (let i = 0; i < 3; i++) {
            await page.keyboard.press('PageDown');
            await page.waitForPageReadyState();
        }

        // Comment query
        const commentsResponse = await page.queryData(COMMENT_QUERY);
        console.log(`Captured ${commentsResponse.comments.length} comments!`);

    } catch (e) {
        console.error(`Found Error: ${e}`);
        throw e;
    }

    // Used only for demo purposes. It allows you to see the effect of the script.
    await page.waitForTimeout(10000);

    await browser.close();
}

main();