Scheduling scraping jobs (Experimental)

AgentQL's Dev Portal lets you schedule scraping workflows, each of which runs AgentQL queries as scraping jobs against one or more websites.

Experimental limitations

Because this feature is still experimental, scheduled scraping is limited to:

  • 2 workflows per user
  • 5 URLs per workflow
  • 10 runs per workflow

If you need more scraping jobs, please reach out to us.

Overview

This guide shows you how to use the Dev Portal to create a scraping workflow that scrapes Hacker News and Product Hunt discussions for the latest product launches.

Creating a scraping workflow

  1. On the Dev Portal, navigate to the scheduling page.
  2. Select the Add New Workflow button.
  3. Add a name for your workflow—for example, "Startups News."
  4. Add the URL(s) for the pages that you'd like to extract data from—for example, "https://news.ycombinator.com/" to scrape Hacker News and/or "https://www.producthunt.com/discussions" to scrape new product launches on Product Hunt.
  5. Add an AgentQL query—for example, the following query, which fetches the title, URL, and date posted of each post on https://www.producthunt.com/discussions (see the sketch after these steps if you'd like to test a query locally first):
{
    posts[] {
        title
        url
        date_posted
    }
}
  6. Select a time to run the query. You can customize the schedule to run at a different time of day, week, or month.
  7. Toggle on Save screenshot to save a screenshot of the webpage at the time of the job. This can help you understand the context of a run and debug data extraction issues (for example, a login screen or popup blocking the page).
  8. Use the Submit button to create the workflow.
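
Before scheduling a workflow, it can help to verify that your query returns the data you expect. The following is a minimal sketch that runs the same query once using the AgentQL Python SDK and Playwright (it assumes pip install agentql playwright, a browser installed via playwright install chromium, and an AGENTQL_API_KEY environment variable; check AgentQL's SDK reference for the current API):

# Minimal sketch: run the workflow's query once locally before scheduling it.
import agentql
from playwright.sync_api import sync_playwright

QUERY = """
{
    posts[] {
        title
        url
        date_posted
    }
}
"""

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=True)
    page = agentql.wrap(browser.new_page())  # wrap a Playwright page with AgentQL
    page.goto("https://news.ycombinator.com/")
    response = page.query_data(QUERY)  # returns a dict shaped like the query
    for post in response["posts"]:
        print(post["title"], post["url"], post["date_posted"])
    browser.close()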

Editing and inspecting scraping workflows

You can inspect a workflow by visiting the scheduling page. There, you can open each workflow and see the AgentQL query used to scrape the data, the status of each scraping job, the scraped data (by selecting a job), and the screenshot of the webpage at the point of scraping.

Note: If you don't see any workflows, you may need to create one first.

Pause a scraping workflow

On the scheduling page, select a workflow you want to pause, and use the Pause button on the top right to pause the workflow.

Edit a scraping workflow

To change a workflow's AgentQL query, its list of URLs to scrape, and/or its schedule:

  1. Go to the scheduling page.
  2. Select a workflow you want to edit.
  3. Use the Edit button to open the workflow.
  4. Make the necessary changes to the workflow.
  5. Use the Update button to save the changes.

Delete a scraping workflow

On the scheduling page, select a workflow you want to delete, and use the Delete button on the top right to delete the workflow. Confirm the deletion by selecting Delete again.

Run a scraping job manually

On the scheduling page, select a workflow you want to run, and use the Run Now button to run the workflow immediately.
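
If you'd rather trigger an equivalent one-off extraction from your own code instead of the portal, here is a hedged sketch against AgentQL's REST query-data endpoint (the endpoint URL, X-API-Key header, and "data" response key follow AgentQL's public API docs, but verify them against the current API reference):

import os
import requests

# One-off extraction, similar in spirit to a manual workflow run.
# Assumes an AGENTQL_API_KEY environment variable.
response = requests.post(
    "https://api.agentql.com/v1/query-data",
    headers={"X-API-Key": os.environ["AGENTQL_API_KEY"]},
    json={
        "url": "https://www.producthunt.com/discussions",
        "query": "{ posts[] { title url date_posted } }",
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["data"]["posts"])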

Export scraped data to JSON

On the scheduling page, select a workflow you want to export data from:

  1. Select the checkboxes of the jobs you wish to export. Each URL has a separate job.
  2. Select Export jobs on the top left of the list of jobs.
  3. Select the checkboxes of the fields you wish to export.
  4. Use the Export button to download a JSON file containing the scraped data.
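
The exact shape of the exported file depends on the jobs and fields you selected, so inspect it before writing code against it. As an illustrative sketch only, assuming the export is a JSON array of job records whose scraped data mirrors the query above (the jobs.json filename and field names here are assumptions, not a guaranteed schema):

import json

# Illustrative only: the record layout below is an assumption about the
# export shape; open the file and check its actual structure first.
with open("jobs.json") as f:
    jobs = json.load(f)

for job in jobs:
    for post in job.get("posts", []):
        print(post["title"], post["url"], post["date_posted"])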