Scheduling scraping jobs (Experimental)
AgentQL's Dev Portal lets you schedule scraping workflows that run AgentQL queries against one or more websites, with each URL handled as a separate scraping job.
Overview
This guide shows you how to use the Dev Portal to create a scraping workflow that scrapes Hacker News and Product Hunt discussions for the latest product launches.
Creating a scraping workflow
- On the Dev Portal, navigate to the scheduling page.
- Select the Add New Workflow button.
- Add a name for your workflow—for example, "Startups News."
- Add the URL(s) for the pages you'd like to extract data from—for example, "https://news.ycombinator.com/" to scrape Hacker News and/or "https://www.producthunt.com/discussions" to scrape new product launches on Product Hunt.
- Add an AgentQL query—for example, one that fetches the title, URL, and date posted of each post on the page.
- Select a time to run the query. You may customize the schedule to run at a different time of day, week, or month.
- Toggle on Save screenshot to save a screenshot of the webpage at the time of the job. This can be useful for understanding the context of the job and debugging data extraction issues (for example, if a login screen or a popup is in the way).
- Use the Submit button to create the workflow.
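For the example above, an AgentQL query that fetches the title, URL, and date posted of each post could look like the following sketch. The field names (`posts`, `title`, `post_url`, `date_posted`) are illustrative; adjust them to the terms that match the page you are scraping:

```
{
  posts[] {
    title
    post_url
    date_posted
  }
}
```

The `[]` suffix tells AgentQL to return a list of matching items, with each item containing the nested fields.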
Editing and inspecting scraping workflows
You can inspect a workflow by visiting the scheduling page. Here, you can open each workflow and see the AgentQL query used to scrape the data, the status of each scraping job, the scraped data, and the screenshot of the webpage at the point of scraping.
Pause a scraping workflow
On the scheduling page, select a workflow you want to pause, and use the Pause button on the top right to pause the workflow.
Edit a scraping workflow
To change a workflow's AgentQL query, the list of URLs to scrape, and/or its schedule:
- Go to the scheduling page.
- Select a workflow you want to edit.
- Use the Edit button to open the workflow.
- Make the necessary changes to the workflow.
- Use the Update button to save the changes.
Delete a scraping workflow
On the scheduling page, select a workflow you want to delete, and use the Delete button on the top right to delete the workflow. Confirm the deletion by selecting Delete again.
Run a scraping job manually
On the scheduling page, select a workflow you want to run, and use the Run Now button to run the workflow immediately.
Export scraped data to JSON
On the scheduling page, select a workflow you want to export data from, then:
- Select the checkboxes of the jobs you wish to export. Each URL has a separate job.
- Select Export jobs on the top left of the list of jobs.
- Select the checkboxes of the fields you wish to export.
- Use the Export button to download a JSON file containing the scraped data.
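Once downloaded, the file can be processed like any other JSON document. Below is a minimal Python sketch assuming a hypothetical export shape of one entry per job, with the scraped fields nested under a `data` key; the actual structure of the export may differ depending on the fields you selected:

```python
import json

# Illustrative sample of an exported file. The shape shown here is an
# assumption based on the example query fields (title, post_url,
# date_posted); inspect your own export to confirm the real structure.
sample = """
[
  {
    "url": "https://news.ycombinator.com/",
    "data": {
      "posts": [
        {
          "title": "Show HN: A sample post",
          "post_url": "https://example.com/post",
          "date_posted": "2024-05-01"
        }
      ]
    }
  }
]
"""

jobs = json.loads(sample)

# Print a simple digest: one line per scraped post.
for job in jobs:
    for post in job["data"]["posts"]:
        print(f'{post["date_posted"]}  {post["title"]}')
```

Replace `sample` with the contents of the downloaded file (for example, via `open(...).read()`) to run the same loop over real exported data.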