Extracting data from PDFs and image files with AgentQL
AgentQL supports extracting data from PDFs and image files.
Overview
This guide shows you how to use the REST API to extract data from documents like PDFs and images, customize parameters for enhanced scraping capabilities, and retrieve structured data in JSON format with AgentQL queries. You can also test it out in Playground.
Defining the REST API request structure
The following fields outline the high-level structure of a data scraping request:
- file: The file to extract data from.
- query: An AgentQL query that defines the data to extract and the format for the retrieved output.
- params: (Optional) Additional settings for enhanced data retrieval, such as fast or standard mode. See the API Reference for more details about params.
Constructing the API request
To perform a basic data scraping request, start by defining the file to extract data from and the query that specifies the data you want to retrieve in the request body.
- Example REST API Request
Define parameters for the request body.
1. query: AgentQL query to extract data from the file
2. file: Path of file to extract data from
3. params: (Optional) Additional settings for enhanced data retrieval, such as fast mode. See the [API Reference](/rest-api/api-reference#request-body) for more details about params.
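As a rough illustration, the text fields of the request could be assembled in Python as follows. The helper name `build_form_fields` is hypothetical, and sending `params` as a JSON-encoded string is an assumption based on the example request in this guide rather than a confirmed requirement. The file itself would be attached separately as a file part.

```python
import json


def build_form_fields(query, params=None):
    """Return the text fields of a document query as a plain dict.

    Hypothetical helper: the file is attached separately by the HTTP client.
    """
    if not query.strip():
        raise ValueError("query must not be empty")
    fields = {"query": query}
    if params:
        # Assumption: params travels as a JSON-encoded string field.
        fields["params"] = json.dumps(params)
    return fields


fields = build_form_fields(
    "{ products[] { product_name product_price(integer) } }",
    params={"mode": "fast"},
)
print(fields)
```

Leaving out `params` simply omits that field, which matches its optional status in the list above.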
- Setting Request Headers
Before making the API request, include the necessary headers for authentication and content type. These headers authorize the request and specify the data format.
- X-API-Key: This header should contain your AgentQL API key for authentication.
- Content-Type: Set it to multipart/form-data, the encoding used to upload the file together with the other fields, so that the server can interpret the request body correctly.
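A minimal sketch of preparing these headers in Python is shown here. The helper name `build_headers` and the choice to read the key from an `AGENTQL_API_KEY` environment variable are illustrative assumptions. `Content-Type` is left optional because many HTTP clients set it themselves when they encode the request body.

```python
import os


def build_headers(api_key=None, content_type=None):
    """Build request headers; hypothetical helper, not part of AgentQL."""
    # Prefer an explicit key, otherwise fall back to the environment.
    key = api_key or os.environ.get("AGENTQL_API_KEY")
    if not key:
        raise RuntimeError("Set AGENTQL_API_KEY or pass api_key explicitly")
    headers = {"X-API-Key": key}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


print(sorted(build_headers(api_key="example-key")))
```

Keeping the key in an environment variable avoids hard-coding it in scripts that may end up in version control.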
- Making the API Request
Using your preferred HTTP client (like curl, Postman, or an HTTP library in Python or your preferred language), you can make a POST request to the AgentQL REST API endpoint.
curl -X POST https://api.agentql.com/v1/query-document \
  -H "Content-Type: multipart/form-data" \
  -H "X-API-Key: $AGENTQL_API_KEY" \
  -F "query={ products[] { product_name product_price(integer) } }" \
  -F "file=@/path/to/file.pdf" \
  -F 'params={"mode": "fast"}'
Make sure to replace $AGENTQL_API_KEY with your actual API key.
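The same request could be sketched in Python using only the standard library. This is an illustration under assumptions, not an official client: the helper names `encode_multipart` and `query_document` are hypothetical, and sending `params` as a JSON-encoded form field is assumed from the example in this guide.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

ENDPOINT = "https://api.agentql.com/v1/query-document"


def encode_multipart(fields, file_field, file_path):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    path = Path(file_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries a filename and its own content type.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{path.name}"\r\n'
         f"Content-Type: {mime}\r\n\r\n").encode("utf-8")
        + path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def query_document(api_key, file_path, query, params=None, timeout=60):
    """POST a file and query to the endpoint and return the parsed JSON."""
    fields = {"query": query}
    if params:
        # Assumption: params is sent as a JSON-encoded form field.
        fields["params"] = json.dumps(params)
    body, content_type = encode_multipart(fields, "file", file_path)
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `query_document(api_key, "/path/to/file.pdf", "{ products[] { product_name product_price(integer) } }", params={"mode": "fast"})`. Nothing is sent until `query_document` is called, so the encoding helper can be checked offline.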
- Reviewing the API Response
If the request is successful, the API returns a JSON response with the extracted data.
Example Response
{
  "data": {
    "products": [
      {
        "product_name": "Qwilfish",
        "product_price": 77
      },
      {
        "product_name": "Huntail",
        "product_price": 52
      },
      ...
    ]
  },
  "metadata": {
    "request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3"
  }
}
You can read more about the response structure and metadata fields in the API Reference.
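To illustrate how such a response might be consumed, the sketch below parses a trimmed copy of the example response with Python's `json` module. The helper name `extract_products` is hypothetical, and the `products` key only exists because the example query asked for it; a different query would produce different keys under `data`.

```python
import json


def extract_products(response_text):
    """Return (products, request_id) from a response shaped like the example."""
    payload = json.loads(response_text)
    # Use .get so a missing section yields an empty result instead of a KeyError.
    products = payload.get("data", {}).get("products", [])
    request_id = payload.get("metadata", {}).get("request_id")
    return products, request_id


sample = """
{
  "data": {
    "products": [
      {"product_name": "Qwilfish", "product_price": 77},
      {"product_name": "Huntail", "product_price": 52}
    ]
  },
  "metadata": {"request_id": "ecab9d2c-0212-4b70-a5bc-0c821fb30ae3"}
}
"""

products, request_id = extract_products(sample)
for product in products:
    print(product["product_name"], product["product_price"])
```

Because the query declared `product_price(integer)`, the prices arrive as JSON numbers and can be used in arithmetic without conversion.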
Test this feature in Playground!
- Go to AgentQL's Playground.
- Click the "Documented (Experimental)" toggle.
- Either click "Choose file" to upload your PDF, JPG, or PNG file or drag and drop your file into the target preview area.
- Add an AgentQL query to the query box (or use the "Suggest a Query" button to have AgentQL craft a query for you).
- Click the "Fetch Data" button.
Check the results box for your extracted data, and please share your feedback with the AgentQL team. If you would like to access the feature via the SDK, please reach out to join Tiny Fish's Beta Access Program.