Document Extraction API

Document Extraction API for Invoices, Contracts, and PDFs

Integrate AI document extraction into your product or workflow with a simple REST API. Extract structured JSON from invoices, contracts, purchase orders, receipts, and scanned PDFs — no ML infrastructure required.

What Paperloom extracts

Built specifically for document extraction

REST API

POST a PDF or document image and receive structured JSON in return. Simple authentication, a predictable response format, and webhook support for async workflows.
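A minimal Python sketch of that POST, using only the standard library. Only the /api/extract path comes from this page; the base URL, the `Bearer` authorization scheme, and the multipart field names `file` and `document_type` are assumptions for illustration, not confirmed Paperloom API details. The function builds the request without sending it, so you can inspect it first.

```python
from __future__ import annotations

import uuid
import urllib.request


def build_extract_request(
    base_url: str,
    api_key: str,
    filename: str,
    data: bytes,
    content_type: str,
    document_type: str | None = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /api/extract.

    Assumptions, not confirmed by Paperloom: Bearer auth and the
    form field names "file" and "document_type".
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    if document_type is not None:
        # Optional type hint; omit it to rely on auto-detection.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="document_type"\r\n\r\n'
                f"{document_type}\r\n"
            ).encode()
        )
    # The document itself, sent as a file part with its own content type.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        + data
        + b"\r\n"
    )
    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/extract",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body with `json.load`.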

Multi-Document Type Support

One API endpoint handles invoices, contracts, purchase orders, receipts, bank statements, and general PDFs, and can auto-detect the document type.

Structured JSON Output

Every response uses a consistent JSON schema containing field names, extracted values, confidence scores, and (optionally) source bounding boxes.
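The exact schema is not documented on this page, so the sketch below assumes a hypothetical shape: a top-level `fields` array whose items carry `name`, `value`, `confidence`, and an optional four-number `bbox`. It parses that into typed records and rejects confidence values outside 0–1, the range given in the FAQ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str | float | None
    confidence: float
    bbox: tuple[float, float, float, float] | None  # optional source box


def parse_extraction(payload: str) -> list[ExtractedField]:
    """Parse a response with an ASSUMED shape:
    {"fields": [{"name", "value", "confidence", "bbox"?}, ...]}."""
    doc = json.loads(payload)
    fields: list[ExtractedField] = []
    for raw in doc.get("fields", []):
        confidence = float(raw["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {raw['name']}: {confidence}")
        bbox = raw.get("bbox")
        if bbox is not None and len(bbox) != 4:
            raise ValueError(f"bbox for {raw['name']} must have 4 numbers")
        fields.append(
            ExtractedField(
                name=raw["name"],
                value=raw.get("value"),
                confidence=confidence,
                bbox=tuple(float(v) for v in bbox) if bbox is not None else None,
            )
        )
    return fields
```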

Confidence Scores

Every extracted field includes a confidence score so your application can decide whether to auto-approve or route to human review.
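Routing on confidence can be as simple as a threshold split. A minimal sketch; the `Field` record and the 0.9 default threshold are illustrative assumptions rather than Paperloom recommendations, and the right threshold depends on how costly an extraction error is in your workflow.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    value: object
    confidence: float  # 0–1, one score per extracted field


def route_fields(
    fields: list[Field], threshold: float = 0.9
) -> tuple[list[Field], list[Field]]:
    """Split fields into (auto_approved, needs_review), preserving order.

    The 0.9 default is an arbitrary example; tune it to your error tolerance.
    """
    approved = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return approved, review


def document_needs_review(fields: list[Field], threshold: float = 0.9) -> bool:
    """A document is auto-approved only if every field clears the threshold."""
    return any(f.confidence < threshold for f in fields)
```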

Webhook Callbacks

Register a webhook endpoint to receive extraction results asynchronously for batch processing and document pipeline workflows.
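A minimal webhook receiver sketch using Python's built-in `http.server`. The callback payload shape (a JSON object with a `job_id` key) is a hypothetical assumption; the parsing logic lives in a plain function so it can be tested without a running server.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler


def handle_webhook_body(body: bytes) -> tuple[int, str | None]:
    """Validate a callback body; return (HTTP status, job id or None).

    Assumes a JSON object with a "job_id" key (hypothetical payload shape).
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return 400, None
    if not isinstance(payload, dict) or "job_id" not in payload:
        return 400, None
    # Hand the job off to a queue or worker here; keep the handler fast
    # so the sender does not time out waiting for a response.
    return 200, str(payload["job_id"])


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, _job_id = handle_webhook_body(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()
```

To run it locally, add `from http.server import HTTPServer` and call `HTTPServer(("", 8080), WebhookHandler).serve_forever()`. In production you would typically put the endpoint behind HTTPS.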

Multilingual Support

The API handles documents in Arabic, French, Spanish, German, Chinese, Japanese, and other languages without any configuration parameters.

How it works

From raw document to structured data in seconds

  1. Authenticate with your API key from the Paperloom dashboard.
  2. POST your document (PDF, JPEG, PNG, or TIFF) to the /api/extract endpoint.
  3. Specify the document type, or let Paperloom auto-detect it (invoice, contract, PO, receipt, or general PDF).
  4. Receive a structured JSON response with field values, confidence scores, and source locations.
  5. Route low-confidence fields to your own review UI or use Paperloom's built-in review queue.
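Before uploading, a client can check a file's leading bytes so that only the accepted formats (PDF, JPEG, PNG, TIFF) are sent, with a matching `Content-Type`. This is a client-side convenience sketch, not part of the Paperloom API. It checks the standard magic numbers for each format and is deliberately strict: a PDF with stray bytes before its `%PDF-` header would be rejected.

```python
from __future__ import annotations

# Standard magic-byte signatures for the four upload formats.
_SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def detect_content_type(data: bytes) -> str:
    """Return the MIME type for a supported upload, or raise ValueError."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    raise ValueError("unsupported format: expected PDF, JPEG, PNG, or TIFF")
```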

Use cases

Who uses the Paperloom document extraction API

Accounts Payable Automation Products

Finance software companies embed Paperloom's API to add invoice extraction capabilities to their products without building ML pipelines.

ERP and Accounting Integrations

Developers use the document extraction API to build document-to-ERP workflows that auto-populate fields from uploaded invoices and POs.

Document Management Systems

DMS platforms integrate the API to automatically extract metadata from uploaded documents and index them for search and compliance.

Custom Business Workflows

Operations teams build custom document processing pipelines that extract, validate, and route data from uploaded documents without manual intervention.

Start extracting documents free today

No setup. No credit card. 20 free credits on sign-up.


Frequently asked questions about the document extraction API

What is a document extraction API?

A document extraction API accepts a PDF or document image as input and returns structured data — field names, values, and confidence scores — as JSON. Developers use it to add document extraction capabilities to their products without building or maintaining ML models.

What document types does the Paperloom API support?

The API supports invoices, contracts, purchase orders, receipts, bank statements, bills of lading, and general PDFs. Pass the document type as a parameter or let the API auto-detect it.

What format does the API return?

The API returns structured JSON with field names, extracted string or numeric values, confidence scores (0–1), and optional source coordinates (bounding boxes) on the original document.

Is there a rate limit on the API?

Limits depend on your plan. Free accounts can extract up to 20 documents; paid plans raise the document limit and add options for bulk processing and dedicated infrastructure for high-volume use.
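Whatever the limits on your plan, a client should back off when it is throttled. A minimal sketch, assuming the API signals throttling with HTTP 429 and possibly a `Retry-After` value (standard HTTP conventions, not confirmed for Paperloom). `RateLimited` is a hypothetical exception your own request code would raise on a 429; the retry loop uses capped exponential backoff with full jitter.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raise from your request function on an HTTP 429."""

    def __init__(self, retry_after: float | None = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, up to max_attempts total tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the throttling error
            if exc.retry_after is not None:
                delay = exc.retry_after  # honor the server's hint
            else:
                # Full jitter: random delay up to the capped exponential bound.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```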

Does the API support async processing for large files?

Yes. For large PDFs or batch processing, the API supports async mode where you submit a document, receive a job ID, and either poll for results or register a webhook for callbacks.
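A minimal polling sketch for async mode. `fetch_status` stands in for your own GET request to the job endpoint, whose URL this page does not give; the status values `completed` and `failed` and the `result`/`error` keys are assumptions. The sleep and clock functions are injectable so the loop can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Any, Callable


class ExtractionFailed(Exception):
    pass


def wait_for_result(
    job_id: str,
    fetch_status: Callable[[str], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the job completes, fails, or the timeout would be exceeded.

    ASSUMED job shape: {"status": ..., "result": {...}} or {"status": "failed", "error": ...}.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job["result"]
        if status == "failed":
            raise ExtractionFailed(job.get("error", "unknown error"))
        # Stop before sleeping past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

If you register a webhook instead, you can skip polling and read the result from the callback.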