
Document Extraction API

GET /api/extract · $0.02 per call · USDC on Base · x402

Fetch a PDF, DOCX, or CSV by URL and get back clean Markdown plus structured JSON. PDFs return text by page with metadata, and scanned PDFs that would need OCR are flagged explicitly. DOCX files are converted to Markdown. CSV files are parsed into typed columns, JSON rows, and a Markdown table. Built for agents that need document contents, not bytes.

Parameters

Name      In     Required  Description
url       query  yes       Public http(s) URL of the .pdf, .docx, or .csv document
type      query  no        Force the parser: pdf, docx, or csv (default: auto-detect from content-type, extension, magic bytes)
max_rows  query  no        CSV only: max rows returned as JSON (default 1000, max 5000)
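
The parameters above can be assembled into a request URL as in the following minimal sketch. The function name `build_extract_url`, the `parser` argument name, and the client-side range check are illustrative assumptions, not part of the API. In particular, the documentation does not say whether the server clamps or rejects `max_rows` values above 5000, and the lower bound of 1 is a guess.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.webbersites.com/api/extract"
PARSERS = {"pdf", "docx", "csv"}   # values documented for the `type` parameter
MAX_ROWS_LIMIT = 5000              # documented maximum for `max_rows`


def build_extract_url(
    doc_url: str,
    parser: Optional[str] = None,
    max_rows: Optional[int] = None,
) -> str:
    """Build a GET /api/extract URL (hypothetical helper, not an official client)."""
    if not doc_url.startswith(("http://", "https://")):
        raise ValueError("url must be a public http(s) URL")

    params = {"url": doc_url}
    if parser is not None:
        if parser not in PARSERS:
            raise ValueError("type must be one of: pdf, docx, csv")
        params["type"] = parser
    if max_rows is not None:
        # Assumption: reject out-of-range values locally instead of relying
        # on undocumented server behaviour.
        if not 1 <= max_rows <= MAX_ROWS_LIMIT:
            raise ValueError("max_rows must be between 1 and 5000")
        params["max_rows"] = str(max_rows)

    # urlencode percent-encodes ':' and '/' in the document URL.
    return BASE_URL + "?" + urlencode(params)
```

Calling `build_extract_url("https://example.com/data.csv", parser="csv", max_rows=200)` produces a URL with the `type` and `max_rows` query parameters appended after the encoded `url`.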

Example request

curl "https://api.webbersites.com/api/extract?url=https%3A%2F%2Fexample.com%2Fquarterly-report.pdf"
# first call returns 402 + payment requirements; an x402 client pays and retries automatically

Example response

{
    "url": "https://example.com/quarterly-report.pdf",
    "type": "pdf",
    "pages": 12,
    "metadata": {
        "title": "Q2 Report",
        "author": "Finance Team"
    },
    "markdown": "## Page 1\n\nExecutive summary…",
    "word_count": 4120
}
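
An agent that wants per-page text can split the `markdown` field on its page headings. The sketch below assumes every page starts with a `## Page N` heading, which is inferred only from the single example response above and is not a documented guarantee. The helper name `split_pages` is hypothetical.

```python
import re
from typing import Dict

# Assumption: pages are introduced by headings of the form "## Page <number>".
PAGE_HEADING = re.compile(r"^## Page (\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> Dict[int, str]:
    """Map page number to page text for a PDF response's `markdown` field."""
    pages: Dict[int, str] = {}
    matches = list(PAGE_HEADING.finditer(markdown))
    for i, match in enumerate(matches):
        # A page's text runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        pages[int(match.group(1))] = markdown[match.end():end].strip()
    return pages
```

If the field contains no such headings, the function returns an empty dict, so callers can fall back to treating the whole string as one block.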

MCP tool: get_extract, available via npx -y webbersites-x402-mcp (local, key stays on your machine) or the remote endpoint https://api.webbersites.com/mcp.

How payment works

There is no signup and no API key. Call the endpoint and it replies 402 Payment Required with machine-readable payment requirements. Your client signs a USDC transfer authorization (EIP-3009, gasless) and retries with the X-PAYMENT header; @x402/fetch does this automatically. See the overview for a working snippet.
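
The request, 402, sign, retry sequence described above can be sketched independently of any particular client library. In the sketch below, `send` and `sign` are hypothetical callables supplied by the caller: `send` performs the HTTP GET and returns a status code and body, and `sign` turns the payment requirements into the value of the X-PAYMENT header. The actual EIP-3009 signing is deliberately left out, and treating the 402 response body as the payment requirements is an assumption.

```python
from typing import Callable, Mapping, Tuple

Reply = Tuple[int, str]                          # (HTTP status, response body)
Send = Callable[[str, Mapping[str, str]], Reply]  # hypothetical transport
Sign = Callable[[str], str]                       # hypothetical payment signer


def get_with_payment(url: str, send: Send, sign: Sign) -> Reply:
    """Call `url`; on 402, sign the payment requirements and retry once."""
    status, body = send(url, {})
    if status != 402:
        # No payment requested (or a different error): return the reply as-is.
        return status, body

    # Assumption: the 402 body carries the machine-readable requirements.
    payment_header = sign(body)
    return send(url, {"X-PAYMENT": payment_header})
```

Retrying only once keeps the sketch from paying repeatedly if the second attempt also returns 402; a real client may want to surface that case as an error.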