Document Extraction API
GET /api/extract $0.02 per call
USDC on Base · x402
Document extraction: fetch a PDF, DOCX, or CSV by URL and get back clean Markdown plus structured JSON. PDFs return text by page with metadata, and scanned PDFs that would need OCR are flagged as such. DOCX files are converted to real Markdown. CSV files are parsed into typed columns, JSON rows, and a Markdown table. Built for agents that need document contents, not bytes.
Parameters
| Name | In | Required | Description |
|---|---|---|---|
| url | query | required | Public http(s) URL of the .pdf, .docx, or .csv document |
| type | query | optional | Force the parser: pdf, docx, or csv (default: auto-detect from content-type, extension, magic bytes) |
| max_rows | query | optional | CSV only: max rows returned as JSON (default 1000, max 5000) |
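The parameters above can be assembled into a request URL as in the following minimal sketch. The helper name `build_extract_url`, the `doc_type` argument name, and the client-side validation are assumptions made for illustration; only the endpoint, the parameter names, and the documented limits come from this page.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request on this page.
EXTRACT_ENDPOINT = "https://api.webbersites.com/api/extract"
ALLOWED_TYPES = {"pdf", "docx", "csv"}
MAX_ROWS_LIMIT = 5000  # documented upper bound for max_rows


def build_extract_url(url, doc_type=None, max_rows=None):
    """Build the GET URL for /api/extract (hypothetical helper).

    The checks below are client-side assumptions; the server may
    validate differently.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be a public http(s) URL")
    params = {"url": url}
    if doc_type is not None:
        if doc_type not in ALLOWED_TYPES:
            raise ValueError("type must be one of pdf, docx, csv")
        params["type"] = doc_type
    if max_rows is not None:
        # Lower bound of 1 is an assumption; only the max is documented.
        if not 1 <= max_rows <= MAX_ROWS_LIMIT:
            raise ValueError("max_rows must be between 1 and 5000")
        params["max_rows"] = max_rows
    # urlencode percent-encodes the document URL, as in the curl example.
    return f"{EXTRACT_ENDPOINT}?{urlencode(params)}"
```

Omitting `doc_type` leaves parser selection to the documented auto-detection, and omitting `max_rows` leaves the documented default of 1000 in effect.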
Example request
curl "https://api.webbersites.com/api/extract?url=https%3A%2F%2Fexample.com%2Fquarterly-report.pdf"
# first call returns 402 + payment requirements; an x402 client pays and retries automatically
Example response
{
  "url": "https://example.com/quarterly-report.pdf",
  "type": "pdf",
  "pages": 12,
  "metadata": {
    "title": "Q2 Report",
    "author": "Finance Team"
  },
  "markdown": "## Page 1\n\nExecutive summary…",
  "word_count": 4120
}
MCP tool: get_extract, available via npx -y webbersites-x402-mcp (local, key stays on your machine) or the remote endpoint https://api.webbersites.com/mcp.
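A client consuming the PDF response shown in the example might parse it along these lines. This is a sketch only: the `PdfExtract` class and both function names are hypothetical, the field list is taken from the single example response, and the per-page split assumes every page is introduced by a `## Page N` heading, which the example suggests but this page does not guarantee.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class PdfExtract:
    """Fields as they appear in the example PDF response."""
    url: str
    type: str
    pages: int
    markdown: str
    word_count: int
    metadata: dict = field(default_factory=dict)


def parse_pdf_extract(body: str) -> PdfExtract:
    """Parse a JSON response body for a PDF extraction."""
    data = json.loads(body)
    if data.get("type") != "pdf":
        raise ValueError("expected a response with type 'pdf'")
    return PdfExtract(
        url=data["url"],
        type=data["type"],
        pages=data["pages"],
        markdown=data["markdown"],
        word_count=data["word_count"],
        metadata=data.get("metadata", {}),
    )


def split_pages(markdown: str) -> dict:
    """Map page number to page text, assuming '## Page N' headings."""
    parts = re.split(r"(?m)^## Page (\d+)[ \t]*$", markdown)
    # parts alternates: [preamble, number, text, number, text, ...]
    return {int(parts[i]): parts[i + 1].strip() for i in range(1, len(parts), 2)}
```

DOCX and CSV responses are not shown on this page, so their shapes are deliberately left out of the sketch.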
How payment works
There is no signup and no API key. Call the endpoint; it replies 402 Payment Required with machine-readable payment requirements. Your client signs a USDC transfer authorization (EIP-3009, gasless) and retries with the X-PAYMENT header — @x402/fetch does this automatically. See the overview for a working snippet.
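The call, 402, sign, retry sequence described above can be sketched with the transport and the signer injected as plain callables. Everything named here is hypothetical: `fetch_with_payment`, `send`, and `sign_payment` are illustrative stand-ins, the payment requirements are treated as an opaque string, and the actual EIP-3009 signing is left to whatever x402 client library is in use.

```python
from typing import Callable, Mapping, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)
Send = Callable[[str, Mapping[str, str]], Response]
Signer = Callable[[str], str]


def fetch_with_payment(url: str, send: Send, sign_payment: Signer) -> Response:
    """Call an x402-protected endpoint, paying once if asked to.

    send(url, headers) performs the HTTP GET (hypothetical transport).
    sign_payment(requirements) turns the 402 body into the value of
    the X-PAYMENT header (hypothetical signer).
    """
    status, body = send(url, {})
    if status != 402:
        # No payment requested (or some other error): return as-is.
        return status, body
    # The 402 body carries the machine-readable payment requirements.
    payment_header = sign_payment(body)
    # Retry exactly once with the signed authorization attached.
    return send(url, {"X-PAYMENT": payment_header})
```

Injecting `send` and `sign_payment` keeps the retry logic testable without a network connection or a wallet; a real client would plug in its HTTP library and signer there.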