Infrrd processes 100B+ pages of enterprise documents — mortgages, insurance claims, engineering drawings, invoices — at a 0.1% error rate across 22 languages. Scaling from 150M documents to hundreds of millions more, launching Deep Worker for Agentic AI, and serving regulated customers like Rocket Mortgage, J&J, and PwC all mean the infrastructure underneath has to match the ambition. This is where Cloudflare's developer platform fits.
Infrastructure Profile
Infrrd isn't a simple document viewer. It's a high-throughput AI pipeline that ingests raw documents, runs multi-model ML inference, routes to human reviewers when needed, and returns structured data via API — all under SLA deadlines as tight as 15 minutes.
Deep Worker and the IDP pipeline run LLM inference across OCR, classification, extraction, and validation passes on every document. Multiple models — different providers, different cost tiers — need orchestration, caching, and fallback when a provider degrades.
Deep Worker for Mortgage is Infrrd's agentic AI product — it orchestrates multi-step document audit workflows autonomously. These are long-running, stateful processes that must survive failures, retry individual steps, and fan out across thousands of concurrent loan files.
Mortgage loan packages, insurance claims, and engineering drawings contain PHI, PII, and confidential financial data. Raw documents, extracted fields, and audit logs all need durable, encrypted, globally accessible storage — with $0 egress when the LLM pipeline reads them repeatedly.
Rocket Mortgage, J&J, PwC, and Unum run regulated workflows on Infrrd. SOC 2 Type II and GDPR compliance are required — and Infrrd's infosec page explicitly lists DDoS protection, application security, and infrastructure hardening as product pillars.
Infrrd promises SLA deadlines as tight as 15 minutes on document processing. Customers in the US, EU, and Asia need sub-100ms API responses and resilient frontend delivery. Infrrd.ai is a globally accessed platform — CDN and smart routing directly impact whether SLAs are met.
Solution Mapping
Each Cloudflare developer platform product below is mapped to a specific Infrrd infrastructure demand — with the exact mechanism of value.
Infrrd's Deep Worker and IDP pipeline make LLM calls across every document — OCR correction, entity extraction, field validation, classification. Anuj published an arXiv paper in April 2026 specifically on eliminating the MCP/Tools Tax in agentic workflows (95% token reduction via dynamic tool gating). He understands LLM cost optimization at a research level. AI Gateway is the infrastructure answer to the same problem at the API call layer.
Extraction prompts across similar document types repeat constantly. "Extract borrower name, address, loan amount from this mortgage" runs thousands of times daily. AI Gateway caches semantically similar prompts — Infrrd pays for inference once per pattern, not per document.
When GPT-4o or Claude degrades — which happens — a 15-minute document SLA breaks if inference stalls. AI Gateway automatically routes to a secondary provider. SLA preserved, no code change, no on-call engineer needed.
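AI Gateway's Universal Endpoint accepts an ordered list of provider requests and tries each in turn until one succeeds. The sketch below builds such a fallback payload; the field names (`provider`, `endpoint`, `headers`, `query`) and model IDs are assumptions based on the Universal Endpoint pattern and should be checked against current Cloudflare documentation.

```typescript
// Sketch of an AI Gateway Universal Endpoint payload builder. The gateway
// tries each entry in order, so a degraded primary provider falls through
// to the secondary with no application code change.
// Field names and model IDs are illustrative; verify against the docs.

interface ProviderStep {
  provider: string;               // e.g. "openai", "anthropic"
  endpoint: string;               // provider-relative API path
  headers: Record<string, string>;
  query: Record<string, unknown>; // provider-native request body
}

function buildFallbackPayload(
  prompt: string,
  openaiKey: string,
  anthropicKey: string
): ProviderStep[] {
  return [
    {
      provider: "openai",
      endpoint: "chat/completions",
      headers: { Authorization: `Bearer ${openaiKey}`, "Content-Type": "application/json" },
      query: { model: "gpt-4o", messages: [{ role: "user", content: prompt }] },
    },
    {
      provider: "anthropic",
      endpoint: "v1/messages",
      headers: { "x-api-key": anthropicKey, "Content-Type": "application/json" },
      query: { model: "claude-sonnet", max_tokens: 1024, messages: [{ role: "user", content: prompt }] },
    },
  ];
}

// The array is POSTed to https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}
```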
At 100B+ pages processed, Infrrd needs to know exactly what LLM inference costs per document type, per customer, per vertical. AI Gateway logs every request with model, token count, latency, and cost — the observability layer Anuj's team doesn't have today without building it themselves.
Enterprise customers submitting large document batches can spike inference costs unexpectedly. AI Gateway enforces per-customer token rate limits at the proxy layer — before the cost hits.
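The mechanism behind per-customer token limits is a token bucket keyed by customer ID, refilled at a fixed rate. AI Gateway enforces this at the proxy; the sketch below just shows the logic, with illustrative capacity and refill numbers.

```typescript
// Token-bucket rate limiter: each customer gets a bucket of `capacity`
// tokens that refills at `refillPerSec`. A batch spending more tokens
// than the bucket holds is rejected before any inference cost is incurred.

class TokenBucket {
  private level: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.level = capacity;
    this.last = now;
  }

  /** Returns true if `tokens` may be spent at time `now` (seconds). */
  tryConsume(tokens: number, now: number): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    this.level = Math.min(this.capacity, this.level + (now - this.last) * this.refillPerSec);
    this.last = now;
    if (tokens > this.level) return false;
    this.level -= tokens;
    return true;
  }
}
```

In practice one bucket per customer ID sits in front of the LLM proxy, so a surprise batch from one tenant throttles gracefully instead of blowing the monthly inference budget.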
Deep Worker for Mortgage is Infrrd's agentic product — it autonomously orchestrates multi-step mortgage audit workflows. Each loan file is an independent stateful process: ingest documents → classify → extract → validate → flag discrepancies → route to HITL if needed → write audit report. These workflows can run for minutes, must survive infrastructure failures, and need to fan out across thousands of concurrent loan packages during peak origination periods.
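Cloudflare Workflows expresses exactly this pattern: a pipeline written as named `step.do()` calls, each one a checkpointed, independently retryable unit, so a crash mid-audit resumes from the last completed step rather than re-running the whole loan file. The sketch below is runtime-agnostic — the `Step` interface stands in for the real Workflows API, and the stage names and data are hypothetical placeholders.

```typescript
// Durable-step pattern: each step.do(name, fn) is a named checkpoint.
// A real runtime (e.g. Cloudflare Workflows) persists each result and
// retries failed steps; the in-memory Step below only records order.

interface Step {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical audit pipeline for one loan file; stage bodies are stubs.
async function auditLoanFile(step: Step, loanFileId: string): Promise<string[]> {
  const docs = await step.do("ingest", async () => [`${loanFileId}/application.pdf`]);
  const classes = await step.do("classify", async () => docs.map(() => "URLA"));
  const fields = await step.do("extract", async () => classes.map(() => ({ borrower: "" })));
  const flags = await step.do("validate", async () => fields.filter(() => false));
  return step.do("report", async () => [`audit:${loanFileId}`, `flags:${flags.length}`]);
}

// Minimal in-memory Step that logs completed checkpoints.
function memoryStep(log: string[]): Step {
  return {
    async do(name, fn) {
      const value = await fn();
      log.push(name); // a durable runtime would persist `value` here
      return value;
    },
  };
}
```

Fan-out across thousands of concurrent loan packages then becomes one workflow instance per loan file, each with its own independent retry state.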
At 100B+ pages processed and 1,000+ document types, Infrrd's storage problem is real. Raw uploaded documents, multi-page loan packages, extracted JSON fields, HITL annotation data, and audit reports all need durable, encrypted, globally accessible storage. The critical cost driver: every time the LLM pipeline reads a document for a retry, a re-extraction pass, or a quality validation, that's an S3 egress charge. At Infrrd's volume, that compounds.
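A back-of-envelope calculation shows why repeated reads dominate. Assuming roughly $0.09/GB for S3 internet egress on the first pricing tier (an illustrative published rate, not a quote) versus $0 for R2, egress cost scales with reads, not with stored bytes:

```typescript
// Egress cost ≈ (documents × avg size × reads per document) × per-GB rate.
// Rates and volumes below are illustrative, not Infrrd's actual numbers.

function egressCostUSD(docs: number, avgMB: number, readsPerDoc: number, perGB: number): number {
  const gbRead = (docs * avgMB * readsPerDoc) / 1024;
  return gbRead * perGB;
}

// 10M documents × 2 MB × 4 pipeline reads each (retry, re-extract, QA, audit):
const s3Pass = egressCostUSD(10_000_000, 2, 4, 0.09); // ≈ $7,031 per full pass
const r2Pass = egressCostUSD(10_000_000, 2, 4, 0.0);  // $0
```

Every additional extraction pass or model re-run multiplies the first number and leaves the second at zero.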
Infrrd's security page lists four explicit pillars: Enterprise Data Protection, Application Security, Infrastructure & Network (including DDoS), and Customer Controls. That's Infrrd's own engineering team defining their security requirements — and Cloudflare's security stack addresses every one of them directly.
The surface area: infrrd.ai accepts document uploads from enterprise customers (large file POST endpoints), serves a processing status API queried continuously by customer integrations, and exposes a HITL review interface accessed by human annotators globally. Each is a distinct attack vector.
Blocks SQLi, XSS, and file upload abuse at the edge before reaching Infrrd's origin. Custom rules protect the document upload endpoints — rate-limiting per-customer, blocking oversized payloads, enforcing content-type validation.
HITL reviewers access Infrrd's annotation interface from home networks globally. Zero Trust replaces VPN with device-posture-aware, identity-verified access — every reviewer session authenticated and logged. No public IPs on internal services.
Infrrd's document ingestion API accepts uploads from enterprise customer integrations. API Shield validates inbound requests against the expected schema — malformed uploads or oversized payloads blocked at edge before reaching processing queues.
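In API Shield this is declarative schema validation configured at the edge; the sketch below shows the equivalent checks in code so the mechanism is concrete. The allowed content types and the 100 MB cap are hypothetical limits, not Infrrd's actual policy.

```typescript
// Edge-style upload validation: reject requests before they reach the
// origin or a processing queue. Limits and allowed types are illustrative.

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

const ALLOWED_TYPES = new Set(["application/pdf", "image/tiff", "image/png"]);
const MAX_BYTES = 100 * 1024 * 1024; // hypothetical 100 MB cap

function validateUpload(contentType: string | null, contentLength: number | null): UploadCheck {
  if (!contentType || !ALLOWED_TYPES.has(contentType.toLowerCase())) {
    return { ok: false, reason: "unexpected content-type" };
  }
  if (contentLength === null || contentLength <= 0 || contentLength > MAX_BYTES) {
    return { ok: false, reason: "missing or oversized payload" };
  }
  return { ok: true };
}
```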
Document processing spikes during mortgage origination season and insurance claims surges. A targeted DDoS during peak load would directly breach customer SLAs. Cloudflare's unmetered L3/L4 and L7 DDoS protection absorbs attacks automatically — no bandwidth overage charges.
For high-volume, lower-complexity inference tasks — document type classification, language detection, entity tagging, and semantic search across processed document archives — Workers AI's serverless GPU inference offers open-source models at a fraction of frontier model cost. Vectorize enables semantic search across Infrrd's processed document corpus.
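Conceptually, semantic search here means embedding each processed document with a Workers AI text-embedding model and querying Vectorize for nearest neighbors by cosine similarity. The sketch below shows only the core math with toy vectors; in production the embeddings come from the model and the index lives in Vectorize, not in memory.

```typescript
// Core of vector search: rank corpus documents by cosine similarity
// to a query embedding. Vectors here are toy 2-D examples; real
// text embeddings are hundreds of dimensions.

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], corpus: { id: string; vec: number[] }[], k: number) {
  return corpus
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```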
Infrrd serves customers in the US, EU, and Asia with SLA windows as tight as 15 minutes. The infrrd.ai platform — including the HITL review interface used by human annotators globally — needs consistent sub-100ms response times and resilience during processing spikes when large customer batches arrive simultaneously.
Quick Reference
| Infrrd Requirement | Cloudflare Product | Specific Value | Priority |
|---|---|---|---|
| LLM cost, fallback, observability across IDP + Deep Worker | AI Gateway | Semantic caching, model fallback, per-request cost logs, rate limiting | Highest |
| Agentic document pipeline execution (Deep Worker) | Workers + Durable Objects + Workflows | Per-document stateful agents, durable retry, sub-50ms API, fan-out at scale | High |
| Document archive + processed data storage | R2 + D1 | $0 egress on multi-TB archive, serverless SQL, AES-256, GDPR location hints | High |
| SOC 2 / GDPR infosec posture, DDoS, API protection | WAF + Zero Trust + API Shield + DDoS | Application security, Zero Trust HITL access, unmetered DDoS, API schema validation | High |
| Document classification + semantic search | Workers AI + Vectorize | Open-source model inference at edge, vector search over document corpus | Consider |
| Global delivery, 15-min SLA, frontend resilience | CDN + Argo + Pages | 330+ PoP delivery, 30–40% API latency reduction, git-push deploys | Consider |
AI Gateway sits in front of your existing LLM calls as a proxy. You point your OpenAI/Anthropic base URL at AI Gateway instead of directly at the provider — that's the entire integration. From that moment you have full observability, semantic caching, and fallback routing. Given your April 2026 paper on eliminating the MCP/Tools Tax, the token savings angle will be immediately quantifiable against your own production traffic.
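The base-URL swap reduces to one line. The URL shape below follows Cloudflare's AI Gateway convention (`/v1/{account_id}/{gateway_id}/{provider}`); the account and gateway IDs shown are placeholders.

```typescript
// Build the provider-specific AI Gateway base URL. After this swap,
// every SDK call flows through the gateway: logged, cacheable, and
// eligible for fallback routing.

function aiGatewayBaseURL(
  accountId: string,
  gatewayId: string,
  provider: "openai" | "anthropic"
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}

// e.g. with the OpenAI SDK (package `openai`, not imported here):
//   const client = new OpenAI({
//     baseURL: aiGatewayBaseURL(ACCOUNT_ID, "infrrd-prod", "openai"), // IDs hypothetical
//   });
```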