Infrrd processes 100B+ pages of enterprise documents — mortgages, insurance claims, engineering drawings, invoices — at a 0.1% error rate across 22 languages. Scaling from 150M documents to hundreds of millions more, launching Deep Worker for Agentic AI, and serving regulated customers like Rocket Mortgage, J&J, and PwC all mean the infrastructure underneath has to match the ambition. This is where Cloudflare's developer platform fits.
Infrastructure Profile
Infrrd isn't a simple document viewer. It's a high-throughput AI pipeline that ingests raw documents, runs multi-model ML inference, routes to human reviewers when needed, and returns structured data via API — all under SLA deadlines as tight as 15 minutes.
Deep Worker and the IDP pipeline run LLM inference across OCR, classification, extraction, and validation passes on every document. Multiple models — different providers, different cost tiers — need orchestration, caching, and fallback when a provider degrades.
Deep Worker for Mortgage is Infrrd's agentic AI product — it orchestrates multi-step document audit workflows autonomously. These are long-running, stateful processes that must survive failures, retry individual steps, and fan out across thousands of concurrent loan files.
Mortgage loan packages, insurance claims, and engineering drawings contain PHI, PII, and confidential financial data. Raw documents, extracted fields, and audit logs all need durable, encrypted, globally accessible storage — with $0 egress when the LLM pipeline reads them repeatedly.
Rocket Mortgage, J&J, PwC, and Unum run regulated workflows on Infrrd. SOC 2 Type II and GDPR compliance are required — and Infrrd's infosec page explicitly lists DDoS protection, application security, and infrastructure hardening as product pillars.
Infrrd promises SLA deadlines as tight as 15 minutes on document processing. Customers in the US, EU, and Asia need sub-100ms API responses and resilient frontend delivery. Infrrd.ai is a globally accessed platform — CDN and smart routing directly impact whether SLAs are met.
Solution Mapping
Each Cloudflare developer platform product below is mapped to a specific Infrrd infrastructure demand — with the exact mechanism of value.
Infrrd's Deep Worker and IDP pipeline make LLM calls across every document — OCR correction, entity extraction, field validation, classification. Anuj published an arXiv paper in April 2026 specifically on eliminating the MCP/Tools Tax in agentic workflows (95% token reduction via dynamic tool gating). He understands LLM cost optimization at a research level. AI Gateway is the infrastructure answer to the same problem at the API call layer.
Extraction prompts across similar document types repeat constantly. "Extract borrower name, address, loan amount from this mortgage" runs thousands of times daily. AI Gateway caches semantically similar prompts — Infrrd pays for inference once per pattern, not per document.
When GPT-4o or Claude degrades — which happens — a 15-minute document SLA breaks if inference stalls. AI Gateway automatically routes to a secondary provider. SLA preserved, no code change, no on-call engineer needed.
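AI Gateway's Universal Endpoint accepts an ordered list of provider requests and tries each in turn until one succeeds. The sketch below builds such a fallback payload; the field names (`provider`, `endpoint`, `headers`, `query`) and model IDs are assumptions based on the Universal Endpoint pattern and should be checked against current Cloudflare documentation.

```typescript
// Sketch of an AI Gateway Universal Endpoint payload builder. The gateway
// tries each entry in order, so a degraded primary provider falls through
// to the secondary with no application code change.
// Field names and model IDs are illustrative; verify against the docs.

interface ProviderStep {
  provider: string;               // e.g. "openai", "anthropic"
  endpoint: string;               // provider-relative API path
  headers: Record<string, string>;
  query: Record<string, unknown>; // provider-native request body
}

function buildFallbackPayload(
  prompt: string,
  openaiKey: string,
  anthropicKey: string
): ProviderStep[] {
  return [
    {
      provider: "openai",
      endpoint: "chat/completions",
      headers: { Authorization: `Bearer ${openaiKey}`, "Content-Type": "application/json" },
      query: { model: "gpt-4o", messages: [{ role: "user", content: prompt }] },
    },
    {
      provider: "anthropic",
      endpoint: "v1/messages",
      headers: { "x-api-key": anthropicKey, "Content-Type": "application/json" },
      query: { model: "claude-sonnet", max_tokens: 1024, messages: [{ role: "user", content: prompt }] },
    },
  ];
}

// The array is POSTed to https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}
```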
At 100B+ pages processed, Infrrd needs to know exactly what LLM inference costs per document type, per customer, per vertical. AI Gateway logs every request with model, token count, latency, and cost — the observability layer Anuj's team doesn't have today without building it themselves.
Enterprise customers submitting large document batches can spike inference costs unexpectedly. AI Gateway enforces per-customer token rate limits at the proxy layer — before the cost hits.
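The mechanism behind per-customer token limits is a token bucket keyed by customer ID, refilled at a fixed rate. AI Gateway enforces this at the proxy; the sketch below just shows the logic, with illustrative capacity and refill numbers.

```typescript
// Token-bucket rate limiter: each customer gets a bucket of `capacity`
// tokens that refills at `refillPerSec`. A batch spending more tokens
// than the bucket holds is rejected before any inference cost is incurred.

class TokenBucket {
  private level: number;
  private last: number;

  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.level = capacity;
    this.last = now;
  }

  /** Returns true if `tokens` may be spent at time `now` (seconds). */
  tryConsume(tokens: number, now: number): boolean {
    // Refill proportionally to elapsed time, capped at capacity.
    this.level = Math.min(this.capacity, this.level + (now - this.last) * this.refillPerSec);
    this.last = now;
    if (tokens > this.level) return false;
    this.level -= tokens;
    return true;
  }
}
```

In practice one bucket per customer ID sits in front of the LLM proxy, so a surprise batch from one tenant throttles gracefully instead of blowing the monthly inference budget.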
Deep Worker for Mortgage is Infrrd's agentic product — it autonomously orchestrates multi-step mortgage audit workflows. Each loan file is an independent stateful process: ingest documents → classify → extract → validate → flag discrepancies → route to HITL if needed → write audit report. These workflows can run for minutes, must survive infrastructure failures, and need to fan out across thousands of concurrent loan packages during peak origination periods.
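Cloudflare Workflows expresses exactly this pattern: a pipeline written as named `step.do()` calls, each one a checkpointed, independently retryable unit, so a crash mid-audit resumes from the last completed step rather than re-running the whole loan file. The sketch below is runtime-agnostic — the `Step` interface stands in for the real Workflows API, and the stage names and data are hypothetical placeholders.

```typescript
// Durable-step pattern: each step.do(name, fn) is a named checkpoint.
// A real runtime (e.g. Cloudflare Workflows) persists each result and
// retries failed steps; the in-memory Step below only records order.

interface Step {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

// Hypothetical audit pipeline for one loan file; stage bodies are stubs.
async function auditLoanFile(step: Step, loanFileId: string): Promise<string[]> {
  const docs = await step.do("ingest", async () => [`${loanFileId}/application.pdf`]);
  const classes = await step.do("classify", async () => docs.map(() => "URLA"));
  const fields = await step.do("extract", async () => classes.map(() => ({ borrower: "" })));
  const flags = await step.do("validate", async () => fields.filter(() => false));
  return step.do("report", async () => [`audit:${loanFileId}`, `flags:${flags.length}`]);
}

// Minimal in-memory Step that logs completed checkpoints.
function memoryStep(log: string[]): Step {
  return {
    async do(name, fn) {
      const value = await fn();
      log.push(name); // a durable runtime would persist `value` here
      return value;
    },
  };
}
```

Fan-out across thousands of concurrent loan packages then becomes one workflow instance per loan file, each with its own independent retry state.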
At 100B+ pages processed and 1,000+ document types, Infrrd's storage problem is real. Raw uploaded documents, multi-page loan packages, extracted JSON fields, HITL annotation data, and audit reports all need durable, encrypted, globally accessible storage. The critical cost driver: every time the LLM pipeline reads a document for a retry, a re-extraction pass, or a quality validation, that's an S3 egress charge. At Infrrd's volume, that compounds.
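A back-of-envelope calculation shows why repeated reads dominate. Assuming roughly $0.09/GB for S3 internet egress on the first pricing tier (an illustrative published rate, not a quote) versus $0 for R2, egress cost scales with reads, not with stored bytes:

```typescript
// Egress cost ≈ (documents × avg size × reads per document) × per-GB rate.
// Rates and volumes below are illustrative, not Infrrd's actual numbers.

function egressCostUSD(docs: number, avgMB: number, readsPerDoc: number, perGB: number): number {
  const gbRead = (docs * avgMB * readsPerDoc) / 1024;
  return gbRead * perGB;
}

// 10M documents × 2 MB × 4 pipeline reads each (retry, re-extract, QA, audit):
const s3Pass = egressCostUSD(10_000_000, 2, 4, 0.09); // ≈ $7,031 per full pass
const r2Pass = egressCostUSD(10_000_000, 2, 4, 0.0);  // $0
```

Every additional extraction pass or model re-run multiplies the first number and leaves the second at zero.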
Infrrd's security page lists four explicit pillars: Enterprise Data Protection, Application Security, Infrastructure & Network (including DDoS), and Customer Controls. That's Infrrd's own engineering team defining their security requirements — and Cloudflare's security stack addresses every one of them directly.
The surface area: infrrd.ai accepts document uploads from enterprise customers (large file POST endpoints), serves a processing status API queried continuously by customer integrations, and exposes a HITL review interface accessed by human annotators globally. Each is a distinct attack vector.
Blocks SQLi, XSS, and file upload abuse at the edge before reaching Infrrd's origin. Custom rules protect the document upload endpoints — rate-limiting per-customer, blocking oversized payloads, enforcing content-type validation.
HITL reviewers access Infrrd's annotation interface from home networks globally. Zero Trust replaces VPN with device-posture-aware, identity-verified access — every reviewer session authenticated and logged. No public IPs on internal services.
Infrrd's document ingestion API accepts uploads from enterprise customer integrations. API Shield validates inbound requests against the expected schema — malformed uploads or oversized payloads blocked at edge before reaching processing queues.
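In API Shield this is declarative schema validation configured at the edge; the sketch below shows the equivalent checks in code so the mechanism is concrete. The allowed content types and the 100 MB cap are hypothetical limits, not Infrrd's actual policy.

```typescript
// Edge-style upload validation: reject requests before they reach the
// origin or a processing queue. Limits and allowed types are illustrative.

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

const ALLOWED_TYPES = new Set(["application/pdf", "image/tiff", "image/png"]);
const MAX_BYTES = 100 * 1024 * 1024; // hypothetical 100 MB cap

function validateUpload(contentType: string | null, contentLength: number | null): UploadCheck {
  if (!contentType || !ALLOWED_TYPES.has(contentType.toLowerCase())) {
    return { ok: false, reason: "unexpected content-type" };
  }
  if (contentLength === null || contentLength <= 0 || contentLength > MAX_BYTES) {
    return { ok: false, reason: "missing or oversized payload" };
  }
  return { ok: true };
}
```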
Document processing spikes during mortgage origination season and insurance claims surges. A targeted DDoS during peak load would directly breach customer SLAs. Cloudflare's unmetered L3/L4 and L7 DDoS protection absorbs attacks automatically — no bandwidth overage charges.
For high-volume, lower-complexity inference tasks — document type classification, language detection, entity tagging, and semantic search across processed document archives — Workers AI's serverless GPU inference offers open-source models at a fraction of frontier model cost. Vectorize enables semantic search across Infrrd's processed document corpus.
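Conceptually, semantic search here means embedding each processed document with a Workers AI text-embedding model and querying Vectorize for nearest neighbors by cosine similarity. The sketch below shows only the core math with toy vectors; in production the embeddings come from the model and the index lives in Vectorize, not in memory.

```typescript
// Core of vector search: rank corpus documents by cosine similarity
// to a query embedding. Vectors here are toy 2-D examples; real
// text embeddings are hundreds of dimensions.

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], corpus: { id: string; vec: number[] }[], k: number) {
  return corpus
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```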
Infrrd serves customers in the US, EU, and Asia with SLA windows as tight as 15 minutes. The infrrd.ai platform — including the HITL review interface used by human annotators globally — needs consistent sub-100ms response times and resilience during processing spikes when large customer batches arrive simultaneously.
Quick Reference
| Infrrd Requirement | Cloudflare Product | Specific Value | Priority |
|---|---|---|---|
| LLM cost, fallback, observability across IDP + Deep Worker | AI Gateway | Semantic caching, model fallback, per-request cost logs, rate limiting | Highest |
| Agentic document pipeline execution (Deep Worker) | Workers + Durable Objects + Workflows | Per-document stateful agents, durable retry, sub-50ms API, fan-out at scale | High |
| Document archive + processed data storage | R2 + D1 | $0 egress on multi-TB archive, serverless SQL, AES-256, GDPR location hints | High |
| SOC 2 / GDPR infosec posture, DDoS, API protection | WAF + Zero Trust + API Shield + DDoS | Application security, Zero Trust HITL access, unmetered DDoS, API schema validation | High |
| Document classification + semantic search | Workers AI + Vectorize | Open-source model inference at edge, vector search over document corpus | Consider |
| Global delivery, 15-min SLA, frontend resilience | CDN + Argo + Pages | 330+ PoP delivery, 30–40% API latency reduction, git-push deploys | Consider |
AI Gateway sits in front of your existing LLM calls as a proxy. You point your OpenAI/Anthropic base URL at AI Gateway instead of directly at the provider — that's the entire integration. From that moment you have full observability, semantic caching, and fallback routing. Given your April 2026 paper on eliminating the MCP/Tools Tax, the token savings angle will be immediately quantifiable against your own production traffic.
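The base-URL swap reduces to one line. The URL shape below follows Cloudflare's AI Gateway convention (`/v1/{account_id}/{gateway_id}/{provider}`); the account and gateway IDs shown are placeholders.

```typescript
// Build the provider-specific AI Gateway base URL. After this swap,
// every SDK call flows through the gateway: logged, cacheable, and
// eligible for fallback routing.

function aiGatewayBaseURL(
  accountId: string,
  gatewayId: string,
  provider: "openai" | "anthropic"
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}

// e.g. with the OpenAI SDK (package `openai`, not imported here):
//   const client = new OpenAI({
//     baseURL: aiGatewayBaseURL(ACCOUNT_ID, "infrrd-prod", "openai"), // IDs hypothetical
//   });
```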