Architecture Blueprint

AI-Powered SNF Referral Management Platform

End-to-end architecture: from faxed PDFs to admit/deny recommendations

✨ Updated: Now with Graphiti Temporal Knowledge Graph
📠 Faxed Referrals
🏥 Hospital Portals (Epic, AllScripts, NaviHealth)
📄 Direct PDF Uploads
📧 E-Fax (Twilio)
🔗 HL7v2 / FHIR Feeds
1
📥 Document Ingestion & OCR
300+ page referral packets → clean structured text · <3 min per packet
📝 Marker
PDF → Markdown with 96% accuracy. Tables, multi-column, headers preserved. LLM-enhanced mode.
PRIMARY OCR · GPL-3.0
🔍 Surya
Transformer-based OCR. 97% similarity vs Google Cloud Vision. 90+ languages, 0.62s/page.
SCANNED DOCS · GPL-3.0
✍️ TrOCR
Microsoft's handwriting recognition. 94.6% accuracy on medical handwritten forms.
HANDWRITING · MIT
👁️ Qwen2.5-VL
Vision-language model. Reads complex pages visually: tables, charts, mixed content. 7B params.
COMPLEX DOCS · Apache 2.0
📐 Docling
IBM layout analysis. Document structure understanding, reading order, section detection.
LAYOUT · MIT
🔧 OpenCV
Preprocessing: deskew, denoise, contrast enhance, upscale to 300 DPI. +5-10% accuracy on faxes.
PREPROCESSING · Apache 2.0
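The engine roles listed above (primary OCR, scanned docs, handwriting, complex docs, preprocessing) imply a per-page dispatch step. The following is a minimal sketch of such a router, not the platform's actual logic: the `PageFeatures` fields, the thresholds, and the engine labels are hypothetical, and no OCR library is called.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Hypothetical per-page signals produced by an upstream layout pass."""
    is_scanned: bool          # image-only page (e.g. a fax) with no text layer
    handwriting_ratio: float  # fraction of the page area that is handwritten
    table_count: int          # number of detected tables
    has_charts: bool          # charts or other mixed visual content detected


def choose_engine(page: PageFeatures) -> str:
    """Return the label of the engine a page would be routed to.

    Thresholds are illustrative assumptions, not tuned values.
    """
    if page.handwriting_ratio > 0.3:
        return "trocr"        # HANDWRITING
    if page.has_charts or page.table_count >= 3:
        return "qwen2.5-vl"   # COMPLEX DOCS
    if page.is_scanned:
        return "surya"        # SCANNED DOCS
    return "marker"           # PRIMARY OCR


def needs_preprocessing(page: PageFeatures) -> bool:
    """Scanned pages get deskew/denoise/upscale before OCR (assumed policy)."""
    return page.is_scanned


fax_page = PageFeatures(is_scanned=True, handwriting_ratio=0.05,
                        table_count=0, has_charts=False)
print(choose_engine(fax_page), needs_preprocessing(fax_page))
```

Routing per page rather than per packet keeps the slower vision-language path limited to the pages that need it, which matters for a 300+ page packet.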
2
🧬 Clinical NLP & Entity Extraction
Raw text → structured medical entities (diagnoses, meds, insurance, demographics)
🏥 scispaCy
Allen AI clinical NLP. Medical NER: diagnoses, medications, procedures, anatomy. Battle-tested.
CORE NLP · Apache 2.0
🧠 BioBERT
Fine-tuned on n2c2/i2b2 datasets. ICD-10 code extraction, clinical relation detection.
NER MODEL · Apache 2.0
💊 MedXN
Mayo Clinic medication extractor. Drug names → RxNorm normalized codes. Interaction checking.
MEDICATIONS · Open Source
🔒 Presidio
Microsoft PHI de-identification. 94% recall on patient names. HIPAA compliance layer.
DE-ID · MIT
🏗️ FHIR Resources
Map entities to FHIR Patient, Condition, MedicationStatement, Coverage. Interoperable output.
STANDARDS · Open Source
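The last step of this layer, mapping extracted entities to FHIR resources, can be sketched with plain dictionaries. This is an illustration under assumptions: the helper names and the sample patient are hypothetical, only a few FHIR R4 fields are populated, and a real pipeline would validate against the full resource definitions.

```python
import json

# Standard FHIR code-system URIs for ICD-10-CM and RxNorm.
ICD10_CM = "http://hl7.org/fhir/sid/icd-10-cm"
RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"


def to_condition(patient_id: str, icd10_code: str, display: str) -> dict:
    """Build a minimal FHIR R4 Condition from an extracted diagnosis."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {"coding": [
            {"system": ICD10_CM, "code": icd10_code, "display": display}
        ]},
    }


def to_medication_statement(patient_id: str, rxnorm_code: str,
                            display: str, active: bool = True) -> dict:
    """Build a minimal FHIR R4 MedicationStatement from an extracted drug."""
    return {
        "resourceType": "MedicationStatement",
        "status": "active" if active else "stopped",
        "subject": {"reference": f"Patient/{patient_id}"},
        "medicationCodeableConcept": {"coding": [
            {"system": RXNORM, "code": rxnorm_code, "display": display}
        ]},
    }


# Illustrative entities, as an NER + normalization step might emit them.
resources = [
    to_condition("example-1", "I50.9", "Heart failure, unspecified"),
    to_medication_statement("example-1", "4603", "furosemide"),
]
print(json.dumps(resources, indent=2))
```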
3
🧠 Intelligence Engine: Fine-Tuned LLM
Clinical reasoning, facility matching, financial analysis · Custom-trained on healthcare data
⚡ Qwen 2.5 (32B)
Base model fine-tuned with QLoRA on MIMIC-III/IV clinical data. Top clinical reasoning scores.
BASE MODEL · Apache 2.0
🔧 LLaMA-Factory
Fine-tuning framework. QLoRA r=64, 72% MedQA accuracy. Trains 32B model on single A100 in ~12hrs.
FINE-TUNING · Apache 2.0
📋 SGLang / Outlines
Guaranteed valid JSON extraction. FHIR-compatible schemas with per-field confidence scores.
STRUCTURED OUTPUT · Apache 2.0
📊 RAGAS + DeepEval
Evaluate faithfulness, relevancy, hallucination rate. Custom clinical accuracy metrics.
EVALUATION · Apache 2.0
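Constrained decoding guarantees that the model output parses as JSON, but a downstream check on the per-field confidence scores is still useful. The sketch below shows one possible shape for that check using only the standard library. The field names and the 0.8 threshold are hypothetical, and this is not the SGLang or Outlines API.

```python
import json

# Hypothetical extraction schema: each field carries a value and a confidence.
REQUIRED_FIELDS = ("primary_diagnosis", "payer", "medications")


def parse_extraction(raw: str) -> dict:
    """Parse model output and verify every required field has a valid confidence."""
    data = json.loads(raw)
    for name in REQUIRED_FIELDS:
        field = data.get(name)
        if not isinstance(field, dict) or "value" not in field:
            raise ValueError(f"field {name!r} is missing or has no value")
        conf = field.get("confidence")
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            raise ValueError(f"field {name!r} has a non-numeric confidence")
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"field {name!r} confidence is outside [0, 1]")
    return data


def low_confidence_fields(data: dict, threshold: float = 0.8) -> list:
    """Names of required fields whose confidence falls below the threshold."""
    return sorted(n for n in REQUIRED_FIELDS if data[n]["confidence"] < threshold)


raw_output = json.dumps({
    "primary_diagnosis": {"value": "I50.9", "confidence": 0.97},
    "payer": {"value": "Medicare Part A", "confidence": 0.62},
    "medications": {"value": ["furosemide"], "confidence": 0.88},
})
print(low_confidence_fields(parse_extraction(raw_output)))
```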
3½
🕸️ Hybrid Knowledge Layer (NEW)
Temporal knowledge graph + vector search: relational reasoning meets semantic retrieval

[Diagram: Graphiti temporal knowledge graph at the center, linking Patient, Diagnosis, Payer, and Facility nodes through has_dx, requires, covers, and accepts edges]

🕸️ Graphiti (Zep)
Temporal Knowledge Graph
Entity relationships that evolve over time. Bitemporal tracking (event time + ingestion time). Multi-hop reasoning: Patient → Diagnosis → Equipment → Facility capability.

Queries: "What are this patient's current active medications?" · "Has this payer denied this diagnosis combo before?" · "Which facilities accepted similar acuity last month?"

Neo4j / FalkorDB · Apache 2.0

+

🗄️ Weaviate
Vector Semantic Search
Document-level retrieval with embeddings. Facility criteria docs, clinical guidelines, drug databases, ICD-10 codebook. Multi-tenant per facility.

Queries: "What are Facility X's admission criteria for wound care patients?" · "What does Medicare say about 3-day stay waivers?" · "Drug interactions for this medication list?"

On-prem K8s, HIPAA-ready · BSD-3
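To make the bitemporal idea concrete, here is a small in-memory sketch of answering "What are this patient's current active medications?" with both an event date and an ingestion date. It illustrates the concept only and is not Graphiti's API; the `Fact` class, the `facts_as_of` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date           # event time: when the fact became true
    valid_to: Optional[date]   # event time: when it stopped being true (None = open)
    ingested_at: date          # ingestion time: when the system learned this version


def facts_as_of(facts: List[Fact], subject: str, predicate: str,
                event_date: date, known_by: date) -> List[str]:
    """Objects that were true on event_date, using only versions ingested by known_by."""
    latest = {}
    for f in facts:
        if f.subject != subject or f.predicate != predicate:
            continue
        if f.ingested_at > known_by:
            continue  # the system had not learned this version yet
        prev = latest.get(f.obj)
        if prev is None or f.ingested_at > prev.ingested_at:
            latest[f.obj] = f  # a later ingestion supersedes an earlier one
    return sorted(
        f.obj for f in latest.values()
        if f.valid_from <= event_date
        and (f.valid_to is None or event_date < f.valid_to)
    )


facts = [
    Fact("patient:1", "takes", "warfarin", date(2024, 1, 1), None, date(2024, 1, 2)),
    Fact("patient:1", "takes", "metformin", date(2024, 2, 1), None, date(2024, 2, 1)),
    # Discontinuation on March 1 was only learned on March 5.
    Fact("patient:1", "takes", "warfarin", date(2024, 1, 1), date(2024, 3, 1),
         date(2024, 3, 5)),
]

today = date(2024, 3, 10)
print(facts_as_of(facts, "patient:1", "takes", today, known_by=today))
print(facts_as_of(facts, "patient:1", "takes", today, known_by=date(2024, 3, 4)))
```

The second query reproduces what the system believed on March 4, before the discontinuation was ingested, which is the kind of audit question a single timestamp cannot answer.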

⏱️ Bitemporal Tracking
Every fact carries valid_from/valid_to timestamps, so the graph knows what was true and when. Discontinued medications, resolved diagnoses, and changed payer rules are all tracked.
TEMPORAL · Graphiti
🔗 Multi-Hop Reasoning
Patient → Diagnosis → required Equipment → Facility has it or does not. Traverse entity chains for complex admission logic.
GRAPH TRAVERSAL · Neo4j
📚 LlamaIndex + LangChain
Hybrid RAG orchestration. Routes queries to graph (relational) or vector (semantic) based on question type.
RAG FRAMEWORK · MIT/Apache
📈 Institutional Memory
Graph accumulates cross-referral patterns over time. Readmission predictors, payer behavior, and facility performance patterns emerge automatically.
PATTERN DISCOVERY · Graphiti
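The multi-hop chain described above (Patient → Diagnosis → Equipment → Facility) can be sketched as a traversal over labeled edges. This is a toy in-memory graph, not a Neo4j or Graphiti query; the `Graph` class, the `has_equipment` relation, and the sample nodes are hypothetical and carry no clinical authority.

```python
from collections import defaultdict


class Graph:
    """Tiny labeled-edge graph: (source, relation) -> set of targets."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, src: str, relation: str, dst: str) -> None:
        self._edges[(src, relation)].add(dst)

    def neighbors(self, src: str, relation: str) -> set:
        # .get avoids creating empty entries for unknown keys.
        return set(self._edges.get((src, relation), ()))


def required_equipment(g: Graph, patient: str) -> set:
    """Hop 1: patient -> diagnoses. Hop 2: diagnoses -> required equipment."""
    needed = set()
    for dx in g.neighbors(patient, "has_dx"):
        needed |= g.neighbors(dx, "requires")
    return needed


def facility_gaps(g: Graph, patient: str, facility: str) -> set:
    """Hop 3: compare required equipment with what the facility has."""
    return required_equipment(g, patient) - g.neighbors(facility, "has_equipment")


g = Graph()
g.add("patient:1", "has_dx", "dx:respiratory_failure")
g.add("patient:1", "has_dx", "dx:pressure_ulcer")
g.add("dx:respiratory_failure", "requires", "equip:ventilator")
g.add("dx:pressure_ulcer", "requires", "equip:wound_vac")
g.add("facility:A", "has_equipment", "equip:wound_vac")
g.add("facility:B", "has_equipment", "equip:wound_vac")
g.add("facility:B", "has_equipment", "equip:ventilator")

for facility in ("facility:A", "facility:B"):
    print(facility, sorted(facility_gaps(g, "patient:1", facility)))
```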
4
🤖 Multi-Agent Decision System (LangGraph)
Specialized agents query both graph & vector knowledge to produce recommendations
🔀
Triage Agent
Classify urgency & route
Vector: guidelines
→
🩺
Clinical Agent
Risk assessment & care needs
Graph: patient history
→
💰
Financial Agent
PDPM, insurance, med costs
Graph: payer patterns
→
✅
Criteria Agent
Facility match via RAG
Both: criteria + capabilities
→
📝
Explanation Agent
Reasoning + page citations
Graph: provenance trail
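The five-agent chain above passes a shared state from one agent to the next. The sketch below shows that hand-off with plain functions rather than LangGraph, so it runs without dependencies. Every rule inside the agents is a stand-in assumption, and the state keys are hypothetical.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Agent = Callable[[State], State]


def triage_agent(state: State) -> State:
    # Stand-in rule: ventilator needs are treated as high urgency.
    urgent = "ventilator" in state["care_needs"]
    return {**state, "urgency": "high" if urgent else "routine"}


def clinical_agent(state: State) -> State:
    # Stand-in risk score that grows with the number of diagnoses.
    return {**state, "risk_score": min(1.0, 0.2 * len(state["diagnoses"]))}


def financial_agent(state: State) -> State:
    accepted = state["facility"]["accepted_payers"]
    return {**state, "payer_accepted": state["payer"] in accepted}


def criteria_agent(state: State) -> State:
    unmet = sorted(set(state["care_needs"]) - set(state["facility"]["capabilities"]))
    return {**state, "unmet_needs": unmet}


def explanation_agent(state: State) -> State:
    reasons = [
        f"urgency={state['urgency']}",
        f"payer_accepted={state['payer_accepted']}",
        f"unmet_needs={state['unmet_needs'] or 'none'}",
    ]
    return {**state, "explanation": "; ".join(reasons)}


PIPELINE: List[Agent] = [triage_agent, clinical_agent, financial_agent,
                         criteria_agent, explanation_agent]


def run_pipeline(state: State) -> State:
    """Run each agent in order, threading the accumulated state through."""
    for agent in PIPELINE:
        state = agent(state)
    return state


referral = {
    "diagnoses": ["I50.9", "J96.10"],
    "care_needs": ["ventilator", "wound_vac"],
    "payer": "Medicare Part A",
    "facility": {"accepted_payers": ["Medicare Part A"],
                 "capabilities": ["wound_vac"]},
}
print(run_pipeline(referral)["explanation"])
```

Each agent returns a new dictionary instead of mutating its input, which keeps intermediate states available for the provenance trail.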
5
📤 Decision Output & Integrations
Structured recommendations pushed to EHRs, dashboards, and clinical review queues
⚖️ Admit / Consider / Deny
Transparent recommendation with confidence score, reasoning chain, and source page citations from the referral packet.
PRIMARY OUTPUT
📊 Clinical Summary
Single-page patient overview: diagnoses, medications, risks, care needs, financial projections.
SUMMARY
🔌 EHR Push
PointClickCare + MatrixCare integration. FHIR R4 resources. Bidirectional sync.
INTEGRATION
👨‍⚕️ Human Review Queue
Low-confidence items routed to clinicians. Override feedback loops back into model + graph.
HUMAN-IN-THE-LOOP
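One way the Admit / Consider / Deny output and the human review queue could fit together is a pair of thresholds: one set on the facility match score, one on model confidence. This is a sketch under assumptions. The score inputs, the threshold values, and the `Recommendation` shape are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    decision: str             # "admit", "consider", or "deny"
    confidence: float         # model confidence in the decision, 0..1
    needs_human_review: bool  # True routes the item to the clinician queue


def recommend(match_score: float, confidence: float,
              admit_at: float = 0.75, deny_below: float = 0.40,
              review_below: float = 0.80) -> Recommendation:
    """Map a facility match score to a decision and flag low-confidence items.

    All thresholds are illustrative defaults, not validated cut-offs.
    """
    if match_score >= admit_at:
        decision = "admit"
    elif match_score < deny_below:
        decision = "deny"
    else:
        decision = "consider"
    return Recommendation(decision, confidence, confidence < review_below)


print(recommend(match_score=0.82, confidence=0.91))
print(recommend(match_score=0.55, confidence=0.60))
```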
⚙️
🏭 Production Infrastructure
HIPAA-compliant, scalable, monitored: 240+ packets/day on a single GPU
🚀 vLLM
Model serving. Up to 24x throughput vs. Hugging Face Transformers (per vLLM's published benchmarks). AWQ quantization. OpenAI-compatible API.
SERVING · Apache 2.0
☁️ HealthStack
Open-source IaC for AWS. HIPAA Terraform modules: VPC, encryption, audit logging, BAA-ready.
INFRASTRUCTURE · OSS
📈 MLflow + W&B
Experiment tracking, model registry, A/B testing. W&B offers a HIPAA BAA for enterprise customers.
MLOPS · Apache/Comm
🖥️ NVIDIA L4 GPU
24GB VRAM, $1.50/hr on AWS. Single GPU handles full pipeline. Scale to multi-GPU as needed.
COMPUTE · $800/mo
🌐 FastAPI
Async API layer. OAuth2/OIDC auth. Webhook callbacks. Rate limiting via Celery + Redis.
API · MIT
📊 Grafana + Prometheus
Observability: latency, accuracy drift, error rates, GPU utilization, model performance.
MONITORING · OSS
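The two throughput figures quoted in this blueprint, under 3 minutes per packet and 240+ packets per day on a single GPU, can be checked against each other with simple arithmetic. The sketch below assumes packets are processed strictly one at a time at the 3-minute upper bound; batching or parallel stages would change the numbers.

```python
MINUTES_PER_PACKET = 3         # upper bound from "<3 min per packet"
TARGET_PACKETS_PER_DAY = 240   # from "240+ packets/day on a single GPU"
MINUTES_PER_DAY = 24 * 60


def max_packets_per_day(minutes_per_packet: float, workers: int = 1) -> int:
    """Ceiling on daily volume if each worker handles one packet at a time."""
    return int(workers * MINUTES_PER_DAY // minutes_per_packet)


def busy_hours_per_day(packets: int, minutes_per_packet: float) -> float:
    """Hours of processing needed to clear the given daily volume."""
    return packets * minutes_per_packet / 60


capacity = max_packets_per_day(MINUTES_PER_PACKET)
hours = busy_hours_per_day(TARGET_PACKETS_PER_DAY, MINUTES_PER_PACKET)
print(f"capacity: {capacity} packets/day")
print(f"busy hours at target volume: {hours}")
print(f"utilization: {hours / 24:.0%}")
```

Under these assumptions the target volume uses about half of one GPU's day, which leaves headroom for retries and peak-hour bursts.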
18
Weeks to Build
96%
OCR Accuracy
<3m
Per Packet
240+
Packets/Day
$2.4K
Monthly Cost
100%
Open Source Core