Understanding CuraSense
A comprehensive exploration of the architectural decisions, AI orchestration patterns, and engineering challenges behind building an enterprise-grade medical diagnosis system.
Problem Statement: The Healthcare AI Gap
Modern healthcare faces a critical bottleneck: the gap between the exponential growth of medical knowledge and the limited capacity of healthcare professionals to process, analyze, and apply this information in real-time clinical settings. Every day, physicians must interpret complex diagnostic reports, cross-reference medication interactions, and make split-second decisions that directly impact patient outcomes.
Traditional healthcare software solutions operate in isolation—prescription analyzers don't communicate with imaging systems, drug databases remain disconnected from patient history analysis, and medical insights are scattered across multiple platforms. This fragmentation leads to:
- Delayed diagnoses as practitioners manually correlate data from multiple sources
- Medication errors due to overlooked drug interactions in complex prescriptions
- Missed insights from medical imaging that require specialized radiological expertise
- Information overload preventing effective synthesis of patient data
The Core Problem
CuraSense was conceived as a direct response to this challenge. Rather than building another isolated medical tool, the goal was to architect a comprehensive AI platform that mirrors how an experienced medical team would collaborate: multiple specialists working in concert, each contributing their expertise to arrive at a holistic patient assessment.
Solution Architecture: A Layered Approach
CuraSense uses a four-layer architecture (presentation, API, intelligence, and data) that separates concerns while keeping data flow between components straightforward. The decision wasn't arbitrary: it emerged from the fundamental requirements of medical AI systems, namely reliability, interpretability, and the ability to evolve individual components without disrupting the entire system.
The Presentation Layer
Built with Next.js 14 and React 19, the frontend represents more than just a user interface—it's a carefully crafted experience designed for clinical workflows. The decision to use the App Router architecture enables server-side rendering for initial page loads, critical for environments where network latency varies. React Server Components handle data fetching on the server, reducing the JavaScript payload sent to clients and improving performance on hospital workstations that may not have cutting-edge hardware.
Framer Motion powers the animation system, but beyond aesthetics, animations serve a functional purpose: they provide visual feedback during AI processing, reducing perceived wait times and indicating system state. When a physician uploads a chest X-ray, subtle loading animations communicate that the system is actively working, maintaining trust in the AI's responsiveness.
The API Layer
FastAPI serves as the backbone of the API layer, chosen specifically for its async-first design and automatic OpenAPI documentation generation. In healthcare contexts, API documentation isn't a luxury—it's a compliance requirement. The ability to auto-generate accurate API specs reduces documentation drift and simplifies integration audits.
The API layer implements Server-Sent Events (SSE) for real-time streaming, a critical architectural choice for AI diagnosis systems. Unlike WebSocket connections that maintain bidirectional communication channels, SSE provides a unidirectional stream perfectly suited for the use case: clients send requests and then receive a continuous stream of updates as the AI agents process information. This reduces connection overhead and simplifies the client-side implementation while providing the real-time feedback clinicians need.
Why Not WebSockets?
The Intelligence Layer
This is where CuraSense truly differentiates itself. Rather than relying on a single monolithic AI model, the intelligence layer implements a multi-agent architecture using CrewAI for agent orchestration and LangGraph for workflow management. Each agent specializes in a specific domain: document analysis, medical interpretation, drug interaction checking, and report synthesis.
The Gemini Pro model handles textual analysis, while Gemini Vision processes medical imaging. This dual-model approach acknowledges that different AI tasks require different capabilities—language models excel at reasoning over text, while vision models understand spatial relationships in images. By separating these concerns, each model operates in its optimal domain.
The Data Layer
ChromaDB serves as the vector database, enabling semantic search over medical knowledge bases. The choice of a vector database over traditional relational storage reflects the nature of medical queries: clinicians don't search for exact string matches—they search for conceptually similar information. A query about "chest pain radiating to left arm" should retrieve information about cardiac symptoms even if those exact words aren't present in the knowledge base.
Session-based data isolation with a 15-minute TTL supports the HIPAA-conscious design without sacrificing functionality. Patient data exists only long enough for analysis and is then purged automatically, reducing liability while maintaining the real-time processing clinicians need.
Multi-Agent AI System: Collaborative Intelligence
The multi-agent architecture represents CuraSense's most innovative technical contribution. Traditional AI applications employ single models that handle all tasks—a pattern that struggles with the multifaceted nature of medical diagnosis. CuraSense instead implements a team of specialized AI agents, each with distinct capabilities and responsibilities.
Agent Specialization
The Document Analyzer Agent serves as the entry point for all uploaded medical documents. Using advanced PDF parsing libraries combined with Named Entity Recognition (NER) models from Hugging Face, this agent extracts structured data from unstructured medical documents. It identifies medication names, dosages, frequency schedules, and diagnostic codes—transforming a PDF prescription into a machine-readable format that subsequent agents can process.
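The structured output described above can be sketched in a few lines. This is a minimal illustration only: a regular expression stands in for the NER model, and the `Medication` dataclass, the line pattern, and `extract_medications` are hypothetical names rather than CuraSense's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Medication:
    name: str
    dose: float
    unit: str
    frequency: str


# Hypothetical pattern: "<name> <dose><unit> <frequency>", one drug per line.
_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z-]*)\s+"
    r"(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\s+"
    r"(?P<freq>.+?)\s*$",
    re.IGNORECASE,
)


def extract_medications(text: str) -> list[Medication]:
    """Return one Medication per line that matches the pattern; skip other lines."""
    found = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            found.append(Medication(
                name=match["name"].capitalize(),
                dose=float(match["dose"]),
                unit=match["unit"].lower(),
                frequency=match["freq"].lower(),
            ))
    return found
```

Lines that do not fit the pattern are skipped rather than guessed at, which leaves room for a later stage to flag them for review.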
What makes this agent particularly sophisticated is its ability to handle the variability inherent in medical documents. Prescriptions from different healthcare systems use different formats, abbreviations, and conventions. The Document Analyzer adapts to these variations, normalizing output regardless of input format.
The Medical Expert Agent provides clinical interpretation of extracted data. This agent has been prompt-engineered with extensive medical knowledge, enabling it to understand the clinical significance of lab values, recognize concerning patterns in diagnostic reports, and contextualize findings within broader medical frameworks.
When analyzing a blood panel, for instance, the Medical Expert doesn't just report that hemoglobin is 9.2 g/dL. It contextualizes this value, notes that it falls in the moderate anemia range under WHO thresholds for non-pregnant adults, suggests potential causes, and recommends follow-up investigations. This mirrors how an experienced physician would interpret the same data.
The Drug Interaction Agent performs critical safety checks. Using the RAG (Retrieval-Augmented Generation) pipeline, this agent queries an extensive drug interaction database to identify potential conflicts between medications. It considers not just direct interactions but also cumulative effects, contraindications based on patient conditions, and timing considerations for medication administration.
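A pairwise check of this kind might look like the sketch below. It assumes an in-memory table in place of the retrieved knowledge; the single `INTERACTIONS` entry and the function name `check_interactions` are illustrative, not the project's data or API.

```python
from itertools import combinations

# Illustrative stand-in for retrieved interaction records. A real deployment
# would populate this from the RAG knowledge base, not a hard-coded dict.
INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "increased bleeding risk",
}


def check_interactions(medications: list[str]) -> list[tuple[str, str, str]]:
    """Check every unordered pair of medications against the table."""
    names = sorted({m.strip().lower() for m in medications})
    flagged = []
    for first, second in combinations(names, 2):
        note = INTERACTIONS.get(frozenset({first, second}))
        if note:
            flagged.append((first, second, note))
    return flagged
```

Keying the table on a `frozenset` makes the lookup order-independent, so "warfarin + aspirin" and "aspirin + warfarin" resolve to the same record.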
The Report Generator Agent synthesizes outputs from all previous agents into coherent, clinician-friendly reports. This agent employs sophisticated natural language generation to produce reports that balance completeness with readability—including all clinically relevant details while organizing information in a logical, actionable format.
Agent Communication Protocol
Orchestration with CrewAI
CrewAI manages the coordination between agents, determining execution order, handling dependencies, and managing failures. The framework implements a supervisor pattern where a coordinator agent oversees the diagnosis process, dynamically routing tasks based on document type and complexity.
For a simple prescription analysis, the coordinator might engage only the Document Analyzer and Drug Interaction agents. For a comprehensive health assessment including imaging, all agents participate in a carefully choreographed sequence. This adaptive orchestration optimizes processing time while ensuring all necessary analyses are performed.
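The routing decision can be pictured as a lookup from document type to agent sequence. The sketch below is an assumption-laden illustration in plain Python, not CrewAI code; the document-type keys and the `plan_agents` helper are hypothetical.

```python
ALL_AGENTS = [
    "document_analyzer",
    "medical_expert",
    "drug_interaction",
    "report_generator",
]

# Hypothetical routing table based on the two cases described in the text.
ROUTES = {
    "prescription": ["document_analyzer", "drug_interaction"],
    "comprehensive_assessment": ALL_AGENTS,
}


def plan_agents(document_type: str) -> list[str]:
    """Return the agents to run; unknown types fall back to the full sequence."""
    return list(ROUTES.get(document_type, ALL_AGENTS))
```

Falling back to the full sequence for unrecognized document types trades some processing time for the assurance that no analysis is skipped.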
RAG Pipeline: Augmented Intelligence
Retrieval-Augmented Generation (RAG) addresses a fundamental limitation of large language models: their knowledge is frozen at training time. Medical knowledge evolves rapidly—new drug interactions are discovered, treatment protocols are updated, and clinical guidelines are revised. A purely pre-trained model quickly becomes outdated.
CuraSense's RAG pipeline bridges this gap by combining the reasoning capabilities of LLMs with real-time retrieval from up-to-date medical knowledge bases. When the Drug Interaction Agent needs to check for medication conflicts, it doesn't rely solely on the LLM's training data—it actively queries a vector database containing current pharmaceutical data.
Vector Embeddings and Semantic Search
At the core of the RAG system lies ChromaDB, a vector database optimized for semantic search. Medical documents are processed through embedding models that convert text into high-dimensional vectors capturing semantic meaning. When a query arrives—such as "medications contraindicated with warfarin"—the system converts this query to a vector and finds the most semantically similar documents in the database.
This semantic approach vastly outperforms keyword matching. A search for "blood thinners" would correctly retrieve documents about anticoagulants even if that exact phrase never appears. The embedding model understands conceptual relationships that keyword systems miss.
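The ranking step behind semantic search is typically a similarity measure over vectors, such as cosine similarity. The toy example below uses hand-written three-dimensional vectors in place of real embeddings; the `DOCS` contents and the `search` helper are invented for illustration, and ChromaDB performs the equivalent lookup internally at much higher dimensionality.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings" chosen by hand; real ones come from an embedding model.
DOCS = {
    "anticoagulant dosing guidance": [0.9, 0.1, 0.0],
    "insulin titration schedule": [0.1, 0.9, 0.1],
    "fracture imaging protocol": [0.0, 0.1, 0.9],
}


def search(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k document titles most similar to the query vector."""
    ranked = sorted(
        DOCS,
        key=lambda title: cosine_similarity(query_vector, DOCS[title]),
        reverse=True,
    )
    return ranked[:k]
```

A query vector close to the first document's direction, for example `[0.8, 0.2, 0.0]`, ranks the anticoagulant entry first even though no keywords are compared.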
Context Window Optimization
One engineering challenge with RAG is managing context window limits. LLMs can only process a finite amount of text, ranging from a few thousand tokens to a million or more depending on the model. When retrieval returns numerous relevant documents, naive approaches simply truncate, potentially losing critical information.
CuraSense implements intelligent context compression. Retrieved documents are ranked by relevance, summarized where appropriate, and strategically combined to maximize information density within context limits. The system preserves key facts while eliminating redundancy, ensuring the LLM receives the most informative possible context.
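One way to picture the ranking and redundancy-elimination steps is a greedy packer that fills a token budget with the best non-duplicate chunks. This sketch leaves out summarization entirely, uses a crude whitespace token estimate, and the names `Chunk`, `estimate_tokens`, and `pack_context` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # retrieval relevance, higher is better


def estimate_tokens(text: str) -> int:
    """Crude whitespace-based estimate; a real tokenizer would give other counts."""
    return len(text.split())


def pack_context(chunks: list[Chunk], budget: int) -> list[str]:
    """Greedily keep the highest-scoring, non-duplicate chunks that fit the budget."""
    selected, seen, used = [], set(), 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        key = " ".join(chunk.text.lower().split())  # normalize case and spacing
        cost = estimate_tokens(chunk.text)
        if key in seen or used + cost > budget:
            continue
        seen.add(key)
        selected.append(chunk.text)
        used += cost
    return selected
```

Because oversized chunks are skipped rather than ending the loop, a short but lower-ranked chunk can still use the remaining budget.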
Knowledge Base Updates
Vision Analysis: AI-Powered Medical Imaging
Medical imaging interpretation represents one of healthcare's most time-intensive and expertise-dependent tasks. Radiologists spend years developing the pattern recognition skills needed to identify abnormalities in X-rays, CT scans, and MRIs. CuraSense's vision analysis module provides AI-assisted interpretation that augments—rather than replaces—radiological expertise.
Gemini Vision Integration
The vision pipeline leverages Google's Gemini Vision model, a multimodal AI capable of understanding both images and text. When a physician uploads a chest X-ray, the system processes the image through several stages:
- Image preprocessing: Normalization, enhancement, and format standardization ensure consistent analysis regardless of source equipment
- Region identification: The model identifies anatomical structures—lungs, heart, ribs, diaphragm—establishing a spatial framework for analysis
- Abnormality detection: Pattern recognition identifies potential issues: masses, infiltrates, cardiomegaly, pneumothorax, and other findings
- Clinical correlation: Detected findings are correlated with any provided clinical history to assess significance
Confidence Scoring and Uncertainty
Unlike deterministic software, AI vision systems must communicate uncertainty. CuraSense implements confidence scoring for all imaging findings. When the system identifies a potential nodule, it provides both the finding and a confidence level. A high-confidence finding might recommend immediate follow-up, while a low-confidence detection suggests additional imaging or specialist review.
This probabilistic approach reflects medical reality. Even expert radiologists disagree on subtle findings—acknowledging uncertainty is more valuable than false precision. The system is calibrated to err on the side of caution: it's better to flag a benign finding for review than to miss a potential malignancy.
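The mapping from confidence to recommendation might be sketched as follows. The 0.8 threshold, the `Finding` dataclass, and the `recommend` function are assumptions made for illustration; the article does not state the actual cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    label: str
    confidence: float  # model-reported score in [0, 1]


def recommend(finding: Finding, high_threshold: float = 0.8) -> str:
    """Map a finding to a recommendation; the threshold is a hypothetical value."""
    if not 0.0 <= finding.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if finding.confidence >= high_threshold:
        return f"{finding.label}: recommend immediate follow-up"
    # Low-confidence detections are still surfaced for review, never dropped.
    return f"{finding.label}: suggest additional imaging or specialist review"
```

Note that the low-confidence branch still reports the finding, which matches the stated preference for flagging a benign finding over missing a malignancy.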
Multimodal Analysis
Real-Time Streaming: Server-Sent Events Architecture
In clinical settings, perceived responsiveness matters as much as actual processing time. A system that provides no feedback during a 30-second analysis feels slower than one that continuously updates users on progress—even if total processing time is identical. CuraSense implements sophisticated real-time streaming to maintain user engagement throughout the analysis pipeline.
The SSE Implementation
Server-Sent Events (SSE) provide a unidirectional channel from server to client over a standard HTTP connection. When a user submits a document for analysis, the client opens an SSE connection and the server begins streaming updates:
- Stage notifications: "Analyzing document structure...", "Extracting medications...", "Checking interactions..."
- Partial results: As each agent completes, its findings stream to the client immediately
- Progress indicators: Percentage-based progress updates for long-running analyses
- Final synthesis: The complete report streams incrementally, enabling users to begin reading before generation completes
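On the wire, each of these updates is a small block of `id:`, `event:`, and `data:` lines terminated by a blank line. The generator below is a minimal sketch of that format; the event names (`stage`, `complete`), the payload fields, and the helper names are assumptions, not CuraSense's actual schema.

```python
import json
from typing import Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def analysis_stream(stages: list[str]) -> Iterator[str]:
    """Yield one 'stage' event per step, then a final 'complete' event."""
    total = len(stages)
    for index, stage in enumerate(stages, start=1):
        payload = {"message": stage, "progress": round(100 * index / total)}
        yield format_sse(payload, event="stage", event_id=str(index))
    yield format_sse({"message": "done"}, event="complete", event_id=str(total + 1))
```

In a FastAPI application, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`. Attaching an `id` to each event lets a reconnecting client tell the server which event it saw last.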
Frontend Integration
The React frontend uses the EventSource API to receive SSE streams, parsing each event and updating component state. Framer Motion animates new content into view, creating a fluid experience as analysis results progressively appear. This streaming approach transforms a potentially tedious wait into an engaging, informative experience.
Error handling is particularly important in streaming scenarios. The system implements automatic reconnection with exponential backoff, ensuring that temporary network issues don't lose analysis progress. Partial results are cached client-side, so a reconnection resumes from the last received event rather than restarting the entire analysis.
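The retry schedule and the resume step reduce to two small calculations, sketched here in Python for consistency with the backend examples even though the real client runs in the browser. The base delay, the cap, and both function names are hypothetical.

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delay in seconds before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def events_to_replay(event_log: list[tuple[int, str]],
                     last_event_id: int) -> list[tuple[int, str]]:
    """Return only the events the client has not yet received."""
    return [event for event in event_log if event[0] > last_event_id]
```

With the defaults, six attempts wait 1, 2, 4, 8, 16, and 30 seconds. Production clients often add random jitter to these delays so that many clients do not reconnect at the same instant.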
Security & Privacy: HIPAA-Conscious Design
Healthcare applications operate under stringent regulatory requirements. While CuraSense is a demonstration project, its architecture embodies security principles intended to align with HIPAA requirements in a production deployment. Understanding these design decisions illustrates how security consciousness shapes architectural choices.
Session-Based Data Isolation
Patient data never persists beyond the analysis session. Each upload receives a unique session identifier, and all extracted data, intermediate results, and final reports are associated with this session. A background task continuously monitors session age, automatically purging any session older than 15 minutes.
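A minimal in-memory version of this purge logic is sketched below. The `SessionStore` class and its method names are hypothetical, and the injectable clock exists only to make the behavior testable; a real deployment would call `purge_expired` from a periodic background task.

```python
import time

SESSION_TTL_SECONDS = 15 * 60  # the 15-minute window described above


class SessionStore:
    """In-memory session data that expires after a fixed time-to-live."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._sessions = {}  # session_id -> (created_at, data)

    def put(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = (self._clock(), data)

    def get(self, session_id: str):
        """Return the session data, or None if it is missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None or self._clock() - entry[0] > self._ttl:
            self._sessions.pop(session_id, None)
            return None
        return entry[1]

    def purge_expired(self) -> int:
        """Delete every session older than the TTL; return how many were removed."""
        now = self._clock()
        expired = [sid for sid, (created, _) in self._sessions.items()
                   if now - created > self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

Checking the age in `get` as well as in the background purge means an expired session is never served, even if the purge task runs late.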
This ephemeral approach substantially reduces the attack surface. There's no database of patient records to breach, no historical data to protect with encryption at rest, no backup tapes to secure. Data exists only in memory during active processing, then vanishes.
Authentication with Clerk
User authentication leverages Clerk, a modern authentication platform that handles the complexities of secure identity management. Clerk provides multi-factor authentication, secure session management, and social login integration—all implemented following security best practices that would take months to build from scratch.
By delegating authentication to a specialized provider, CuraSense avoids common security pitfalls: password storage vulnerabilities, session hijacking, and authentication bypass bugs. Clerk's team focuses exclusively on authentication security, ensuring the implementation stays current with emerging threats.
Defense in Depth
Challenges & Solutions: Lessons Learned
Building CuraSense surfaced numerous technical challenges, each requiring creative solutions. These engineering lessons illustrate the gap between conceptual architecture and working systems.
Challenge: LLM Hallucination in Medical Context
Large language models occasionally generate plausible-sounding but incorrect information—a phenomenon known as hallucination. In medical contexts, hallucinations are dangerous. A fabricated drug interaction could lead to inappropriate treatment decisions.
Solution: CuraSense implements multiple hallucination mitigation strategies. The RAG pipeline grounds responses in retrieved documents, reducing the model's need to "invent" information. Agent outputs are cross-validated—if the Drug Interaction Agent flags an interaction, the Medical Expert Agent independently verifies its clinical significance. Confidence thresholds prevent low-confidence assertions from appearing in final reports without appropriate caveats.
Challenge: Latency in Multi-Agent Systems
Sequential agent execution creates additive latency. If each of four agents takes 5 seconds, users wait 20 seconds for results—unacceptable in clinical workflows.
Solution: LangGraph enables parallel execution of independent agents. The Document Analyzer must complete before downstream agents begin, but the Medical Expert and Drug Interaction agents can operate concurrently since neither depends on the other's output. This parallel execution pattern reduced average analysis time by approximately 40% compared to strictly sequential processing.
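The dependency structure can be illustrated with plain `asyncio`, used here as a stand-in for LangGraph rather than a depiction of its API. The agent names follow the article; `run_agent`, `run_pipeline`, and the sleep-based delays are placeholders for real model calls.

```python
import asyncio


async def run_agent(name: str, seconds: float, log: list[str]) -> str:
    """Simulate one agent; the sleep stands in for model and retrieval calls."""
    log.append(f"start:{name}")
    await asyncio.sleep(seconds)
    log.append(f"end:{name}")
    return f"{name} output"


async def run_pipeline(step: float = 0.01) -> tuple[list[str], list[str]]:
    """Run the analyzer first, the two independent agents together, then the report."""
    log: list[str] = []
    document = await run_agent("document_analyzer", step, log)
    medical, drug = await asyncio.gather(
        run_agent("medical_expert", step, log),
        run_agent("drug_interaction", step, log),
    )
    report = await run_agent("report_generator", step, log)
    return [document, medical, drug, report], log
```

Calling `asyncio.run(run_pipeline())` returns the four outputs plus a log in which both middle agents start before either finishes. With the illustrative 5-second agents above, the critical path drops from 20 to 15 seconds; measured savings depend on actual per-agent runtimes.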
Challenge: Variable Document Quality
Medical documents arrive in wildly varying quality—blurry scans, handwritten notes, faded thermal paper. Image-based PDFs require OCR, which introduces errors. Poor-quality inputs degrade analysis accuracy.
Solution: The Document Analyzer implements a multi-stage extraction pipeline. Initial OCR output is post-processed by the LLM, which corrects obvious errors using context. "Metfomrin 500mg" becomes "Metformin 500mg" because the model knows medication names. When confidence in extraction is low, the system flags affected sections, requesting user verification rather than silently propagating errors.
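The correct-or-flag behavior can be approximated without an LLM by fuzzy-matching tokens against a known vocabulary. The sketch below swaps in Python's `difflib` for the model-based correction described above; the four-entry `KNOWN_MEDICATIONS` list, the 0.8 cutoff, and `correct_medication` are illustrative assumptions.

```python
import difflib

# Hypothetical vocabulary; a real system would use a full drug dictionary.
KNOWN_MEDICATIONS = ["Metformin", "Methotrexate", "Metoprolol", "Lisinopril"]


def correct_medication(token: str, cutoff: float = 0.8) -> tuple[str, bool]:
    """Return (best_guess, needs_review); unmatched tokens are flagged, not altered."""
    lookup = {name.lower(): name for name in KNOWN_MEDICATIONS}
    matches = difflib.get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
    if matches:
        return lookup[matches[0]], False
    return token, True
```

Here `correct_medication("Metfomrin")` resolves to `"Metformin"`, while a token with no close match comes back unchanged with the review flag set, so errors are surfaced rather than silently propagated.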
Continuous Improvement
Conclusion: The Future of AI in Healthcare
CuraSense demonstrates that sophisticated AI systems aren't monolithic black boxes—they're carefully orchestrated ensembles of specialized components. The multi-agent architecture, RAG-augmented knowledge retrieval, and real-time streaming combine to create a system greater than the sum of its parts.
As AI capabilities continue advancing, the principles embodied in CuraSense—specialization, collaboration, grounding in retrieved knowledge, and transparent uncertainty—will only become more relevant. The future of healthcare AI lies not in replacing human expertise but in augmenting it with systems that handle information processing at scales impossible for human cognition alone.