System Architecture¶
This document provides a comprehensive overview of the Trump Speeches NLP Chatbot API architecture, including system components, data flows, and deployment strategies.
Table of Contents¶
- High-Level Architecture
- Component Architecture
- RAG Pipeline
- Data Flow
- API Architecture
- Deployment Architecture
- Technology Stack
- Scalability Considerations
Project Overview (Thumbnail)¶
%%{init: {'theme':'dark', 'themeVariables': { 'primaryColor':'#667eea', 'primaryTextColor':'#fff', 'primaryBorderColor':'#764ba2', 'lineColor':'#667eea', 'secondaryColor':'#764ba2', 'tertiaryColor':'#1a1a2e'}}}%%
flowchart TB
subgraph Client["Client Layer"]
UI["Dark Mode UI<br/>Interactive Dashboard"]
end
subgraph API["FastAPI Backend"]
Routes["RESTful Endpoints<br/>Async Request Handling"]
end
subgraph AI["AI Services"]
RAG["RAG System<br/>Hybrid Search + LLM"]
Sentiment["Sentiment Analysis<br/>Multi-Model + AI"]
Topics["Topic Extraction<br/>Semantic Clustering"]
end
subgraph Data["Data & Models"]
Vector["ChromaDB<br/>Vector Storage"]
Models["Transformers<br/>FinBERT • RoBERTa • MPNet"]
LLM["LLM Providers<br/>Gemini • OpenAI • Claude"]
end
UI <-->|HTTP/JSON| Routes
Routes --> RAG
Routes --> Sentiment
Routes --> Topics
RAG <--> Vector
RAG --> LLM
Sentiment --> Models
Sentiment --> LLM
Topics --> Models
Topics --> LLM
style Client fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style API fill:#764ba2,stroke:#667eea,stroke-width:3px,color:#fff
style AI fill:#1a1a2e,stroke:#667eea,stroke-width:2px,color:#e0e0e0
style Data fill:#2a2a3a,stroke:#667eea,stroke-width:2px,color:#e0e0e0
High-Level Architecture¶
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#f0f3ff', 'primaryTextColor':'#333', 'primaryBorderColor':'#667eea', 'lineColor':'#667eea', 'secondaryColor':'#e9ecef', 'tertiaryColor':'#fff'}}}%%
graph TB
Client["Client/Browser<br/><small>Chrome, Firefox, Safari</small>"]
Frontend["Static Frontend<br/><small>HTML/CSS/JS • Dark Mode UI</small>"]
API["FastAPI Application<br/><small>REST API • Async • Uvicorn</small>"]
subgraph NLP["AI/NLP Services"]
direction TB
Sentiment["Sentiment Analysis<br/><small>FinBERT + RoBERTa + LLM</small>"]
Topics["Topic Analysis<br/><small>Semantic Clustering + LLM</small>"]
RAG["RAG System<br/><small>Hybrid Search + ChromaDB</small>"]
LLMService["LLM Providers<br/><small>Gemini • OpenAI • Claude</small>"]
end
subgraph Data["Data Layer"]
direction TB
Speeches["Demo Dataset<br/><small>35+ Political Speeches</small>"]
VectorDB[("ChromaDB<br/><small>Vector Store • Persistent</small>")]
Models["ML Models<br/><small>PyTorch • Transformers</small>"]
end
Client <-->|"HTTP Requests"| Frontend
Frontend <-->|"REST API Calls<br/>/rag/ask, /analyze/*"| API
API -->|"Analyze Sentiment"| Sentiment
API -->|"Extract Topics"| Topics
API -->|"Answer Questions"| RAG
Sentiment -->|"Generate Interpretation"| LLMService
Sentiment -->|"Classify Emotions"| Models
Topics -->|"Generate Labels"| LLMService
Topics -->|"Cluster Keywords"| Models
RAG -->|"Generate Answers"| LLMService
RAG <-->|"Semantic Search"| VectorDB
RAG -->|"Embed Queries"| Models
Topics -.->|"Load Data"| Speeches
VectorDB -.->|"Indexed From"| Speeches
style Client fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style Frontend fill:#764ba2,stroke:#667eea,stroke-width:3px,color:#fff
style API fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style NLP fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Data fill:#e9ecef,stroke:#667eea,stroke-width:2px
style LLMService fill:#fff,stroke:#667eea,stroke-width:2px
Component Architecture¶
1. API Layer (src/speech_nlp/api/)¶
FastAPI application with modular route organization.
Responsibilities:
- HTTP request handling
- Input validation (Pydantic models)
- Error handling and logging
- CORS middleware
- Static file serving
- Dependency injection for services
Route Modules:
- `routes_chatbot.py` - RAG question-answering endpoints
- `routes_nlp.py` - Traditional NLP analysis endpoints
- `routes_health.py` - Health checks and system status
- `dependencies.py` - Service dependency injection
Endpoints:
- `/rag/ask` - RAG question answering
- `/rag/search` - Semantic search
- `/rag/stats` - Collection statistics
- `/rag/index` - Document indexing
- `/analyze/sentiment` - Sentiment analysis
- `/analyze/topics` - AI-powered topic extraction with semantic clustering
- `/analyze/ngrams` - N-gram analysis
- `/health` - Health check
2. Sentiment Analysis (src/speech_nlp/services/analysis/sentiment.py)¶
AI-powered multi-model sentiment analysis with emotion detection and contextual interpretation.
Architecture:
- FinBERT Model: Financial/political sentiment classification (positive/negative/neutral)
- RoBERTa Emotion Model: Six-emotion detection (anger, joy, fear, sadness, surprise, disgust)
- LLM Integration: Contextual interpretation explaining WHY the models produced their results (supports Gemini, OpenAI, Anthropic)
Key Features:
- Three-class sentiment classification with confidence scores
- Six-emotion detection with individual probabilities
- AI-generated contextual interpretation (2-3 sentences, max 2000 tokens)
- Automatic text chunking for long documents
- Configurable via environment variables (model names, temperature, max tokens)
- Dark mode UI with enhanced visualization
Processing Flow:
%%{init: {'theme':'base'}}%%
flowchart LR
Input["Raw Text"] --> Chunk["Text Chunking<br/><small>510 tokens max</small>"]
Chunk --> Sentiment["FinBERT<br/><small>Sentiment Scores</small>"]
Chunk --> Emotion["RoBERTa<br/><small>Emotion Scores</small>"]
Sentiment --> Aggregate["Aggregate Results<br/><small>Average across chunks</small>"]
Emotion --> Aggregate
Aggregate --> LLM["LLM Provider<br/><small>Contextual Analysis<br/>2000 token limit</small>"]
LLM --> Output["Complete Analysis<br/><small>Sentiment + Emotions<br/>+ AI Interpretation</small>"]
style Input fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Chunk fill:#fff,stroke:#667eea,stroke-width:1px
style Sentiment fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Emotion fill:#fce4ec,stroke:#e91e63,stroke-width:2px
style Aggregate fill:#fff,stroke:#667eea,stroke-width:1px
style LLM fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Output fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
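The aggregation step in the flow above can be sketched as a simple average of per-chunk score dictionaries. This is a minimal illustration; the function name and the exact aggregation strategy are assumptions, not the actual implementation.

```python
def aggregate_chunk_scores(chunk_scores: list[dict[str, float]]) -> dict[str, float]:
    """Average sentiment/emotion scores across text chunks."""
    if not chunk_scores:
        return {}
    keys = chunk_scores[0].keys()
    return {k: sum(c[k] for c in chunk_scores) / len(chunk_scores) for k in keys}

# Two chunks of the same document, scored independently by the model
chunks = [
    {"positive": 0.9, "negative": 0.05, "neutral": 0.05},
    {"positive": 0.8, "negative": 0.11, "neutral": 0.09},
]
aggregated = aggregate_chunk_scores(chunks)  # positive averages to 0.85
```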
Response Schema:
{
"sentiment": "positive",
"confidence": 0.85,
"scores": {
"positive": 0.85,
"negative": 0.08,
"neutral": 0.07
},
"emotions": {
"joy": 0.62,
"anger": 0.15,
"neutral": 0.12,
"fear": 0.05,
"sadness": 0.04,
"surprise": 0.02
},
"contextual_sentiment": "The text expresses strong positive sentiment about economic achievements, with joy emerging from pride in policy success. However, underlying anger surfaces when discussing immigration, creating emotional complexity that explains the mixed sentiment profile.",
"num_chunks": 3
}
3. RAG Service (src/speech_nlp/services/rag/service.py)¶
Orchestrates the RAG pipeline, coordinating modular components for intelligent question answering.
Architecture: The RAG service uses a modular design with dedicated components:
- Orchestration: Manages ChromaDB collection and coordinates components
- Delegation: Delegates to specialized services for search, confidence, entities, and loading
Components Used:
- `SearchEngine` (from `services/rag/search.py`)
- `RAGGuardrails` (from `services/rag/guardrails.py`)
- `QueryRewriter` (from `services/rag/rewriter.py`)
- `ConfidenceCalculator` (from `services/rag/confidence.py`)
- `EntityAnalyzer` (from `services/rag/entities.py`)
- `DocumentLoader` (from `services/rag/chunking.py`)
- `LLMProvider` (from `services/llm/`) - Pluggable provider abstraction
4. LLM Service (src/speech_nlp/services/llm/)¶
Pluggable LLM provider abstraction with support for multiple AI models.
Architecture:
- Abstract Base Class (`base.py`): Defines the `LLMProvider` interface
- Factory Pattern (`factory.py`): Creates providers with lazy imports for optional dependencies
- Provider Implementations:
  - `gemini.py` - Google Gemini (default, always available)
  - `openai.py` - OpenAI GPT models (optional: `uv sync --group llm-openai`)
  - `anthropic.py` - Anthropic Claude models (optional: `uv sync --group llm-anthropic`)
Features:
- Model-Agnostic Configuration: Single config interface (`LLM_API_KEY`, `LLM_MODEL_NAME`)
- Easy Provider Switching: Change providers via the `LLM_PROVIDER` environment variable
- Optional Dependencies: Only install the providers you need
- Type-Safe Interface: All providers implement the same `LLMProvider` interface
- Context-Aware Prompting: Builds prompts with retrieved context
- Entity-Focused Generation: Emphasizes entity mentions when applicable
- Fallback Extraction: Returns context snippets if LLM fails
- Source Attribution: Tracks and cites source documents
- Error Handling: Graceful degradation with informative fallbacks
Provider Interface:
from abc import ABC, abstractmethod
from typing import Optional

class LLMProvider(ABC):
    @abstractmethod
    def generate_content(
        self,
        prompt: str,
        temperature: Optional[float] = None,
        max_output_tokens: Optional[int] = None,
    ) -> str:
        """Generate text based on the given prompt."""
        ...
Usage Example:
from speech_nlp.services.llm import create_llm_provider

# Create a provider based on the LLM_PROVIDER env variable
llm = create_llm_provider()

# Generate content (works with any provider)
response = llm.generate_content(
    prompt="Explain the economic policies mentioned...",
    temperature=0.7,
    max_output_tokens=2048,
)
5. RAG Components (src/speech_nlp/services/rag/)¶
Modular, testable components for RAG functionality.
5.1 SearchEngine (search.py)¶
Hybrid search engine combining multiple retrieval strategies.
Features:
- Semantic Search: MPNet embeddings (768d) with cosine similarity
- BM25 Search: Keyword-based sparse retrieval
- Hybrid Search: Configurable weighting of semantic + BM25 scores
- Cross-encoder Reranking: Final precision optimization
- Deduplication: Removes duplicate results by ID
Search Modes:
- `semantic` - Pure vector similarity search
- `hybrid` - Combines semantic + BM25 (default weights: 0.7/0.3)
- `reranking` - Optional cross-encoder reranking for top results
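The hybrid weighting can be sketched as a min-max normalisation followed by a weighted sum. This is illustrative only; the real `SearchEngine` may normalise scores differently.

```python
def min_max(scores: list[float]) -> list[float]:
    """Scale raw scores to [0, 1] so semantic and BM25 scores are comparable."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_score(semantic: float, bm25: float,
                 w_semantic: float = 0.7, w_bm25: float = 0.3) -> float:
    """Weighted combination of normalised semantic and BM25 scores."""
    return w_semantic * semantic + w_bm25 * bm25
```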
5.2 ConfidenceCalculator (confidence.py)¶
Multi-factor confidence scoring for RAG answers.
Confidence Factors (weighted):
- Retrieval Quality (40%): Average semantic similarity of results
- Consistency (25%): Score variance (low variance = high confidence)
- Coverage (20%): Normalized chunk count (more chunks = better coverage)
- Entity Coverage (15%): Percentage of results mentioning query entities
Confidence Levels:
- High: combined_score ≥ 0.7
- Medium: 0.4 ≤ combined_score < 0.7
- Low: combined_score < 0.4
Output:
- Confidence level (high/medium/low)
- Numeric confidence score (0-1)
- Detailed explanation
- Individual factor scores
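The weighted scoring and level thresholds described above can be sketched as follows. Factor names are shorthand for the four documented factors; only the weights and thresholds come from this document.

```python
# Weights from the confidence-factor list above
WEIGHTS = {"retrieval": 0.40, "consistency": 0.25, "coverage": 0.20, "entity": 0.15}

def combined_confidence(factors: dict[str, float]) -> tuple[float, str]:
    """Combine factor scores (each 0-1) and map the result to a level."""
    score = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    if score >= 0.7:
        level = "high"
    elif score >= 0.4:
        level = "medium"
    else:
        level = "low"
    return score, level
```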
5.3 EntityAnalyzer (entities.py)¶
Entity extraction and statistical analysis.
Capabilities:
- Entity Extraction: Identifies capitalized words (filtered for stopwords, question words)
- Mention Counting: Tracks entity mentions across corpus
- Speech Coverage: Identifies which documents mention each entity
- Sentiment Analysis: Average sentiment toward entity (optional)
- Co-occurrence Analysis: Most common words appearing near entity
- Corpus Percentage: Percentage of documents mentioning entity
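The capitalised-word heuristic can be sketched as below. The stopword and question-word sets here are tiny placeholders for the real filters, and the function name is illustrative.

```python
import re

STOPWORDS = {"The", "A", "An", "This", "That"}
QUESTION_WORDS = {"What", "Who", "Where", "When", "Why", "How", "Which"}

def extract_entities(question: str) -> list[str]:
    """Keep capitalised tokens that are not stopwords or question words."""
    tokens = re.findall(r"[A-Za-z]+", question)
    return [t for t in tokens
            if t[0].isupper() and t not in STOPWORDS | QUESTION_WORDS]

extract_entities("What did Trump say about China?")  # -> ["Trump", "China"]
```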
Statistics Output:
{
"mention_count": 524,
"speech_count": 30,
"corpus_percentage": 85.7,
"speeches": ["file1.txt", "file2.txt", ...],
"sentiment": {
"average_score": -0.15,
"classification": "Neutral",
"sample_size": 50
},
"associations": ["people", "country", "great", ...]
}
5.4 DocumentLoader (chunking.py)¶
Smart document loading with semantic chunking and metadata extraction.
Features:
- Semantic Chunking: NLTK sentence tokenisation + embedding cosine similarity for topic boundary detection
- Fallback Splitting: `RecursiveCharacterTextSplitter` for groups that exceed `chunk_size`
- Metadata Extraction: Parses speech filenames into structured location, date, and year fields
- Configurable: Strategy ("semantic" or "fixed"), breakpoint percentile, min chunk size
- Directory Loading: Batch loading from directories with progress tracking
Chunking Strategy:
chunking_strategy = "semantic" # or "fixed"
semantic_breakpoint_percentile = 90.0 # topic-shift threshold
semantic_min_chunk_size = 256 # merge small groups
chunk_size = 2048 # max chunk size (fallback)
chunk_overlap = 150 # overlap for fixed chunking
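A bare character-window version of the fixed fallback can be sketched like this; the real fallback uses `RecursiveCharacterTextSplitter`, which additionally respects separators such as paragraph and sentence boundaries.

```python
def fixed_chunks(text: str, chunk_size: int = 2048, overlap: int = 150) -> list[str]:
    """Split text into overlapping character windows (simplified fallback)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```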
5.5 RAGGuardrails (guardrails.py)¶
Three-layer quality pipeline preventing hallucination and ensuring answer grounding.
Layers:
- Layer 1 (Pre-retrieval): Rejects empty or trivially short queries
- Layer 2 (Post-retrieval): Sigmoid-normalised cross-encoder relevance filtering
- Layer 3 (Post-generation): Token-overlap grounding verification between answer and context
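Layer 3's token-overlap check can be sketched as the fraction of answer tokens that also appear in the retrieved context. The threshold value here is an assumption for illustration.

```python
def is_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    """Flag an answer as grounded if enough of its tokens occur in the context."""
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return False
    context_tokens = set(context.lower().split())
    overlap = len(answer_tokens & context_tokens) / len(answer_tokens)
    return overlap >= threshold
```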
5.6 QueryRewriter (rewriter.py)¶
LLM-powered query cleaning for improved search retrieval.
Features:
- Fixes typos, spelling mistakes, and grammar
- Expands abbreviations and acronyms
- Deterministic rewrites (temperature=0.0)
- Safety guards: error fallback, length rejection, empty passthrough
- Rewritten query drives search; original preserved for entity extraction and answer generation
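The safety guards can be sketched as a wrapper around any `LLMProvider`. Everything here is hypothetical: `StubLLM` stands in for a real provider, and the prompt, growth limit, and function name are assumptions.

```python
class StubLLM:
    """Stand-in for any LLMProvider implementation."""
    def generate_content(self, prompt, temperature=None, max_output_tokens=None):
        # Pretend the LLM fixed the typos in the query
        return "what did he say about the economy"

def rewrite_query(query: str, llm, max_growth: float = 2.0) -> str:
    """Clean a query via the LLM, falling back to the original on any problem."""
    if not query.strip():
        return query  # empty passthrough
    try:
        rewritten = llm.generate_content(
            prompt=f"Fix typos and expand abbreviations, nothing else: {query}",
            temperature=0.0,  # deterministic rewrites
        ).strip()
    except Exception:
        return query  # error fallback
    if not rewritten or len(rewritten) > max_growth * len(query):
        return query  # length rejection
    return rewritten

rewrite_query("wht did he say abt the economy", StubLLM())
```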
6. Text Preprocessing (src/speech_nlp/utils/text.py)¶
Text cleaning and normalization utilities.
Functions:
- Stopword removal (NLTK)
- Tokenization
- Special character removal
- URL removal
- N-gram extraction
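N-gram extraction is a sliding window over the token list; a minimal sketch (the actual utility may return joined strings rather than tuples):

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Sliding-window n-gram extraction over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ngrams(["make", "america", "great", "again"], 2)
# -> [("make", "america"), ("america", "great"), ("great", "again")]
```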
7. Utilities (src/speech_nlp/utils/)¶
Data loading and analysis helpers.
Modules:
- `io.py` - Speech loading and dataset statistics
- `formatting.py` - Response formatting utilities
- `text.py` - Text cleaning and tokenization
8. AI-Powered Topic Analysis (src/speech_nlp/services/analysis/topics.py)¶
Advanced topic extraction with semantic clustering and LLM-generated insights.
Features:
- Semantic Clustering: Groups related keywords using sentence embeddings (MPNet) and KMeans
- AI-Generated Labels: Uses LLM to create meaningful cluster labels (e.g., "Border Security" instead of just "wall")
- Contextual Snippets: Extracts text passages showing keywords in use with highlighting
- Topic Summaries: LLM-generated interpretation of main themes and patterns
- Smart Filtering: Excludes common verbs and low-relevance clusters (< 50% avg relevance)
Processing Pipeline:
%%{init: {'theme':'base'}}%%
flowchart TB
Text["Input Text"] --> Extract["Extract Keywords<br/><small>TF-IDF + Filtering</small>"]
Extract --> Embed["Generate Embeddings<br/><small>MPNet (768d)</small>"]
Embed --> Cluster["Semantic Clustering<br/><small>KMeans Algorithm</small>"]
Cluster --> Label["Generate Labels<br/><small>LLM Provider</small>"]
Label --> Snippets["Extract Snippets<br/><small>Context Windows</small>"]
Snippets --> Summary["Generate Summary<br/><small>LLM Provider</small>"]
Summary --> Output["Final Output<br/><small>Clustered Topics<br/>+ Snippets + Summary</small>"]
style Text fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Extract fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Embed fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style Cluster fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px
style Label fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Snippets fill:#fff,stroke:#667eea,stroke-width:1px
style Summary fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Output fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
Key Advantages:
- Groups synonyms and related concepts automatically (e.g., "economy", "jobs", "employment" → "Economic Policy")
- Provides real-world context with highlighted examples
- Ranks by semantic relevance, not just frequency
- Offers human-readable interpretation via AI
- Filters out generic verbs and weak clusters
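To illustrate the grouping idea, here is a toy stand-in for the MPNet-plus-KMeans clustering: keywords whose vectors are cosine-similar land in the same cluster. The vectors and the greedy strategy are made up for illustration; the real service uses 768-dimensional sentence embeddings and KMeans.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_cluster(embeddings: dict[str, list[float]], threshold: float = 0.8):
    """Greedily group keywords whose embeddings are similar enough."""
    clusters: list[list[str]] = []
    for word, vec in embeddings.items():
        for cluster in clusters:
            if cosine(vec, embeddings[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

toy = {
    "economy": [1.0, 0.1], "jobs": [0.9, 0.2],
    "wall": [0.1, 1.0], "border": [0.2, 0.9],
}
greedy_cluster(toy)  # -> [["economy", "jobs"], ["wall", "border"]]
```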
RAG Pipeline¶
Modular architecture for Retrieval-Augmented Generation.
%%{init: {'theme':'base'}}%%
flowchart TB
subgraph Orchestrator["RAG Service (Orchestrator)"]
RagService["RAGService<br/><small>Collection Management<br/>Component Coordination</small>"]
end
subgraph Indexing["Indexing Pipeline"]
direction LR
Loader["DocumentLoader<br/><small>Chunking & Metadata</small>"]
Embedder["Embedding Model<br/><small>all-mpnet-base-v2<br/>768 dimensions</small>"]
DB[("ChromaDB<br/><small>Vector Store<br/>Persistent</small>")]
Loader --> Embedder
Embedder --> DB
end
subgraph Query["Query Pipeline"]
direction TB
Guardrails["RAGGuardrails<br/><small>Validation → Relevance → Grounding</small>"]
Rewriter["QueryRewriter<br/><small>LLM Query Cleaning</small>"]
Search["SearchEngine<br/><small>Hybrid Retrieval<br/>Semantic + BM25</small>"]
Entities["EntityAnalyzer<br/><small>Extraction & Stats</small>"]
Confidence["ConfidenceCalculator<br/><small>Multi-factor Scoring</small>"]
LLM["LLM Provider<br/><small>Answer Generation<br/>Gemini/OpenAI/Claude</small>"]
end
RagService -.->|"Index Documents"| Loader
RagService -->|"Validate"| Guardrails
RagService -->|"Clean Query"| Rewriter
RagService <-->|"Query"| Search
Search <-->|"Retrieve"| DB
RagService -->|"Extract Entities"| Entities
Entities <-->|"Analyze"| DB
RagService -->|"Calculate"| Confidence
RagService -->|"Generate Answer"| LLM
style Orchestrator fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style Indexing fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Query fill:#e9ecef,stroke:#667eea,stroke-width:2px
style RagService fill:#764ba2,stroke:#667eea,stroke-width:2px,color:#fff
style DB fill:#fff,stroke:#667eea,stroke-width:2px
style LLM fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
RAG Workflow Details¶
1. Indexing (One-time or on-demand):
# DocumentLoader handles chunking
1. Load documents from directory
2. Split into chunks:
- Semantic chunking (default): sentence embeddings + cosine
similarity to detect topic boundaries
- Fixed chunking (fallback): RecursiveCharacterTextSplitter
with chunk_size=2048, overlap=150
3. Extract metadata from filenames:
- location, date, year, chunk_index, total_chunks
4. Generate embeddings via ChromaDB
- Model: all-mpnet-base-v2 (sentence-transformers)
- Dimension: 768
5. Store in ChromaDB with metadata
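Step 3's metadata extraction can be sketched as a filename parser. The filename convention shown here (`Location_Month_Day_Year.txt`) is an assumption for illustration; the real dataset's naming scheme may differ.

```python
def parse_speech_filename(filename: str) -> dict[str, str]:
    """Split a speech filename into location, date, and year fields."""
    stem = filename.rsplit(".", 1)[0]
    location, month, day, year = stem.split("_")
    return {"location": location, "date": f"{month} {day}, {year}", "year": year}

parse_speech_filename("PhoenixAZ_Oct_19_2016.txt")
```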
2. Querying:
# Orchestrated by RAGService, delegated to components
1. Receive question from user
2. RAGGuardrails Layer 1: validate query (reject empty/trivial)
3. QueryRewriter cleans query via LLM:
- Fix typos, expand abbreviations
- Does NOT broaden scope or add synonyms
4. EntityAnalyzer extracts entities from original question
5. SearchEngine performs hybrid retrieval (using rewritten query):
a. Semantic search: cosine similarity on embeddings
b. BM25 search: keyword matching
c. Combine results with weights (0.7 semantic, 0.3 BM25)
d. Optional cross-encoder reranking
6. RAGGuardrails Layer 2: filter chunks by relevance
- Sigmoid-normalised cross-encoder scoring
- Remove chunks below relevance threshold
7. ConfidenceCalculator computes multi-factor score:
- Retrieval quality (40%): average semantic similarity
- Consistency (25%): low score variance
- Coverage (20%): normalized chunk count
- Entity coverage (15%): % chunks mentioning entities
8. EntityAnalyzer generates statistics (if entities found):
- Mention counts across corpus
- Speech coverage percentage
- Sentiment analysis (optional)
- Co-occurrence analysis
9. LLMProvider generates answer:
- Build context-aware prompt
- Include entity focus if applicable
- Fallback to context extraction if LLM fails
10. RAGGuardrails Layer 3: grounding verification
- Token-overlap check between answer and source context
- Flag ungrounded answers
11. Return complete response:
- Generated answer
- Confidence level + score + explanation
- Supporting context chunks
- Source attribution
- Entity statistics (if applicable)
3. Confidence Scoring:
Multi-factor calculation combining:
- Retrieval Quality (40%): Average semantic similarity (0-1)
- Consistency (25%): Score variance (low variance = high confidence)
- Coverage (20%): Number of supporting chunks (normalized)
- Entity Coverage (15%): % of chunks mentioning query entities
Confidence Levels:
- High: combined_score ≥ 0.7
- Medium: 0.4 ≤ combined_score < 0.7
- Low: combined_score < 0.4
Data Flow¶
Sentiment Analysis Flow¶
%%{init: {'theme':'base'}}%%
sequenceDiagram
autonumber
participant User as User
participant UI as Dark Mode UI
participant API as FastAPI
participant Service as SentimentAnalyzer
participant FinBERT as FinBERT Model
participant RoBERTa as RoBERTa Model
participant LLM as LLM Provider
User->>UI: Enter text for analysis
UI->>API: POST /analyze/sentiment
API->>Service: analyze_sentiment(text)
Service->>Service: Chunk text (510 tokens)
loop For each chunk
Service->>FinBERT: Classify sentiment
FinBERT-->>Service: Sentiment scores
Service->>RoBERTa: Classify emotions
RoBERTa-->>Service: Emotion scores
end
Service->>Service: Aggregate scores across chunks
Service->>LLM: Generate contextual interpretation<br/>(max 2000 tokens)
LLM-->>Service: AI interpretation
Service-->>API: Complete analysis results
API-->>UI: JSON Response
UI-->>User: Display with dark theme<br/>sentiment + emotions + interpretation
Note over Service,LLM: LLM receives:<br/>- Text sample (600 chars)<br/>- Sentiment scores<br/>- Emotion scores
Note over UI,User: Results shown in dark mode<br/>with enhanced visualization
RAG Question Answering Flow¶
%%{init: {'theme':'base'}}%%
sequenceDiagram
autonumber
participant User as User
participant UI as Dark Mode UI
participant API as FastAPI
participant RAG as RAG Service
participant Search as SearchEngine
participant DB as ChromaDB
participant Embed as Embeddings
participant Conf as Confidence
participant Entity as EntityAnalyzer
participant LLM as LLM Provider
User->>UI: Ask question
UI->>API: POST /rag/ask
API->>RAG: ask(question, top_k=5)
RAG->>Entity: Extract entities from question
Entity-->>RAG: Entity list
RAG->>Embed: Encode question
Embed-->>RAG: Query vector (768d)
RAG->>Search: hybrid_search(query_vector)
Search->>DB: Semantic search (cosine similarity)
Search->>DB: BM25 search (keyword matching)
DB-->>Search: Retrieved chunks
Search->>Search: Combine & rerank results
Search-->>RAG: Top-k chunks (deduplicated)
RAG->>Conf: calculate_confidence(results, entities)
Conf-->>RAG: Confidence score + explanation
opt Entity Statistics
RAG->>Entity: analyze_entities(entities, corpus)
Entity->>DB: Query entity mentions
Entity-->>RAG: Entity stats & sentiment
end
RAG->>LLM: generate_answer(question, context, entities)
LLM-->>RAG: Generated answer
RAG-->>API: Complete RAG response<br/>(answer + confidence + sources + entities)
API-->>UI: JSON Response
UI-->>User: Display answer with dark theme<br/>context + sources + confidence
Note over Search,DB: Hybrid Search:<br/>70% semantic + 30% BM25<br/>Optional cross-encoder reranking
Note over Conf: Multi-factor scoring:<br/>40% retrieval quality<br/>25% consistency<br/>20% coverage<br/>15% entity coverage
API Architecture¶
Request/Response Models (Pydantic)¶
# Input Models
TextInput
NGramRequest
RAGQueryRequest
RAGSearchRequest
# Response Models
SentimentResponse
TopicResponse
StatsResponse
RAGAnswerResponse
RAGStatsResponse
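One of these models might look like the sketch below; the field names and constraints beyond `question` are assumptions, not the actual schema.

```python
from pydantic import BaseModel, Field

class RAGQueryRequest(BaseModel):
    """Hypothetical shape of the /rag/ask request body."""
    question: str = Field(..., min_length=1)
    top_k: int = Field(5, ge=1, le=20)

req = RAGQueryRequest(question="What was said about trade?")
```

FastAPI validates incoming JSON against these models automatically and returns a 422 response when validation fails.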
Middleware Stack¶
%%{init: {'theme':'base'}}%%
flowchart TB
Request["User Request<br/><small>HTTP/HTTPS</small>"]
CORS["CORS Middleware<br/><small>Allow origins config</small>"]
Routing["FastAPI Routing<br/><small>Endpoint matching</small>"]
Validation["Pydantic Validation<br/><small>Request models</small>"]
Handler["Endpoint Handler<br/><small>Business logic</small>"]
Services["Business Logic<br/><small>RAG, Sentiment, Topics</small>"]
Serialize["Response Serialization<br/><small>JSON encoding</small>"]
Response["HTTP Response<br/><small>200/4xx/5xx</small>"]
Request --> CORS
CORS --> Routing
Routing --> Validation
Validation --> Handler
Handler --> Services
Services --> Serialize
Serialize --> Response
style Request fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style CORS fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style Routing fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Validation fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px
style Handler fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Services fill:#667eea,stroke:#764ba2,stroke-width:2px,color:#fff
style Serialize fill:#fff,stroke:#667eea,stroke-width:1px
style Response fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
Error Handling Strategy¶
try:
    # Business logic
    ...
except SpecificError as e:
    # Handle known errors with an appropriate 4xx status
    raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
    # Log unexpected errors and return a generic 500
    logger.error(f"Unexpected error: {e}")
    raise HTTPException(status_code=500, detail="Internal server error")
Deployment Architecture¶
Docker Multi-Stage Build¶
%%{init: {'theme':'base'}}%%
flowchart LR
subgraph Stage1["Stage 1: Builder"]
direction TB
UV["uv Package Manager<br/><small>Fast Python installer</small>"]
Deps["Install Dependencies<br/><small>Production + optional groups</small>"]
UV --> Deps
end
subgraph Stage2["Stage 2: Runtime"]
direction TB
Slim["Python 3.12-slim<br/><small>Minimal base image</small>"]
Copy["Copy Dependencies<br/><small>From builder stage</small>"]
App["Copy Application Code<br/><small>src/, data/</small>"]
Models["Download ML Models<br/><small>NLTK + Transformers</small>"]
User["Non-root User<br/><small>appuser (UID 1000)</small>"]
Slim --> Copy
Copy --> App
App --> Models
Models --> User
end
Deps -.->|"Python packages<br/>wheel files"| Copy
style Stage1 fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Stage2 fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style UV fill:#fff,stroke:#2196f3,stroke-width:1px
style Deps fill:#fff,stroke:#2196f3,stroke-width:1px
style Models fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
style User fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
Deployment Options¶
Option 1: Render (via Docker Hub)¶
%%{init: {'theme':'base'}}%%
flowchart LR
GH["GitHub Actions<br/><small>CI/CD Pipeline</small>"]
Build["Build Docker Image<br/><small>Multi-stage build</small>"]
Test["Run Tests<br/><small>pytest + linting</small>"]
DH["Docker Hub<br/><small>Public registry</small>"]
Render["Render Platform<br/><small>Auto-deploy on push</small>"]
Users["End Users<br/><small>HTTPS access</small>"]
GH --> Build
Build --> Test
Test -->|"Push latest tag"| DH
DH -->|"Webhook trigger"| Render
Render -->|"Serve API"| Users
style GH fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Build fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Test fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style DH fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Render fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style Users fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
Flow:
1. Push to the `main` branch
2. GitHub Actions builds the Docker image
3. Image is pushed to Docker Hub (`trump-speeches-nlp-chatbot:latest`)
4. Render detects the new image
5. Render pulls and deploys it
6. Health check at `/health`
Option 2: Azure Web App¶
Via ACR (Recommended):
%%{init: {'theme':'base'}}%%
flowchart LR
GH["GitHub Actions<br/><small>CI/CD Pipeline</small>"]
Build["Build Docker Image<br/><small>Multi-stage build</small>"]
ACR["Azure Container Registry<br/><small>Private registry</small>"]
Azure["Azure Web App<br/><small>Managed container hosting</small>"]
Users["End Users<br/><small>HTTPS access</small>"]
GH --> Build
Build -->|"Push to ACR"| ACR
ACR -->|"Deploy container"| Azure
Azure -->|"Serve API"| Users
style GH fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Build fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style ACR fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px
style Azure fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style Users fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
Via Docker Hub (Alternative):
%%{init: {'theme':'base'}}%%
flowchart LR
GH["GitHub Actions<br/><small>CI/CD Pipeline</small>"]
Build["Build Docker Image<br/><small>Multi-stage build</small>"]
DH["Docker Hub<br/><small>Public registry</small>"]
Azure["Azure Web App<br/><small>Managed container hosting</small>"]
Users["End Users<br/><small>HTTPS access</small>"]
GH --> Build
Build -->|"Push to Docker Hub"| DH
DH -->|"Deploy container"| Azure
Azure -->|"Serve API"| Users
style GH fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Build fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style DH fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Azure fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
style Users fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
CI/CD Pipeline¶
%%{init: {'theme':'base'}}%%
flowchart TB
Push["Push to main<br/><small>Git commit</small>"]
subgraph CI["CI Workflow"]
direction TB
Tests["Unit Tests<br/><small>pytest • 3.11/3.12/3.13</small>"]
Lint["Code Quality<br/><small>flake8 • black • mypy</small>"]
end
subgraph Security["Security Scan"]
direction TB
PipAudit["pip-audit<br/><small>Dependency vulnerabilities</small>"]
Bandit["bandit<br/><small>Security analysis</small>"]
end
subgraph Build["Build & Deploy"]
direction TB
Docker["Build Docker Image<br/><small>Multi-stage • Optimized</small>"]
DHPush["Push to Docker Hub<br/><small>Latest + versioned tags</small>"]
ACRPush["Push to ACR<br/><small>Azure Container Registry</small>"]
end
subgraph Deploy["Deployment"]
direction LR
RenderDeploy["Deploy to Render<br/><small>Auto-deploy via webhook</small>"]
AzureDeploy["Deploy to Azure<br/><small>Azure Web App</small>"]
end
Push --> CI
Push --> Security
Tests --> Docker
Lint --> Docker
PipAudit --> Docker
Bandit --> Docker
Docker --> DHPush
Docker --> ACRPush
DHPush --> RenderDeploy
ACRPush --> AzureDeploy
style Push fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style CI fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style Security fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Build fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Deploy fill:#667eea,stroke:#764ba2,stroke-width:2px,color:#fff
LLM Provider Architecture¶
The system uses a pluggable LLM provider abstraction that allows switching between different AI models without changing application code.
Architecture Pattern¶
%%{init: {'theme':'base'}}%%
flowchart TB
Config["Environment Config<br/><small>LLM_PROVIDER<br/>LLM_API_KEY<br/>LLM_MODEL_NAME</small>"]
Factory["LLM Factory<br/><small>create_llm_provider()</small>"]
Base["LLMProvider<br/><small>Abstract Interface</small>"]
subgraph Providers["LLM Providers"]
direction LR
Gemini["Gemini<br/><small>Always Available<br/>Default</small>"]
OpenAI["OpenAI<br/><small>Optional<br/>--group llm-openai</small>"]
Anthropic["Anthropic<br/><small>Optional<br/>--group llm-anthropic</small>"]
end
subgraph Services["AI Services"]
direction TB
RAG["RAG Service<br/><small>Answer generation</small>"]
Sentiment["Sentiment Analysis<br/><small>Interpretation</small>"]
Topics["Topic Analysis<br/><small>Summaries & labels</small>"]
end
Config --> Factory
Factory --> Base
Base -.->|"Implements"| Gemini
Base -.->|"Implements"| OpenAI
Base -.->|"Implements"| Anthropic
RAG --> Factory
Sentiment --> Factory
Topics --> Factory
style Config fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Factory fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Base fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px
style Providers fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Services fill:#667eea,stroke:#764ba2,stroke-width:2px,color:#fff
style Gemini fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
style OpenAI fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Anthropic fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
Provider Interface¶
All LLM providers implement the same interface:
class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    @abstractmethod
    def generate_content(
        self,
        prompt: str,
        temperature: Optional[float] = None,
        max_output_tokens: Optional[int] = None,
    ) -> str:
        """Generate text based on the given prompt."""
        ...
Factory Pattern¶
The factory creates providers with lazy imports for optional dependencies:
def create_llm_provider() -> LLMProvider:
    """Create an LLM provider based on configuration."""
    provider = settings.llm_provider.lower()
    if provider == "gemini":
        return GeminiLLM()
    elif provider == "openai":
        if not OPENAI_AVAILABLE:
            raise ImportError("Install: uv sync --group llm-openai")
        return OpenAILLM()
    elif provider == "anthropic":
        if not ANTHROPIC_AVAILABLE:
            raise ImportError("Install: uv sync --group llm-anthropic")
        return AnthropicLLM()
    raise ValueError(f"Unknown LLM provider: {provider}")
Provider Implementations¶
Gemini Provider (Default):
- Always available (base dependency)
- Uses the `google-generativeai` package
- Supports Gemini 1.5/2.0 models
- Provider-specific: Safety settings configuration

OpenAI Provider (Optional):
- Requires: `uv sync --group llm-openai`
- Uses the `openai` package
- Supports GPT-3.5/GPT-4/GPT-4o models
- Provider-specific: None (pure API)

Anthropic Provider (Optional):
- Requires: `uv sync --group llm-anthropic`
- Uses the `anthropic` package
- Supports Claude 3/3.5 models
- Provider-specific: None (pure API)
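The "lazy import" mechanism behind the optional providers can be sketched with guarded imports: the optional SDKs are only imported when installed, which is what sets the `OPENAI_AVAILABLE` / `ANTHROPIC_AVAILABLE` flags the factory checks. This is a sketch of the pattern, not the project's exact module:

```python
# Availability flags via guarded imports. The base install stays lean:
# missing optional SDKs simply flip the flag to False instead of crashing.
try:
    import openai  # noqa: F401  -- installed via `uv sync --group llm-openai`
    OPENAI_AVAILABLE = True
except ImportError:
    OPENAI_AVAILABLE = False

try:
    import anthropic  # noqa: F401  -- installed via `uv sync --group llm-anthropic`
    ANTHROPIC_AVAILABLE = True
except ImportError:
    ANTHROPIC_AVAILABLE = False
```

The factory can then raise a targeted `ImportError` with the exact install command, rather than a bare `ModuleNotFoundError` deep inside a provider class.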
Model-Agnostic Configuration¶
All providers use the same configuration interface:
```bash
# Environment Variables
LLM_PROVIDER=gemini              # gemini | openai | anthropic
LLM_API_KEY=your_api_key         # Single key for the active provider
LLM_MODEL_NAME=gemini-2.0-flash-exp
LLM_TEMPERATURE=0.7
LLM_MAX_OUTPUT_TOKENS=2048
```
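A settings object reading these variables might look like the sketch below. The project itself likely uses Pydantic settings; this dataclass version only illustrates the environment contract, and the field names mirror the variables above:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Illustrative settings loader for the LLM_* environment variables.

    Note: defaults are read from the environment once, at class-definition
    time, which is fine for a process-level config sketch like this.
    """

    llm_provider: str = os.getenv("LLM_PROVIDER", "gemini")
    llm_api_key: str = os.getenv("LLM_API_KEY", "")
    llm_model_name: str = os.getenv("LLM_MODEL_NAME", "gemini-2.0-flash-exp")
    llm_temperature: float = float(os.getenv("LLM_TEMPERATURE", "0.7"))
    llm_max_output_tokens: int = int(os.getenv("LLM_MAX_OUTPUT_TOKENS", "2048"))
```

Because all providers share this single interface, switching providers is purely a configuration change, as shown in the next section.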
Switching Providers¶
From Gemini to OpenAI:

```bash
# 1. Install OpenAI support
uv sync --group llm-openai

# 2. Update .env
LLM_PROVIDER=openai
LLM_API_KEY=sk-your_openai_key
LLM_MODEL_NAME=gpt-4o-mini

# 3. Restart application
uv run uvicorn speech_nlp.app:app --reload
```

From Gemini to Anthropic:

```bash
# 1. Install Anthropic support
uv sync --group llm-anthropic

# 2. Update .env
LLM_PROVIDER=anthropic
LLM_API_KEY=sk-ant-your_anthropic_key
LLM_MODEL_NAME=claude-3-5-sonnet-20241022

# 3. Restart application
uv run uvicorn speech_nlp.app:app --reload
```
Benefits¶
- Flexibility: Switch providers without code changes
- Cost Optimization: Choose cost-effective models per use case
- Vendor Independence: No lock-in to single LLM provider
- Easy Testing: Compare results across different models
- Graceful Fallbacks: Degrade to context extraction if LLM fails
- Minimal Dependencies: Only install providers you use
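The "Graceful Fallbacks" benefit can be sketched as a thin wrapper: if the LLM call raises, return the retrieved context instead of failing the request. The function name and prompt wording below are illustrative; `generate` is any callable with the `generate_content` shape:

```python
def answer_with_fallback(generate, context_snippets, question):
    """Sketch of graceful degradation: try the LLM, fall back to context.

    `generate` is any callable taking a prompt string and returning text
    (e.g. a bound LLMProvider.generate_content). On failure, the top
    retrieved passages are surfaced verbatim instead of raising a 500.
    """
    context = "\n".join(context_snippets)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    try:
        return generate(prompt)
    except Exception:
        # Degraded mode: no generated answer, but still useful output.
        return "LLM unavailable; closest passages:\n" + "\n".join(context_snippets[:3])
```

This keeps the API responsive during provider outages at the cost of a less polished answer.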
Technology Stack¶
Core Technologies¶
| Layer | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI 0.116+ | High-performance async API |
| Web Server | Uvicorn | ASGI server |
| LLM Integration | Gemini/OpenAI/Claude | Pluggable LLM providers |
| ML Framework | PyTorch 2.5+ | Deep learning backend |
| NLP Library | Transformers 4.57+ | Pre-trained models |
| Text Processing | NLTK 3.9+ | Tokenization, stopwords |
| Vector DB | ChromaDB 0.5+ | Persistent embeddings storage |
| Embeddings | sentence-transformers 3.3+ | Semantic embeddings (MPNet) |
| Reranking | Cross-encoder | Precision optimization |
| Keyword Search | rank-bm25 | Sparse retrieval |
| RAG Framework | LangChain 0.3+ | Text splitting utilities |
Supporting Technologies¶
| Category | Technology | Version |
|---|---|---|
| Dependency Mgmt | uv | 0.9+ |
| Containerization | Docker | Latest |
| CI/CD | GitHub Actions | - |
| Testing | pytest | 8.3+ |
| Code Quality | black, flake8, isort | Latest |
| Security | pip-audit, bandit | Latest |
Testing Strategy:
- Unit Tests: Component-level testing for SearchEngine, ConfidenceCalculator, EntityAnalyzer, DocumentLoader
- Integration Tests: Full RAG pipeline testing
- Coverage: 65%+ overall, 90%+ for core RAG components
- Fixtures: Modular pytest fixtures for isolated component testing
Model Details¶
| Model | Task | Source | Size |
|---|---|---|---|
| Gemini 2.5 Flash | Answer Generation, Topic Summaries, Sentiment Interpretation | Google AI | API-based |
| GPT-4o/GPT-4o-mini | Answer Generation (Optional) | OpenAI | API-based |
| Claude 3.5 Sonnet | Answer Generation (Optional) | Anthropic | API-based |
| FinBERT | Sentiment Classification | ProsusAI/finbert | ~440MB |
| RoBERTa-Emotion | Emotion Detection | j-hartmann/emotion-english-distilroberta-base | ~330MB |
| all-mpnet-base-v2 | Embeddings (768d), Topic Clustering | sentence-transformers | ~420MB |
| ms-marco-MiniLM | Reranking | cross-encoder | ~80MB |
Scalability Considerations¶
Current Architecture¶
- Compute: Single-instance deployment
- Storage: Local filesystem + ChromaDB
- Concurrency: Async FastAPI (handles concurrent requests)
Scaling Strategies¶
1. Horizontal Scaling¶
%%{init: {'theme':'base'}}%%
flowchart TB
Users["๐ฅ Users<br/><small>Concurrent requests</small>"]
LB["โ๏ธ Load Balancer<br/><small>Nginx / Azure LB</small>"]
subgraph Instances["๐ API Instances"]
direction LR
API1["โก Instance 1<br/><small>FastAPI + Models</small>"]
API2["โก Instance 2<br/><small>FastAPI + Models</small>"]
API3["โก Instance 3<br/><small>FastAPI + Models</small>"]
end
subgraph SharedData["๐พ Shared Data Layer"]
direction TB
Cache["๐ด Redis Cache<br/><small>Query results<br/>Embeddings</small>"]
SharedDB[("๐๏ธ Shared Vector DB<br/><small>pgvector / Pinecone</small>")]
Storage["โ๏ธ Shared Storage<br/><small>S3 / Azure Blob<br/>Models + Data</small>"]
end
Users --> LB
LB --> API1
LB --> API2
LB --> API3
API1 <--> Cache
API2 <--> Cache
API3 <--> Cache
API1 <--> SharedDB
API2 <--> SharedDB
API3 <--> SharedDB
API1 -.->|"Load models"| Storage
API2 -.->|"Load models"| Storage
API3 -.->|"Load models"| Storage
style Users fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style LB fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Instances fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style SharedData fill:#667eea,stroke:#764ba2,stroke-width:2px,color:#fff
style Cache fill:#ffebee,stroke:#f44336,stroke-width:2px
Required Changes:
- Replace ChromaDB with pgvector (Postgres) or Pinecone
- Use shared model storage (S3/Azure Blob)
- Add Redis for caching
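The query-result caching layer can be sketched independently of Redis itself: derive a deterministic key from the normalized question, then check the cache before running the full RAG pipeline. The key schema and function names are hypothetical; anything exposing `get`/`set` (a `redis.Redis` client, modulo bytes decoding, or an in-memory wrapper) works:

```python
import hashlib
import json


def cache_key(question: str, top_k: int) -> str:
    """Deterministic cache key for a RAG query (illustrative schema)."""
    payload = json.dumps({"q": question.strip().lower(), "k": top_k})
    return "rag:" + hashlib.sha256(payload.encode()).hexdigest()


def cached_answer(cache, answer_fn, question: str, top_k: int = 5):
    """Check the shared cache before running the full RAG pipeline.

    `cache` needs only get/set; `answer_fn(question, top_k)` is the
    expensive retrieval + generation path being skipped on a hit.
    """
    key = cache_key(question, top_k)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = answer_fn(question, top_k)
    cache.set(key, answer)  # with Redis, pass ex=3600 for a TTL
    return answer
```

Normalizing the question (strip + lowercase) before hashing lets trivially different phrasings share one cache entry across all API instances.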
2. Vertical Scaling¶
Current Requirements:
- RAM: ~2.5GB (models + API)
- CPU: 1-2 cores
- Storage: ~1.5GB (models + data)
Optimized for:
- RAM: 4-8GB for concurrent requests
- CPU: 4+ cores for parallel processing
- Storage: 5GB+ for larger datasets
3. Performance Optimizations¶
Already Implemented:
- Multi-stage Docker builds
- Model pre-loading on startup
- Async request handling
- Efficient text chunking
Future Improvements:
- Model quantization (reduce size)
- GPU acceleration (CUDA support)
- Response caching (Redis)
- CDN for static files
- Database connection pooling
- Background task queues (Celery)
Resource Usage¶
| Component | RAM | CPU | Storage |
|---|---|---|---|
| FastAPI | ~100MB | Low | - |
| FinBERT | ~1GB | Medium | 440MB |
| RoBERTa-Emotion | ~800MB | Medium | 330MB |
| all-mpnet-base-v2 | ~400MB | Low | 420MB |
| ms-marco-MiniLM | ~100MB | Low | 80MB |
| ChromaDB | ~100MB | Low | Variable |
| NLTK Data | ~50MB | Low | 50MB |
| Total | ~2.5GB | 1-2 cores | ~1.5GB |
Security Architecture¶
Current Security Measures¶
- Dependency Scanning: pip-audit (weekly)
- Code Analysis: bandit
- Input Validation: Pydantic models
- Non-root Container: User `appuser` (UID 1000)
- Health Checks: `/health` endpoint
Production Recommendations¶
- Authentication: Add API key validation
- Rate Limiting: Implement per-IP limits
- HTTPS: Use reverse proxy (Nginx)
- CORS: Restrict to specific origins
- Secrets Management: Use environment variables
- Logging: Centralized logging (ELK stack)
- Monitoring: Prometheus + Grafana
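The rate-limiting recommendation can be sketched as a sliding-window limiter. `SlidingWindowLimiter` and its parameters are illustrative; in a multi-instance deployment the per-IP state would live in shared storage (e.g. Redis) rather than process memory:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Minimal per-client sliding-window rate limiter (sketch).

    Keeps a deque of request timestamps per client IP; a request is
    allowed only if fewer than `max_requests` fall inside the window.
    """

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        q = self.hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

Wired into a middleware, a denied `allow()` would translate to an HTTP 429 response.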
Monitoring & Observability¶
Recommended Metrics¶
Application Metrics:
- Request count (by endpoint)
- Response time (p50, p95, p99)
- Error rate (4xx, 5xx)
- Model inference time
System Metrics:
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
Business Metrics:
- Total analyses performed
- Most used endpoints
- Average sentiment scores
- RAG query accuracy
Implementation Example (Prometheus)¶
```python
from prometheus_client import Counter, Histogram

request_count = Counter('api_requests_total', 'Total requests', ['endpoint'])
request_duration = Histogram('api_request_duration_seconds', 'Request duration')
```
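What those two metrics record can be shown without the `prometheus_client` dependency: a hypothetical decorator that counts calls and times them per endpoint. In production the dicts would be replaced by the `Counter`/`Histogram` objects above and exposed via a `/metrics` endpoint:

```python
import time
from collections import defaultdict

# Dependency-free stand-ins for the Prometheus Counter and Histogram:
# per-endpoint request counts and observed durations in seconds.
REQUEST_COUNT = defaultdict(int)
REQUEST_DURATIONS = defaultdict(list)


def track(endpoint: str):
    """Decorator sketch: instrument a handler with count + duration."""
    def decorator(handler):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                REQUEST_COUNT[endpoint] += 1
                REQUEST_DURATIONS[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator
```

The same shape maps directly onto `request_count.labels(endpoint=...).inc()` and `request_duration.observe(...)`.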
Future Architecture Enhancements¶
Recently Completed (November 2025)¶
- ✅ LLM Provider Abstraction: Pluggable architecture supporting Gemini, OpenAI, and Anthropic
- ✅ Model-Agnostic Configuration: Single config interface for all LLM providers
- ✅ Factory Pattern: Lazy imports with optional dependencies for clean provider switching
- ✅ Modular RAG Architecture: Separated RAG functionality into dedicated, testable components
- ✅ Component Testing: Achieved 65%+ test coverage with component-level unit tests
- ✅ Type Safety: Pydantic models for all RAG data structures
- ✅ Production Logging: Dual-format logging (JSON for cloud, colored for development)
1. Advanced RAG Features¶
- Query Caching: Redis layer for common questions
- Multi-modal: Support PDFs, images, audio transcripts
- Temporal Analysis: Sentiment trends over time
- Entity Relationships: Knowledge graph visualization
- Fine-tuned Embeddings: Domain-specific embedding models
2. Performance Optimizations¶
- Async Processing: Background tasks for entity analytics
- GPU Acceleration: CUDA support for faster inference
- Model Quantization: Reduce model sizes
- Response Streaming: WebSocket support for real-time answers
3. Enhanced NLP¶
- Proper NER: spaCy or Hugging Face transformers for entity extraction
- Text Summarization: Automatic speech summarization
- Topic Modeling: LDA or BERTopic for theme discovery
- Fact Extraction: Structured information extraction
4. Deployment & Scale¶
- Kubernetes: Container orchestration
- Auto-scaling: Based on request volume
- Multi-region: Global deployment
- CDN: Static asset delivery
Development Workflow¶
%%{init: {'theme':'base'}}%%
flowchart LR
Dev["๐ป Local Development<br/><small>uv venv + FastAPI</small>"]
Test["๐งช Testing<br/><small>pytest โข 65% coverage</small>"]
Lint["๐ Linting<br/><small>black โข flake8 โข mypy</small>"]
Git["๐ Git Push<br/><small>Commit to main</small>"]
CI["โ๏ธ GitHub Actions<br/><small>CI/CD pipeline</small>"]
Docker["๐ณ Docker Build<br/><small>Multi-stage โข Optimized</small>"]
Env["๐ Deploy<br/><small>Render / Azure</small>"]
Dev --> Test
Test --> Lint
Lint --> Git
Git --> CI
CI --> Docker
Docker --> Env
style Dev fill:#f0f3ff,stroke:#667eea,stroke-width:2px
style Test fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
style Lint fill:#fff3e0,stroke:#ff9800,stroke-width:2px
style Git fill:#e1f5fe,stroke:#03a9f4,stroke-width:2px
style CI fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px
style Docker fill:#e3f2fd,stroke:#2196f3,stroke-width:2px
style Env fill:#667eea,stroke:#764ba2,stroke-width:3px,color:#fff
Testing Strategy¶
Component-Level Testing¶
Each RAG component has dedicated unit tests ensuring isolation and reliability:
Test Files:
- `tests/test_search_engine.py` - SearchEngine component tests (18 tests)
- `tests/test_confidence.py` - ConfidenceCalculator tests (11 tests)
- `tests/test_entity_analyzer.py` - EntityAnalyzer tests (20 tests)
- `tests/test_document_loader.py` - DocumentLoader tests (11 tests)
- `tests/test_rag_integration.py` - Full RAG pipeline integration tests (28 tests)
Coverage:
- Overall: 65%+
- Core RAG components: 90%+
- SearchEngine: 94%
- ConfidenceCalculator: 93%
- DocumentLoader: 93%
- EntityAnalyzer: 73%
Testing Approach:
- Unit Tests: Isolated component testing with mocked dependencies
- Integration Tests: Full pipeline testing with real ChromaDB
- Fixtures: Reusable pytest fixtures for component setup
- Parametrized Tests: Testing multiple scenarios efficiently
- Edge Cases: Empty collections, invalid inputs, boundary conditions
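The parametrized-test and edge-case approach can be illustrated with a small example. The metric under test (a mean-of-retrieval-scores confidence) and the test name are hypothetical, chosen only to show the pattern of covering the empty-collection edge case alongside normal inputs:

```python
import pytest


@pytest.mark.parametrize(
    "scores,expected",
    [
        ([], 0.0),               # edge case: empty retrieval results
        ([0.9], 0.9),            # single hit
        ([0.8, 0.6, 0.4], 0.6),  # mean over several hits
    ],
)
def test_mean_confidence(scores, expected):
    # Illustrative confidence metric: mean score, 0.0 when nothing retrieved.
    mean = sum(scores) / len(scores) if scores else 0.0
    assert mean == pytest.approx(expected)
```

One parametrized function replaces three near-identical tests, and pytest reports each case separately on failure.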
Continuous Integration¶
GitHub Actions workflow runs on every push:
- Python 3.11, 3.12, 3.13 matrix testing
- Unit tests with coverage reporting
- Integration tests (excluding model loading)
- Linting (flake8, black, isort)
- Type checking (mypy for select modules)
- Security scanning (bandit, pip-audit)
References¶
- FastAPI Documentation
- Transformers Documentation
- ChromaDB Documentation
- LangChain Documentation
- Docker Best Practices
- pytest Documentation
Last Updated: December 2025
Version: 0.3.0
Maintainer: Kristiyan Bonev