Quick Start Guide¶
Get the Trump Speeches NLP Chatbot API running locally in minutes.
Prerequisites¶
- Python 3.11+ installed
- uv installed (install guide)
- Gemini API key (get one free) or other LLM provider key
Setup¶
- Install Dependencies
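From the project root, a single uv sync installs everything from the lockfile (the "Module Not Found" troubleshooting entry below assumes this same command):

```shell
# Install all project dependencies into a local virtual environment
uv sync
```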
- Configure Environment
Create a .env file in the project root:
GEMINI_API_KEY=your_api_key_here
# Optional: Use different LLM provider
# LLM_PROVIDER=openai
# LLM_API_KEY=sk-your_openai_key
# LLM_MODEL_NAME=gpt-4o-mini
- Run the API
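The exact entry point depends on the project layout; assuming the FastAPI app lives in `speech_nlp.app` (as the logger names in the startup output suggest) and is exposed as `app`, a typical invocation is:

```shell
# Start the development server with auto-reload
# (module path inferred from the log output below; adjust if the project differs)
uv run uvicorn speech_nlp.app:app --reload --port 8000
```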
The API will automatically:
- Load configuration from `.env`
- Initialize logging (colored output in development)
- Load ML models (FinBERT ~440MB, RoBERTa ~330MB, MPNet ~420MB)
- Initialize LLM provider (Gemini by default)
- Load ChromaDB vector database with existing embeddings
- Start FastAPI server
Expected startup output:
2025-11-04 12:34:56 | INFO | speech_nlp.app | Application: Trump Speeches NLP Chatbot API v0.1.0
2025-11-04 12:34:56 | INFO | speech_nlp.app | Environment: development
2025-11-04 12:34:56 | INFO | speech_nlp.app | ✓ Sentiment analysis model loaded successfully
2025-11-04 12:34:57 | INFO | speech_nlp.app | ✓ LLM service initialized and tested successfully
2025-11-04 12:34:58 | INFO | speech_nlp.app | ✓ RAG service initialized with 1082 existing chunks
2025-11-04 12:34:58 | INFO | speech_nlp.app | Application startup complete
- Access the Application
Local Development (Recommended for Testing):

- Web UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
Azure Deployment (Live Demo):

- Web UI: https://trump-speeches-nlp-chatbot.azurewebsites.net
- API Docs: https://trump-speeches-nlp-chatbot.azurewebsites.net/docs
⚠️ Azure Cold Start Warning:
The Azure deployment uses Free Tier hosting with ~2GB of ML models. Expect:

- Cold start: 1-5 minutes after inactivity
- Loading strategy: refresh the page every 30 seconds until successful
- AI responses: 30s-2min for complex queries
- Warmed up: fast (2-5s) once active
For instant responses during development, use the local setup above!
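The "refresh every 30 seconds" strategy above can also be scripted. This sketch polls the `/health` endpoint until the cold-started Azure instance responds (URL taken from the deployment links above):

```shell
# Poll /health every 30s until the cold-started instance responds
until curl -sf https://trump-speeches-nlp-chatbot.azurewebsites.net/health > /dev/null; do
  echo "Still warming up, retrying in 30s..."
  sleep 30
done
echo "API is ready"
```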
Running with Docker¶
Docker containers include all ML models pre-downloaded (~2GB) for fast, consistent startup.
Build and Run¶
# Build image with models baked in (one-time, ~5-10 min)
docker build -t trump-speeches-nlp-chatbot .
# Run container (starts instantly - models already cached in image)
docker run --rm -it -p 8000:8000 --env-file .env --name nlp-chatbot trump-speeches-nlp-chatbot
Note: The build downloads ~2GB of ML models and includes them in the image. This makes the image larger (~4-5GB) but ensures instant container startup with no runtime downloads.
Using Docker Compose (Recommended)¶
Docker Compose adds a persistent volume for model caching across rebuilds:
# Start with volume-based caching
docker-compose up
# First run downloads models (~3-4 min)
# Subsequent runs are instant even after rebuilds
The huggingface-cache volume persists models between image updates, so you only download once.
Testing the RAG System¶
Using the Web Interface¶
- Open http://localhost:8000
- Navigate to the "RAG Q&A" tab
- Ask a question like "What economic policies were discussed?"
- View the AI-generated answer with confidence scores and sources
Using curl¶
# Ask a question (RAG)
curl -X POST http://localhost:8000/rag/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What was said about the economy?", "top_k": 5}'
# Semantic search
curl -X POST http://localhost:8000/rag/search \
  -H "Content-Type: application/json" \
  -d '{"query": "immigration policy", "top_k": 5}'
# Get RAG statistics
curl http://localhost:8000/rag/stats
# Sentiment analysis (traditional NLP)
curl -X POST http://localhost:8000/analyze/sentiment \
  -H "Content-Type: application/json" \
  -d '{"text": "The economy is doing great!"}'
Using Python¶
import requests
# RAG Question Answering
response = requests.post(
"http://localhost:8000/rag/ask",
json={
"question": "What were the main themes in the 2020 speeches?",
"top_k": 5
}
)
result = response.json()
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']} ({result['confidence_score']:.2f})")
print(f"Sources: {', '.join(result['sources'])}")
# Traditional NLP - Sentiment
response = requests.post(
"http://localhost:8000/analyze/sentiment",
json={"text": "This is incredible! Best economy ever."}
)
print(response.json())
Troubleshooting¶
"RAG service not initialized"¶
The API auto-indexes documents on first startup, which takes ~30-60 seconds. Check the application logs for indexing progress.
Gemini API Errors¶
Ensure your .env file exists with a valid GEMINI_API_KEY. Get a free key at https://ai.google.dev/.
Model Download Taking Long¶
First run downloads ~2GB of models (FinBERT, RoBERTa, MPNet embeddings). Subsequent runs are fast.
Switching LLM Providers¶
See the FAQ for instructions on using OpenAI or Anthropic instead of Gemini.
Port Already in Use¶
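If another process already holds port 8000, either stop it or run the API on a different port. A sketch, assuming the `speech_nlp.app:app` uvicorn entry point (module path inferred from the startup logs; adjust for your setup):

```shell
# Find the process holding port 8000 (Linux/macOS)
lsof -i :8000

# Or start the API on a different port instead
uv run uvicorn speech_nlp.app:app --port 8001
```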
Module Not Found¶
Ensure you're in the project root directory and have run uv sync.
Next Steps¶
- Try the interactive web interface at http://localhost:8000
- Explore API documentation at http://localhost:8000/docs
- Read the FAQ for common questions
- Check out RAG Features for implementation details
- Follow the Deployment Guide (docs/DEPLOYMENT.md) to deploy to production
- Read about RAG improvements in docs/RAG_IMPROVEMENTS.md