Quick Start Guide¶
Get the Trump Speeches NLP Chatbot API running locally in minutes.
Prerequisites¶
- Python 3.11+ installed
- uv installed (install guide)
- Gemini API key (get one free) or other LLM provider key
Setup¶
- Install Dependencies
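From the project root, a single uv sync installs everything from the lockfile (the "Module Not Found" troubleshooting entry below assumes this same command):

```shell
# Install all project dependencies into a local virtual environment
uv sync
```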
- Configure Environment
Create a .env file in the project root:
GEMINI_API_KEY=your_api_key_here
# Optional: Use different LLM provider
# LLM_PROVIDER=openai
# LLM_API_KEY=sk-your_openai_key
# LLM_MODEL_NAME=gpt-4o-mini
- Run the API
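The exact entry point depends on the project layout; assuming the FastAPI app lives in `speech_nlp.app` (as the logger names in the startup output suggest) and is exposed as `app`, a typical invocation is:

```shell
# Start the development server with auto-reload
# (module path inferred from the log output below; adjust if the project differs)
uv run uvicorn speech_nlp.app:app --reload --port 8000
```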
The API will automatically:
- Load configuration from `.env`
- Initialize logging (colored output in development)
- Load ML models (FinBERT ~440MB, RoBERTa ~330MB, MPNet ~420MB)
- Initialize LLM provider (Gemini by default)
- Load ChromaDB vector database with existing embeddings
- Start FastAPI server
Expected startup output:
2025-11-04 12:34:56 | INFO | speech_nlp.app | Application: Trump Speeches NLP Chatbot API v0.1.0
2025-11-04 12:34:56 | INFO | speech_nlp.app | Environment: development
2025-11-04 12:34:56 | INFO | speech_nlp.app | ✓ Sentiment analysis model loaded successfully
2025-11-04 12:34:57 | INFO | speech_nlp.app | ✓ LLM service initialized and tested successfully
2025-11-04 12:34:58 | INFO | speech_nlp.app | ✓ RAG service initialized with 1082 existing chunks
2025-11-04 12:34:58 | INFO | speech_nlp.app | Application startup complete
- Access the Application
Local Development (Recommended for Testing):

- Web UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
Azure Deployment (Live Demo):

- Web UI: https://trump-speeches-nlp-chatbot.azurewebsites.net
- API Docs: https://trump-speeches-nlp-chatbot.azurewebsites.net/docs
⚠️ Azure Cold Start Warning:
The Azure deployment uses Free Tier hosting with ~2GB of ML models. Expect:

- Cold start: 1-5 minutes after inactivity
- Loading strategy: refresh the page every 30 seconds until successful
- AI responses: 30s-2min for complex queries
- Warmed up: fast (2-5s) once active
For instant responses during development, use the local setup above!
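The "refresh every 30 seconds" strategy above can also be scripted. This sketch polls the `/health` endpoint until the cold-started Azure instance responds (URL taken from the deployment links above):

```shell
# Poll /health every 30s until the cold-started instance responds
until curl -sf https://trump-speeches-nlp-chatbot.azurewebsites.net/health > /dev/null; do
  echo "Still warming up, retrying in 30s..."
  sleep 30
done
echo "API is ready"
```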
Running with Docker¶
Docker containers include all ML models pre-downloaded (~2GB) for fast, consistent startup.
Build and Run¶
# Build image with models baked in (one-time, ~5-10 min)
docker build -t trump-speeches-nlp-chatbot .
# Run container (starts instantly - models already cached in image)
docker run --rm -it -p 8000:8000 --env-file .env --name nlp-chatbot trump-speeches-nlp-chatbot
Note: The build downloads ~2GB of ML models and includes them in the image. This makes the image larger (~4-5GB) but ensures instant container startup with no runtime downloads.
Using Docker Compose (Recommended)¶
Docker Compose adds a persistent volume for model caching across rebuilds:
# Start with volume-based caching
docker-compose up
# First run downloads models (~3-4 min)
# Subsequent runs are instant even after rebuilds
The huggingface-cache volume persists models between image updates, so you only download once.
Testing the RAG System¶
Using the Web Interface¶
- Open http://localhost:8000
- Navigate to the "RAG Q&A" tab
- Ask a question like "What economic policies were discussed?"
- View the AI-generated answer with confidence scores and sources
Using curl¶
# Ask a question (RAG)
curl -X POST http://localhost:8000/rag/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What was said about the economy?", "top_k": 5}'
# Semantic search
curl -X POST http://localhost:8000/rag/search \
  -H "Content-Type: application/json" \
  -d '{"query": "immigration policy", "top_k": 5}'
# Get RAG statistics
curl http://localhost:8000/rag/stats
# Sentiment analysis (traditional NLP)
curl -X POST http://localhost:8000/analyze/sentiment \
  -H "Content-Type: application/json" \
  -d '{"text": "The economy is doing great!"}'
Using Python¶
import requests
# RAG Question Answering
response = requests.post(
"http://localhost:8000/rag/ask",
json={
"question": "What were the main themes in the 2020 speeches?",
"top_k": 5
}
)
result = response.json()
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']} ({result['confidence_score']:.2f})")
print(f"Sources: {', '.join(result['sources'])}")
# Traditional NLP - Sentiment
response = requests.post(
"http://localhost:8000/analyze/sentiment",
json={"text": "This is incredible! Best economy ever."}
)
print(response.json())
Troubleshooting¶
"RAG service not initialized"¶
The API auto-indexes documents on first startup, which takes ~30-60 seconds. Check the application logs for indexing progress.
Gemini API Errors¶
Ensure your .env file exists with a valid GEMINI_API_KEY. Get a free key at https://ai.google.dev/.
Model Download Taking Long¶
First run downloads ~2GB of models (FinBERT, RoBERTa, MPNet embeddings). Subsequent runs are fast.
Switching LLM Providers¶
See the FAQ for instructions on using OpenAI or Anthropic instead of Gemini.
Port Already in Use¶
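If another process already holds port 8000, either stop it or run the API on a different port. A sketch, assuming the `speech_nlp.app:app` uvicorn entry point (module path inferred from the startup logs; adjust for your setup):

```shell
# Find the process holding port 8000 (Linux/macOS)
lsof -i :8000

# Or start the API on a different port instead
uv run uvicorn speech_nlp.app:app --port 8001
```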
Module Not Found¶
Ensure you're in the project root directory and have run uv sync.
Next Steps¶
- Try the interactive web interface at http://localhost:8000
- Explore API documentation at http://localhost:8000/docs
- Read the FAQ for common questions
- Check out RAG Features for implementation details
- Follow the Deployment Guide (docs/DEPLOYMENT.md) to deploy to production
- Read about RAG improvements in docs/RAG_IMPROVEMENTS.md