Logging Configuration¶
Centralized logging setup for development and production environments.
Overview¶
The project uses a centralized logging configuration (src/speech_nlp/config/logging.py) that provides:
- JSON logging for production/cloud environments (Azure, Docker)
- Colorized console logging for local development
- Structured log filtering to suppress noisy third-party libraries
- Consistent formatting across all modules
Quick Start¶
Basic Usage¶
import logging
# Get a logger for your module
logger = logging.getLogger(__name__)
# Log messages
logger.info("Starting RAG query processing")
logger.warning("Low confidence score detected")
logger.error("Failed to load embeddings", exc_info=True)
Application Configuration¶
The FastAPI app automatically configures logging based on settings:
# In your own scripts
from speech_nlp.config.settings import get_settings
settings = get_settings()
settings.setup_logging() # Configures based on environment
Configuration Options¶
Log Levels¶
| Level | When to Use |
|---|---|
DEBUG |
Detailed diagnostic information (variable values, flow control) |
INFO |
General operational messages (process started, completed, counts) |
WARNING |
Potentially problematic situations (missing optional files, degraded performance) |
ERROR |
Error events that might still allow the application to continue |
CRITICAL |
Severe errors causing premature termination |
Output Formats¶
Development (Colorized)¶
Output:
2025-12-08 14:32:15 | INFO | speech_nlp.services.rag.service | Initializing RAG service with hybrid search
2025-12-08 14:32:16 | INFO | speech_nlp.services.rag.service | Loaded 35 documents from corpus
2025-12-08 14:32:17 | WARNING | speech_nlp.services.llm.gemini | Rate limit approaching, adding delay
2025-12-08 14:32:18 | ERROR | speech_nlp.services.analysis.sentiment | Failed to load emotion model: ModelNotFound
Features:
- Color-coded log levels (Green=INFO, Yellow=WARNING, Red=ERROR)
- Human-readable timestamps
- Clear module names
- Easy to scan visually
Production (JSON)¶
Output:
{"timestamp": "2025-12-08 14:32:15", "level": "INFO", "logger": "speech_nlp.services.rag.service", "message": "Initializing RAG service", "module": "rag_service", "process": 12345, "thread": 67890}
{"timestamp": "2025-12-08 14:32:16", "level": "INFO", "logger": "speech_nlp.services.rag.service", "message": "Loaded 35 documents", "module": "rag_service", "process": 12345, "thread": 67890}
Features:
- Structured JSON for log aggregation tools
- Parseable by Azure Monitor, CloudWatch, Datadog, Loki
- Includes metadata (process ID, thread ID, module)
- Easy to query and filter
Advanced Usage¶
Custom Context Fields¶
Add custom fields to log entries:
import logging
logger = get_logger(__name__)
# Create a log record with extra fields
extra_fields = {
"request_id": "abc-123",
"user_id": "user_456",
"correlation_id": "xyz-789"
}
# Log with context
record = logging.LogRecord(
name=logger.name,
level=logging.INFO,
pathname="",
lineno=0,
msg="Processing user request",
args=(),
exc_info=None
)
record.extra_fields = extra_fields
logger.handle(record)
JSON Output:
{
"timestamp": "2025-11-19 14:32:15",
"level": "INFO",
"message": "Processing user request",
"request_id": "abc-123",
"user_id": "user_456",
"correlation_id": "xyz-789"
}
Exception Logging¶
Always include exception info for error logs:
try:
process_data(file_path)
except Exception as e:
logger.error(f"Failed to process {file_path}: {e}", exc_info=True)
raise
Benefit: Full stack traces are captured in logs for debugging.
Suppressing Noisy Libraries¶
The logging configuration automatically suppresses verbose output from:
chromadb- Telemetry messagessentence_transformers- Model loading detailstransformers- Tokenizer warningshttpx- HTTP request details
To suppress additional libraries:
import logging
# In your configure_logging() setup
logging.getLogger("some_noisy_library").setLevel(logging.ERROR)
Best Practices¶
✅ Do's¶
- Use appropriate log levels:
logger.info("Processing 1,234 records") # Normal operation
logger.warning("Using default value for X") # Potential issue
logger.error("Failed to connect to database") # Actual error
- Include relevant context:
logger.info(f"Processed {count:,} records in {duration:.2f}s")
logger.error(f"File not found: {file_path}")
- Use f-strings for formatting:
- Log at module boundaries:
def process_fusion(year, month):
logger.info(f"Starting fusion for {year}-{month:02d}")
# ... processing ...
logger.info(f"Fusion complete: {total_records:,} records")
❌ Don'ts¶
- Don't over-log in tight loops:
# BAD - logs 10,000 times
for i in range(10000):
logger.debug(f"Processing item {i}")
# GOOD - logs once or periodically
logger.info(f"Processing {len(items):,} items")
for i, item in enumerate(items):
if i % 1000 == 0:
logger.debug(f"Progress: {i:,}/{len(items):,}")
- Don't log sensitive data:
- Don't use print() statements:
- Don't log before configuring:
# BAD - logger not configured yet
logger = get_logger(__name__)
logger.info("Starting...")
configure_logging()
# GOOD - configure first
configure_logging()
logger = get_logger(__name__)
logger.info("Starting...")
Migration from Old Logger¶
Changes Required¶
Old Code (logger.py):
from speech_nlp.utils.logger import setup_logger, get_logger
# Setup with file handler
logger = setup_logger(
name="speech_nlp.module",
level=logging.INFO,
log_file="logs/module.log",
console=True
)
New Code (logging.py):
from speech_nlp.config.logging import configure_logging, get_logger
# Configure once at app startup
configure_logging(level="INFO", use_json=False)
# Get logger in each module
logger = get_logger(__name__)
Key Differences¶
| Feature | Old (logger.py) |
New (logging.py) |
|---|---|---|
| Configuration | Per-module setup | Global configuration |
| Format | Console only | JSON or colorized |
| Filtering | Manual | Automatic for known libraries |
| Production Ready | No | Yes (JSON logging) |
| File Logging | Per-module files | Centralized (via handlers) |
Deployment Considerations¶
Local Development¶
Docker/Kubernetes¶
# Use JSON logging for container logs
import os
use_json = os.getenv("LOG_FORMAT", "json") == "json"
configure_logging(level="INFO", use_json=use_json)
Azure/AWS¶
CI/CD Pipeline¶
Troubleshooting¶
Logs Not Appearing¶
Problem: No log output is shown.
Solution:
# Ensure configure_logging() is called before any get_logger()
configure_logging(level="INFO", use_json=False)
logger = get_logger(__name__)
logger.info("Test message") # Should now appear
Too Much Log Output¶
Problem: Logs are overwhelming with debug messages.
Solution:
JSON Logs Not Parsing¶
Problem: JSON logs are malformed in log aggregation tool.
Solution:
Duplicate Log Messages¶
Problem: Same log message appears multiple times.
Solution:
# Don't call configure_logging() multiple times
# Call it once in main() or __main__ block
if __name__ == "__main__":
configure_logging(level="INFO", use_json=False)
main() # All modules will use this configuration
Related Documentation¶
- Testing Guide - Testing with proper logging
- Configuration Reference - Full configuration options
Last Updated: November 19, 2025
Logging Module: src/speech_nlp/config/logging.py