Logging Configuration¶

Centralized logging setup for development and production environments.

Overview¶

The project uses a centralized logging configuration (src/speech_nlp/config/logging.py) that provides:

JSON logging for production/cloud environments (Azure, Docker)
Colorized console logging for local development
Structured log filtering to suppress noisy third-party libraries
Consistent formatting across all modules

Quick Start¶

Basic Usage¶

import logging

# Get a logger for your module
logger = logging.getLogger(__name__)

# Log messages
logger.info("Starting RAG query processing")
logger.warning("Low confidence score detected")
logger.error("Failed to load embeddings", exc_info=True)

Application Configuration¶

The FastAPI app configures logging automatically from its settings. Standalone scripts need to trigger the same setup explicitly:

# In your own scripts
from speech_nlp.config.settings import get_settings

settings = get_settings()
settings.setup_logging()  # Configures based on environment

Configuration Options¶

Log Levels¶

Level	When to Use
`DEBUG`	Detailed diagnostic information (variable values, flow control)
`INFO`	General operational messages (process started, completed, counts)
`WARNING`	Potentially problematic situations (missing optional files, degraded performance)
`ERROR`	Error events that might still allow the application to continue
`CRITICAL`	Severe errors causing premature termination

Output Formats¶

Development (Colorized)¶

configure_logging(level="INFO", use_json=False)

Output:

2025-12-08 14:32:15 | INFO     | speech_nlp.services.rag.service | Initializing RAG service with hybrid search
2025-12-08 14:32:16 | INFO     | speech_nlp.services.rag.service | Loaded 35 documents from corpus
2025-12-08 14:32:17 | WARNING  | speech_nlp.services.llm.gemini  | Rate limit approaching, adding delay
2025-12-08 14:32:18 | ERROR    | speech_nlp.services.analysis.sentiment | Failed to load emotion model: ModelNotFound

Features:

Color-coded log levels (Green=INFO, Yellow=WARNING, Red=ERROR)
Human-readable timestamps
Clear module names
Easy to scan visually
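A formatter along the following lines could produce the colorized output shown above. This is a minimal sketch, not the project's actual implementation: the class name `ColorFormatter` and the specific ANSI escape codes are assumptions, and only the column layout and the level-to-color mapping are taken from the sample output and feature list.

```python
import logging

# ANSI escape codes; the level-to-color mapping follows the feature list above.
RESET = "\033[0m"
LEVEL_COLORS = {
    logging.INFO: "\033[32m",     # green
    logging.WARNING: "\033[33m",  # yellow
    logging.ERROR: "\033[31m",    # red
}


class ColorFormatter(logging.Formatter):
    """Hypothetical formatter: timestamp | LEVEL | logger name | message."""

    def __init__(self) -> None:
        super().__init__(
            fmt="%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )

    def format(self, record: logging.LogRecord) -> str:
        line = super().format(record)
        color = LEVEL_COLORS.get(record.levelno)
        # Levels without a mapped color are emitted unchanged.
        return f"{color}{line}{RESET}" if color else line


handler = logging.StreamHandler()
handler.setFormatter(ColorFormatter())
demo_logger = logging.getLogger("color_demo")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo isolated from other handlers
demo_logger.warning("Rate limit approaching, adding delay")
```

Wrapping the whole line in one color keeps the formatter simple; coloring only the level column would need the level name to be substituted before the base formatter runs.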

Production (JSON)¶

configure_logging(level="INFO", use_json=True)

Output:

{"timestamp": "2025-12-08 14:32:15", "level": "INFO", "logger": "speech_nlp.services.rag.service", "message": "Initializing RAG service", "module": "rag_service", "process": 12345, "thread": 67890}
{"timestamp": "2025-12-08 14:32:16", "level": "INFO", "logger": "speech_nlp.services.rag.service", "message": "Loaded 35 documents", "module": "rag_service", "process": 12345, "thread": 67890}

Features:

Structured JSON for log aggregation tools
Parseable by Azure Monitor, CloudWatch, Datadog, Loki
Includes metadata (process ID, thread ID, module)
Easy to query and filter
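To make the field list concrete, here is a minimal sketch of a JSON formatter that emits the same keys as the sample output. The class name `JsonFormatter` and the extra `exception` key are assumptions for illustration; the project's own formatter may differ.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter emitting one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "process": record.process,
            "thread": record.thread,
        }
        if record.exc_info:
            # Assumed key: keeps the stack trace inside the single-line JSON entry.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


record = logging.LogRecord(
    name="speech_nlp.services.rag.service",
    level=logging.INFO,
    pathname="service.py",
    lineno=1,
    msg="Loaded %d documents",
    args=(35,),
    exc_info=None,
)
print(JsonFormatter().format(record))
```

Emitting one object per line is what lets aggregation tools parse each entry independently.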

Advanced Usage¶

Custom Context Fields¶

Add custom fields to log entries:

import logging

from speech_nlp.config.logging import get_logger

logger = get_logger(__name__)

# Create a log record with extra fields
extra_fields = {
    "request_id": "abc-123",
    "user_id": "user_456",
    "correlation_id": "xyz-789"
}

# Log with context
record = logging.LogRecord(
    name=logger.name,
    level=logging.INFO,
    pathname="",
    lineno=0,
    msg="Processing user request",
    args=(),
    exc_info=None
)
record.extra_fields = extra_fields
logger.handle(record)

JSON Output:

{
    "timestamp": "2025-11-19 14:32:15",
    "level": "INFO",
    "message": "Processing user request",
    "request_id": "abc-123",
    "user_id": "user_456",
    "correlation_id": "xyz-789"
}
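The standard library offers a shorter route to the same record attribute: keys passed through the `extra=` argument become attributes on the `LogRecord`, so `extra={"extra_fields": {...}}` sets `record.extra_fields` without building the record by hand. The sketch below uses a stand-in formatter (`ContextJsonFormatter`, hypothetical) because the project's formatter is not shown here; that the real formatter reads `extra_fields` in the same way is an assumption based on the example above.

```python
import io
import json
import logging


class ContextJsonFormatter(logging.Formatter):
    """Stand-in formatter (hypothetical) that merges record.extra_fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # extra_fields is optional; records without it are formatted normally.
        payload.update(getattr(record, "extra_fields", {}))
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(ContextJsonFormatter())

context_logger = logging.getLogger("context_demo")
context_logger.addHandler(handler)
context_logger.setLevel(logging.INFO)
context_logger.propagate = False

# Keys in `extra` become attributes on the LogRecord, so this sets record.extra_fields.
context_logger.info(
    "Processing user request",
    extra={"extra_fields": {"request_id": "abc-123", "user_id": "user_456"}},
)
print(stream.getvalue().strip())
```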

Exception Logging¶

Always include exception info for error logs:

try:
    process_data(file_path)
except Exception as e:
    logger.error(f"Failed to process {file_path}: {e}", exc_info=True)
    raise

Benefit: Full stack traces are captured in logs for debugging.
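Inside an `except` block, `logger.exception(...)` is a standard-library shorthand for `logger.error(..., exc_info=True)`. The short sketch below captures the output in memory to show that the traceback is included; `parse_count` and the logger name are hypothetical.

```python
import io
import logging

stream = io.StringIO()
exc_logger = logging.getLogger("exception_demo")
exc_logger.addHandler(logging.StreamHandler(stream))
exc_logger.setLevel(logging.INFO)
exc_logger.propagate = False


def parse_count(raw: str) -> int:
    """Hypothetical helper that fails on non-numeric input."""
    return int(raw)


try:
    parse_count("not-a-number")
except ValueError:
    # Logs at ERROR level and appends the current traceback.
    exc_logger.exception("Failed to parse count")

print(stream.getvalue())
```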

Suppressing Noisy Libraries¶

The logging configuration automatically suppresses verbose output from:

chromadb - Telemetry messages
sentence_transformers - Model loading details
transformers - Tokenizer warnings
httpx - HTTP request details

To suppress additional libraries:

import logging

# In your configure_logging() setup
logging.getLogger("some_noisy_library").setLevel(logging.ERROR)
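Setting the level on a library's top-level logger also covers its child loggers, because children inherit the effective level of the nearest ancestor with an explicit level. The sketch below demonstrates this with an in-memory handler; `ListHandler` and the child logger name are made up for the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.name}:{record.levelname}:{record.getMessage()}")


collector = ListHandler()
root = logging.getLogger()
previous_level = root.level
root.addHandler(collector)
root.setLevel(logging.DEBUG)

# Raise the threshold on the parent; child loggers inherit it.
logging.getLogger("some_noisy_library").setLevel(logging.ERROR)

child = logging.getLogger("some_noisy_library.client")
child.info("HTTP request sent")    # dropped: below ERROR
child.error("Connection refused")  # kept

# Restore the root logger so the demo has no lasting side effects.
root.removeHandler(collector)
root.setLevel(previous_level)
print(collector.messages)
```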

Best Practices¶

✅ Do's¶

Use appropriate log levels:

logger.info("Processing 1,234 records")        # Normal operation
logger.warning("Using default value for X")     # Potential issue
logger.error("Failed to connect to database")   # Actual error

Include relevant context:

logger.info(f"Processed {count:,} records in {duration:.2f}s")
logger.error(f"File not found: {file_path}")

Use f-strings for formatting:

logger.info(f"User {user_id} completed action {action}")

Log at module boundaries:

def process_fusion(year, month):
    logger.info(f"Starting fusion for {year}-{month:02d}")
    total_records = 0
    # ... processing that updates total_records ...
    logger.info(f"Fusion complete: {total_records:,} records")

❌ Don'ts¶

Don't over-log in tight loops:

# BAD - logs 10,000 times
for i in range(10000):
    logger.debug(f"Processing item {i}")

# GOOD - logs once or periodically
logger.info(f"Processing {len(items):,} items")
for i, item in enumerate(items):
    if i % 1000 == 0:
        logger.debug(f"Progress: {i:,}/{len(items):,}")

Don't log sensitive data:

# BAD
logger.info(f"Password: {password}")

# GOOD
logger.info("Authentication successful")
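As a safety net for cases where a secret slips into a message anyway, a `logging.Filter` can rewrite records before they reach a handler. This is a sketch under assumptions: the `RedactSecretsFilter` name and the `password=` / `token=` pattern are hypothetical, and a filter like this complements the rule above rather than replacing it.

```python
import logging
import re

# Hypothetical pattern: masks values that follow "password=" or "token=".
SECRET_PATTERN = re.compile(r"(password|token)=\S+", re.IGNORECASE)


class RedactSecretsFilter(logging.Filter):
    """Rewrites the record message so matching secrets are masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(r"\1=***", message)
        record.args = ()  # the message is already fully formatted
        return True  # never drop the record, only rewrite it


record = logging.LogRecord(
    "auth", logging.INFO, "", 0,
    "login user=%s password=%s", ("ana", "hunter2"), None,
)
RedactSecretsFilter().filter(record)
print(record.getMessage())
```

Attaching the filter to a handler with `handler.addFilter(RedactSecretsFilter())` applies it to every record that handler emits.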

Don't use print() statements:

# BAD
print("Processing started")

# GOOD
logger.info("Processing started")

Don't log before configuring:

# BAD - logger not configured yet
logger = get_logger(__name__)
logger.info("Starting...")
configure_logging()

# GOOD - configure first
configure_logging()
logger = get_logger(__name__)
logger.info("Starting...")

Migration from Old Logger¶

Changes Required¶

Old Code (logger.py):

from speech_nlp.utils.logger import setup_logger, get_logger

# Setup with file handler
logger = setup_logger(
    name="speech_nlp.module",
    level=logging.INFO,
    log_file="logs/module.log",
    console=True
)

New Code (logging.py):

from speech_nlp.config.logging import configure_logging, get_logger

# Configure once at app startup
configure_logging(level="INFO", use_json=False)

# Get logger in each module
logger = get_logger(__name__)

Key Differences¶

Feature	Old (`logger.py`)	New (`logging.py`)
Configuration	Per-module setup	Global configuration
Format	Console only	JSON or colorized
Filtering	Manual	Automatic for known libraries
Production Ready	No	Yes (JSON logging)
File Logging	Per-module files	Centralized (via handlers)
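The "Centralized (via handlers)" entry can be illustrated with a single file handler attached to the root logger, so that every module writes to one file instead of its own. This is a sketch only: `add_file_handler`, the rotation limits, and the temporary path are assumptions, and the project's configuration may attach file handlers differently.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def add_file_handler(log_path: Path, level: int = logging.INFO) -> RotatingFileHandler:
    """Hypothetical helper: attach one shared, size-rotated file handler to the root logger."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
    )
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s | %(message)s")
    )
    logging.getLogger().addHandler(handler)
    return handler


log_file = Path(tempfile.mkdtemp()) / "logs" / "app.log"
file_handler = add_file_handler(log_file)

module_logger = logging.getLogger("speech_nlp.module")
module_logger.setLevel(logging.INFO)
module_logger.info("Written through the shared handler")

# Detach and close so the demo leaves the root logger as it found it.
logging.getLogger().removeHandler(file_handler)
file_handler.close()
print(log_file.read_text(encoding="utf-8"))
```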

Deployment Considerations¶

Local Development¶

# Use colorized logging
configure_logging(level="DEBUG", use_json=False)

Docker/Kubernetes¶

# Use JSON logging for container logs
import os
use_json = os.getenv("LOG_FORMAT", "json") == "json"
configure_logging(level="INFO", use_json=use_json)
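The same idea extends to the log level. The helper below is a sketch: `logging_options_from_env` and the `LOG_LEVEL` variable are hypothetical, while `LOG_FORMAT` and its `json` default follow the snippet above. Unlike that snippet, it compares the format case-insensitively.

```python
import os
from typing import Mapping, Tuple

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def logging_options_from_env(env: Mapping[str, str] = os.environ) -> Tuple[str, bool]:
    """Return (level, use_json) derived from LOG_LEVEL and LOG_FORMAT.

    LOG_LEVEL is a hypothetical variable; LOG_FORMAT follows the snippet above.
    """
    level = env.get("LOG_LEVEL", "INFO").upper()
    if level not in VALID_LEVELS:
        level = "INFO"  # fall back rather than fail at startup
    use_json = env.get("LOG_FORMAT", "json").lower() == "json"
    return level, use_json


print(logging_options_from_env({"LOG_LEVEL": "debug", "LOG_FORMAT": "text"}))
```

The returned pair can be passed straight to `configure_logging(level=..., use_json=...)`.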

Azure/AWS¶

# Enable JSON logging for cloud log aggregation
configure_logging(level="INFO", use_json=True)

CI/CD Pipeline¶

# Use plain text for readable build logs
configure_logging(level="INFO", use_json=False)

Troubleshooting¶

Logs Not Appearing¶

Problem: No log output is shown.

Solution:

# Ensure configure_logging() is called before any get_logger()
configure_logging(level="INFO", use_json=False)
logger = get_logger(__name__)
logger.info("Test message")  # Should now appear
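If messages are still missing, it can help to inspect what the logger will actually do. The helper below is a diagnostic sketch using only the standard library; `describe_logger` is a hypothetical name. It reports the effective level and every handler reachable through propagation.

```python
import logging


def describe_logger(name: str) -> dict:
    """Report a logger's effective level and the handlers its records can reach."""
    logger = logging.getLogger(name)
    handlers = []
    current = logger
    while current is not None:
        handlers.extend(type(h).__name__ for h in current.handlers)
        if not current.propagate:
            break  # records stop here, so ancestors are irrelevant
        current = current.parent
    return {
        "effective_level": logging.getLevelName(logger.getEffectiveLevel()),
        "handlers": handlers,
    }


quiet = logging.getLogger("diagnose_demo")
quiet.propagate = False  # isolate the demo from any root configuration
quiet.setLevel(logging.WARNING)
print(describe_logger("diagnose_demo"))
```

An effective level of `WARNING` with no handlers matches the standard library's unconfigured default, in which `info()` calls produce no output.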

Too Much Log Output¶

Problem: Log output is flooded with debug messages.

Solution:

# Use INFO level instead of DEBUG
configure_logging(level="INFO", use_json=False)

JSON Logs Not Parsing¶

Problem: JSON logs appear malformed in the log aggregation tool.

Solution:

# Ensure use_json=True is set
configure_logging(level="INFO", use_json=True, include_uvicorn=False)

Duplicate Log Messages¶

Problem: Same log message appears multiple times.

Solution:

# Don't call configure_logging() multiple times
# Call it once in main() or __main__ block

if __name__ == "__main__":
    configure_logging(level="INFO", use_json=False)
    main()  # All modules will use this configuration
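Duplicates typically come from handlers being stacked on the same logger by repeated setup calls. One way to make setup safe to repeat is to replace existing handlers instead of appending. The sketch below is an assumption about how that could look, not the behavior of `configure_logging()`; `configure_once` is a hypothetical name, and it takes the target logger as a parameter so it can be tried on a throwaway logger (in an application the target would be the root logger).

```python
import logging


def configure_once(target: logging.Logger, level: str = "INFO") -> None:
    """Hypothetical idempotent setup: replaces handlers instead of stacking them."""
    for existing in list(target.handlers):
        target.removeHandler(existing)
        existing.close()
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s | %(message)s")
    )
    target.addHandler(handler)
    target.setLevel(level)


demo = logging.getLogger("dedupe_demo")
demo.propagate = False
configure_once(demo)
configure_once(demo)  # second call replaces the handler rather than adding another
demo.info("Emitted once")
print(len(demo.handlers))
```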

Related Documentation¶

Testing Guide - Testing with proper logging
Configuration Reference - Full configuration options

Last Updated: November 19, 2025
Logging Module: src/speech_nlp/config/logging.py
