
Titanic Survival Analysis & Prediction - ML Project


Overview

This portfolio project demonstrates professional machine learning engineering practices through the classic Kaggle Titanic competition. It showcases how to build production-grade ML systems with clean code, comprehensive documentation, and deployment-ready architecture:

  • 🔬 Research & Development: Comprehensive Jupyter notebook with EDA, feature engineering, and model optimization
  • 🏗️ Production Architecture: Modular, scalable ML pipeline following software engineering best practices
  • 🌐 Web Deployment: Flask API with Docker containerization and cloud deployment
  • 📊 Model Explainability: SHAP analysis for interpretable predictions
  • ✅ Code Quality: Type hints, testing, CI/CD, and comprehensive logging


Project Highlights

Machine Learning Excellence

  • 8+ Model Comparison: Logistic Regression, Random Forest, XGBoost, CatBoost, SVM, and more
  • Hyperparameter Optimization: GridSearchCV with stratified K-fold cross-validation (sketched below)
  • Ensemble Methods: Voting classifiers for robust predictions
  • Advanced Feature Engineering: Domain-driven features (cabin analysis, title extraction, fare normalization)
  • Model Explainability: SHAP waterfall plots and feature importance analysis

Results: 85.3% accuracy with XGBoost (top 10% on Kaggle leaderboard)
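
To make the tuning and ensembling steps concrete, the sketch below shows the general pattern rather than the project's exact configuration: the parameter grid, estimator choices, and the X_train / y_train variables are illustrative assumptions.

# Minimal sketch: stratified K-fold grid search + soft-voting ensemble (illustrative settings)
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# X_train / y_train are assumed to be the already-engineered feature matrix and target
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Tune XGBoost over a small illustrative parameter grid
xgb_search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid={"n_estimators": [200, 400], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
    scoring="accuracy",
    cv=cv,
)
xgb_search.fit(X_train, y_train)

# Combine the tuned model with simpler baselines in a soft-voting ensemble
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("xgb", xgb_search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)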

Software Engineering

  • Modular Design: Clean separation of data, features, models, and API layers
  • Type Safety: Comprehensive type hints throughout codebase
  • Testing: Unit and integration tests with pytest
  • Logging: Azure-ready structured logging with JSON output (see the sketch below)
  • CI/CD: GitHub Actions for automated testing and deployment
  • Containerization: Optimized Docker images with multi-stage builds
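
As a rough illustration of the structured-logging idea (not the project's actual logging module), a JSON log formatter can be built on the Python standard library alone; the field names below are assumptions.

# Minimal sketch: emit each log record as a single JSON object
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render a log record as JSON (illustrative field set)."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("titanic")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Model loaded")  # -> {"timestamp": "...", "level": "INFO", "logger": "titanic", ...}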

Technology Stack

Core ML Libraries

  • scikit-learn - ML algorithms and preprocessing
  • XGBoost - Gradient boosting
  • CatBoost - Categorical boosting
  • pandas - Data manipulation
  • numpy - Numerical computing

Explainability & Visualization

  • SHAP - Model explainability (waterfall example sketched below)
  • matplotlib - Plotting
  • seaborn - Statistical visualizations
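
For reference, the SHAP waterfall workflow for a tree-based model typically looks like the sketch below; the model and X variables are assumed to come from the training pipeline and are not defined here.

# Minimal sketch, assuming a fitted tree-based classifier `model` and feature DataFrame `X`
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer(X)

# Waterfall plot for a single passenger's prediction
shap.plots.waterfall(shap_values[0])

# Global feature importance across the dataset
shap.plots.bar(shap_values)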

Web & Deployment

  • Flask - Web framework
  • Docker - Containerization
  • Render/Azure - Cloud hosting
  • GitHub Actions - CI/CD

Getting Started

Choose your path: run the web app locally, explore the analysis notebook, or deploy with Docker.

Get running locally in 5 minutes:

# Clone repository
git clone https://github.com/JustaKris/Titanic-Machine-Learning-from-Disaster.git
cd Titanic-Machine-Learning-from-Disaster

# Install dependencies
uv sync

# Run web app
python app.py
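
Once the app is running, predictions can be requested over HTTP. The endpoint path and field names below are assumptions for illustration; check the API documentation for the actual request schema.

# Minimal sketch: query the locally running predictor (hypothetical /predict endpoint and fields)
import requests

passenger = {
    "Pclass": 3,
    "Sex": "female",
    "Age": 27,
    "SibSp": 0,
    "Parch": 0,
    "Fare": 7.25,
    "Embarked": "S",
}
response = requests.post("http://localhost:5000/predict", json=passenger)
print(response.json())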

See the Full Setup Guide for detailed installation and configuration instructions.

Dive into the analysis in the included Jupyter notebook:

# Install with notebook dependencies
uv sync --group notebooks

# Launch Jupyter
jupyter lab notebooks/Titanic-Machine-Learning-from-Disaster.ipynb

The notebook demonstrates advanced feature engineering, model training, and SHAP explanations.
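
As a flavour of that feature engineering, title extraction from passenger names usually follows a pattern like the one below; the grouping choices are illustrative and not necessarily the notebook's exact mapping.

# Minimal sketch: extract honorific titles from the Name column (illustrative grouping)
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]})

# Pull the honorific between the comma and the period, e.g. "Mr", "Miss", "Master"
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False).str.strip()

# Normalise variants and collapse rare titles into a single group
rare = {"Lady", "Countess", "Capt", "Col", "Don", "Dr", "Major", "Rev", "Sir", "Jonkheer", "Dona"}
df["Title"] = df["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
df["Title"] = df["Title"].apply(lambda t: "Rare" if t in rare else t)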

Deploy with Docker:

# Build image
docker build -t titanic-survival-predictor .

# Run container
docker run -p 5000:5000 titanic-survival-predictor

See the Deployment Guide for cloud deployment instructions.


Performance

| Metric       | Value  | Notes                       |
|--------------|--------|-----------------------------|
| Accuracy     | 85.3%  | XGBoost (tuned)             |
| F1 Score     | 84.3%  | Weighted average            |
| API Latency  | <50ms  | Average prediction time     |
| Docker Image | ~1.2GB | Optimized multi-stage build |

Live Demo

Try the deployed application:

🔗 Titanic Survival Predictor on Render (free tier may take ~30s to wake from sleep)


Contributing

Contributions are welcome! Feel free to:

  • Open issues for bugs or feature requests
  • Submit pull requests
  • Improve documentation
  • Share feedback


License

This project is licensed under the MIT License.


Contact

Kristiyan Bonev


⭐ If this project helps you, please star it on GitHub! ⭐