# Titanic Survival Analysis & Prediction - ML Project

## Overview
This portfolio project demonstrates professional machine learning engineering practices through the classic Kaggle Titanic competition. It showcases how to build production-grade ML systems with clean code, comprehensive documentation, and deployment-ready architecture:
- 🔬 Research & Development: Comprehensive Jupyter notebook with EDA, feature engineering, and model optimization
- 🏗️ Production Architecture: Modular, scalable ML pipeline following software engineering best practices
- 🌐 Web Deployment: Flask API with Docker containerization and cloud deployment
- 📊 Model Explainability: SHAP analysis for interpretable predictions
- ✅ Code Quality: Type hints, testing, CI/CD, and comprehensive logging
## Quick Links

- Quick Start: Get the project running locally in under 5 minutes
- Deployment: Deploy to Docker, Render, or Azure with CI/CD pipelines
- API Reference: Complete documentation of all modules and functions
- Architecture: Understand the system design and ML pipeline
## Project Highlights

### Machine Learning Excellence
- 8+ Model Comparison: Logistic Regression, Random Forest, XGBoost, CatBoost, SVM, and more
- Hyperparameter Optimization: GridSearchCV with stratified K-fold cross-validation
- Ensemble Methods: Voting classifiers for robust predictions (both steps are sketched after this list)
- Advanced Feature Engineering: Domain-driven features (cabin analysis, title extraction, fare normalization)
- Model Explainability: SHAP waterfall plots and feature importance analysis
Results: 85.3% accuracy with XGBoost (top 10% on Kaggle leaderboard)
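To make the tuning and ensembling steps above concrete, here is a minimal scikit-learn sketch; the synthetic data and parameter grid are illustrative placeholders, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Synthetic stand-in for the engineered Titanic features.
X_train, y_train = make_classification(n_samples=500, n_features=8, random_state=42)

# Stratified folds keep the survived/perished class ratio stable across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Illustrative search space; the project's actual grids live in the notebook.
param_grid = {"n_estimators": [100, 200], "max_depth": [3, 5]}
search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Soft voting averages predicted probabilities across heterogeneous models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("xgb", search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(f"Best XGBoost params: {search.best_params_}")
```

Soft voting averages the models' predicted probabilities, which tends to be more robust than relying on any single tuned model.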
### Software Engineering
- Modular Design: Clean separation of data, features, models, and API layers
- Type Safety: Comprehensive type hints throughout codebase
- Testing: Unit and integration tests with pytest
- Logging: Azure-ready structured logging with JSON output (see the sketch after this list)
- CI/CD: GitHub Actions for automated testing and deployment
- Containerization: Optimized Docker images with multi-stage builds
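As one concrete example of the practices above, here is a minimal structured-logging sketch using only the standard library; the field names and logger name are illustrative, not the project's actual logging module.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line for log aggregators."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("titanic")  # illustrative logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Model loaded")  # -> {"timestamp": "...", "level": "INFO", ...}
```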
## Technology Stack

### Core ML Libraries
- scikit-learn - ML algorithms and preprocessing
- XGBoost - Gradient boosting
- CatBoost - Categorical boosting
- pandas - Data manipulation
- numpy - Numerical computing
### Explainability & Visualization
- SHAP - Model explainability (sketched after this list)
- matplotlib - Plotting
- seaborn - Statistical visualizations
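To show what the SHAP workflow looks like end to end, here is a minimal waterfall-plot sketch; the synthetic features and model stand in for the project's actual pipeline.

```python
import pandas as pd
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic stand-in for the engineered Titanic features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
model = XGBClassifier(eval_metric="logloss").fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
explanation = explainer(X)

# Waterfall plot: how each feature pushes one prediction from the base value.
shap.plots.waterfall(explanation[0])
```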
### Web & Deployment
- Flask - Web framework (a minimal endpoint sketch follows this list)
- Docker - Containerization
- Render/Azure - Cloud hosting
- GitHub Actions - CI/CD
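To illustrate how the Flask layer can serve predictions, here is a minimal endpoint sketch; the route, payload schema, and `model.pkl` path are assumptions for illustration, and the pickled artifact is assumed to be a full preprocessing-plus-model pipeline, not the project's actual API.

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical artifact path; assumed to bundle preprocessing and the model.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object of passenger features, e.g. {"Pclass": 3, "Sex": "male", ...}.
    features = pd.DataFrame([request.get_json()])
    proba = float(model.predict_proba(features)[0, 1])
    return jsonify({"survived": proba >= 0.5, "probability": proba})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```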
## Getting Started
Choose your path: follow the Quick Start guide to get running locally in under 5 minutes, or dive straight into the analysis in the included Jupyter notebook:
```bash
# Install with notebook dependencies
uv sync --group notebooks

# Launch Jupyter
jupyter lab notebooks/Titanic-Machine-Learning-from-Disaster.ipynb
```
The notebook demonstrates advanced feature engineering, model training, and SHAP explanations.
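As a taste of the feature engineering covered there, here is a minimal sketch of title extraction from the passenger `Name` column; the rare-title grouping is illustrative and may differ from the notebook's exact mapping.

```python
import pandas as pd

# Three example rows from the public Kaggle Titanic training data.
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]})

# Titles sit between the comma and the period: "Braund, Mr. Owen Harris" -> "Mr".
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# Collapse anything outside the common titles into one bucket (illustrative grouping).
common = {"Mr", "Mrs", "Miss", "Master"}
df["Title"] = df["Title"].where(df["Title"].isin(common), "Rare")
print(df[["Name", "Title"]])
```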
## Documentation Contents

### Guides
- Quick Start - Get up and running
- Installation - Detailed setup instructions
- Testing - Run tests and code quality checks
- Deployment - Deploy to production
- Architecture - System design overview
- Configuration - Settings and options
- Advanced Features - Deep dive into feature engineering
- API Reference - Complete reference for all modules and functions
- Explainability - Model interpretation with SHAP
## Performance
| Metric | Value | Notes |
|---|---|---|
| Accuracy | 85.3% | XGBoost (tuned) |
| F1 Score | 84.3% | Weighted average |
| API Latency | <50ms | Average prediction time |
| Docker Image | ~1.2GB | Optimized multi-stage build |
## Live Demo
Try the deployed application:
🔗 Titanic Survival Predictor on Render (free tier may take ~30s to wake from sleep)
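To call the deployed API programmatically, here is a minimal client sketch; the base URL and payload fields are assumptions (mirroring the standard Kaggle feature schema), so adjust them to the actual deployment.

```python
import requests

# Hypothetical deployment URL; replace with the actual Render address.
BASE_URL = "https://titanic-predictor.onrender.com"

passenger = {  # assumed schema mirroring the Kaggle columns
    "Pclass": 3,
    "Sex": "male",
    "Age": 22,
    "SibSp": 1,
    "Parch": 0,
    "Fare": 7.25,
    "Embarked": "S",
}

# Generous timeout: the free tier may take ~30s to wake from sleep.
response = requests.post(f"{BASE_URL}/predict", json=passenger, timeout=60)
print(response.json())
```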
## Contributing

Contributions are welcome! Feel free to:

- Open issues for bugs or feature requests
- Submit pull requests
- Improve documentation
- Share feedback
## License
This project is licensed under the MIT License.
## Contact
Kristiyan Bonev
- GitHub: @JustaKris
- Email: k.s.bonev@gmail.com