
Titanic Survival Analysis & Prediction - ML Project


Overview

This portfolio project demonstrates professional machine learning engineering practices through the classic Kaggle Titanic competition. It showcases how to build production-grade ML systems with clean code, comprehensive documentation, and deployment-ready architecture:

  • 🔬 Research & Development: Comprehensive Jupyter notebook with EDA, feature engineering, and model optimization
  • 🏗️ Production Architecture: Modular, scalable ML pipeline following software engineering best practices
  • 🌐 Web Deployment: Flask API with Docker containerization and cloud deployment
  • 📊 Model Explainability: SHAP analysis for interpretable predictions
  • ✅ Code Quality: Type hints, testing, CI/CD, and comprehensive logging


Project Highlights

Machine Learning Excellence

  • 8+ Model Comparison: Logistic Regression, Random Forest, XGBoost, CatBoost, SVM, and more
  • Hyperparameter Optimization: GridSearchCV with stratified K-fold cross-validation (sketched below)
  • Ensemble Methods: Voting classifiers for robust predictions
  • Advanced Feature Engineering: Domain-driven features (cabin analysis, title extraction, fare normalization)
  • Model Explainability: SHAP waterfall plots and feature importance analysis

Results: 85.3% accuracy with XGBoost (top 10% on Kaggle leaderboard)
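
To make the tuning and ensembling steps concrete, the sketch below shows the general pattern rather than the project's exact configuration: the parameter grid, estimator choices, and the X_train / y_train variables are illustrative assumptions.

# Minimal sketch: stratified K-fold grid search + soft-voting ensemble (illustrative settings)
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# X_train / y_train are assumed to be the already-engineered feature matrix and target
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Tune XGBoost over a small illustrative parameter grid
xgb_search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid={"n_estimators": [200, 400], "max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
    scoring="accuracy",
    cv=cv,
)
xgb_search.fit(X_train, y_train)

# Combine the tuned model with simpler baselines in a soft-voting ensemble
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
        ("xgb", xgb_search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)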

Software Engineering

  • Modular Design: Clean separation of data, features, models, and API layers
  • Type Safety: Comprehensive type hints throughout codebase
  • Testing: Unit and integration tests with pytest
  • Logging: Azure-ready structured logging with JSON output (see the sketch below)
  • CI/CD: GitHub Actions for automated testing and deployment
  • Containerization: Optimized Docker images with multi-stage builds
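
As a rough illustration of the structured-logging idea (not the project's actual logging module), a JSON log formatter can be built on the Python standard library alone; the field names below are assumptions.

# Minimal sketch: emit each log record as a single JSON object
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render a log record as JSON (illustrative field set)."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("titanic")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Model loaded")  # -> {"timestamp": "...", "level": "INFO", "logger": "titanic", ...}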

Technology Stack

Core ML Libraries

  • scikit-learn - ML algorithms and preprocessing
  • XGBoost - Gradient boosting
  • CatBoost - Categorical boosting
  • pandas - Data manipulation
  • numpy - Numerical computing

Explainability & Visualization

  • SHAP - Model explainability (waterfall example sketched below)
  • matplotlib - Plotting
  • seaborn - Statistical visualizations
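
For reference, the SHAP waterfall workflow for a tree-based model typically looks like the sketch below; the model and X variables are assumed to come from the training pipeline and are not defined here.

# Minimal sketch, assuming a fitted tree-based classifier `model` and feature DataFrame `X`
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer(X)

# Waterfall plot for a single passenger's prediction
shap.plots.waterfall(shap_values[0])

# Global feature importance across the dataset
shap.plots.bar(shap_values)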

Web & Deployment

  • Flask - Web framework
  • Docker - Containerization
  • Render/Azure - Cloud hosting
  • GitHub Actions - CI/CD

Getting Started

Choose your path: run the web app locally, explore the analysis notebook, or deploy with Docker.

Get running locally in 5 minutes:

# Clone repository
git clone https://github.com/JustaKris/Titanic-Machine-Learning-from-Disaster.git
cd Titanic-Machine-Learning-from-Disaster

# Install dependencies
uv sync

# Run web app
python app.py
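
Once the app is running, predictions can be requested over HTTP. The endpoint path and field names below are assumptions for illustration; check the API documentation for the actual request schema.

# Minimal sketch: query the locally running predictor (hypothetical /predict endpoint and fields)
import requests

passenger = {
    "Pclass": 3,
    "Sex": "female",
    "Age": 27,
    "SibSp": 0,
    "Parch": 0,
    "Fare": 7.25,
    "Embarked": "S",
}
response = requests.post("http://localhost:5000/predict", json=passenger)
print(response.json())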

See the Full Setup Guide for detailed installation and configuration instructions.

Dive into the analysis in the included Jupyter notebook:

# Install with notebook dependencies
uv sync --group notebooks

# Launch Jupyter
jupyter lab notebooks/Titanic-Machine-Learning-from-Disaster.ipynb

The notebook demonstrates advanced feature engineering, model training, and SHAP explanations.
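
As a flavour of that feature engineering, title extraction from passenger names usually follows a pattern like the one below; the grouping choices are illustrative and not necessarily the notebook's exact mapping.

# Minimal sketch: extract honorific titles from the Name column (illustrative grouping)
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"]})

# Pull the honorific between the comma and the period, e.g. "Mr", "Miss", "Master"
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False).str.strip()

# Normalise variants and collapse rare titles into a single group
rare = {"Lady", "Countess", "Capt", "Col", "Don", "Dr", "Major", "Rev", "Sir", "Jonkheer", "Dona"}
df["Title"] = df["Title"].replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
df["Title"] = df["Title"].apply(lambda t: "Rare" if t in rare else t)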

Deploy with Docker:

# Build image
docker build -t titanic-survival-predictor .

# Run container
docker run -p 5000:5000 titanic-survival-predictor

See the Deployment Guide for cloud deployment instructions.


Performance

| Metric       | Value  | Notes                       |
|--------------|--------|-----------------------------|
| Accuracy     | 85.3%  | XGBoost (tuned)             |
| F1 Score     | 84.3%  | Weighted average            |
| API Latency  | <50ms  | Average prediction time     |
| Docker Image | ~1.2GB | Optimized multi-stage build |

Live Demo

Try the deployed application:

🔗 Titanic Survival Predictor on Render (free tier may take ~30s to wake from sleep)


Contributing

Contributions are welcome! Feel free to:

  • Open issues for bugs or feature requests
  • Submit pull requests
  • Improve documentation
  • Share feedback


License

This project is licensed under the MIT License.


Contact

Kristiyan Bonev


⭐ If this project helps you, please star it on GitHub! ⭐