
Quick Start Guide

Get the Titanic Survival Prediction system running locally in under 5 minutes.

Prerequisites

The only tool you need up front is uv, the project's package manager. If it isn't installed yet, see Common Issues at the bottom of this page for installation commands.

Why uv?

uv is a blazing-fast Python package manager that replaces pip. It's 10-100x faster and resolves dependencies more reliably.
Installation

1. Clone the Repository

git clone https://github.com/JustaKris/Titanic-Machine-Learning-from-Disaster.git
cd Titanic-Machine-Learning-from-Disaster

2. Install Dependencies

# Install all production dependencies
uv sync

# Or install with optional groups
uv sync --group dev        # Include development tools
uv sync --group notebooks  # Include Jupyter support
uv sync --all-groups       # Include everything

First-time setup

First run will download ~500MB of ML models and dependencies. Subsequent runs are instant.

3. Verify Installation

# Check Python environment
uv run python --version

# Verify key packages
uv run python -c "import pandas, sklearn, xgboost, catboost; print('✓ All packages imported successfully')"

Running the Application

uv run python app.py

Then open your browser to http://localhost:5000

You'll see a form to enter passenger details:

- Age
- Gender
- Passenger Class (1st, 2nd, 3rd)
- Number of siblings/spouses
- Number of parents/children
- Port of embarkation
- And more...

Click "Predict Survival" to get instant predictions with confidence scores!
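
The same form can also be driven programmatically. Below is a sketch of the payload a browser would submit; the field names are assumptions based on the list above, so check templates/ and app.py for the real input names and endpoint before relying on them.

```python
# Build the form payload a browser would submit. The field names mirror the
# form fields listed above; they are assumptions, not verified against the
# actual templates.
payload = {
    "age": 25,
    "sex": "female",
    "pclass": "1",
    "sibsp": 1,
    "parch": 0,
    "embarked": "C",
}

# To submit it you could use the requests library, e.g.:
# import requests
# response = requests.post("http://localhost:5000/predict", data=payload)

print(sorted(payload))
```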

Exploring the Notebook

# Install notebook dependencies (if not already installed)
uv sync --group notebooks

# Launch JupyterLab
uv run jupyter lab notebooks/Titanic-Machine-Learning-from-Disaster.ipynb

The notebook contains:

- Complete exploratory data analysis (EDA)
- Feature engineering walkthrough
- Model comparison and tuning
- SHAP explainability analysis
- Visualizations and insights

Using the Python API

You can also make predictions directly from Python:

from titanic_ml.models.predict import PredictPipeline, CustomData

# Create input data
passenger = CustomData(
    age=25,
    sex='female',
    name_title='Miss',
    sibsp=1,
    pclass='1',
    embarked='C',
    cabin_multiple=0,
    parch=0
)

# Make prediction
pipeline = PredictPipeline()
predictions, probabilities = pipeline.predict(passenger.get_data_as_dataframe())

print(f"Survived: {bool(predictions[0])}")
print(f"Probability: {probabilities[0]:.2%}")

Training Your Own Model

Run the Full Training Pipeline

# Option 1: Using the scripts
uv run python scripts/run_training.py

# Option 2: Step by step
uv run python titanic_ml/data/loader.py             # Load and split data
uv run python titanic_ml/features/build_features.py # Engineer features
uv run python titanic_ml/data/transformer.py        # Create preprocessor
uv run python titanic_ml/models/train.py            # Train models

Trained models are saved to the models/ directory:

- model.pkl - Best performing model
- preprocessor.pkl - Feature transformer
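
If you want to work with the saved artifacts directly (outside PredictPipeline), they can be loaded with pickle. This is a minimal sketch assuming the file names above; the load_artifacts helper is illustrative, not part of the project.

```python
import pickle
import tempfile
from pathlib import Path

def load_artifacts(models_dir):
    """Load the trained model and preprocessor produced by the training
    pipeline. File names follow the docs; this helper itself is hypothetical."""
    models_dir = Path(models_dir)
    with open(models_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(models_dir / "preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    return model, preprocessor

# Demo with stand-in objects so the sketch runs anywhere; in the project
# you would call load_artifacts("models") after training.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model.pkl", "preprocessor.pkl"):
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump({"artifact": name}, f)
    model, preprocessor = load_artifacts(tmp)

print(model, preprocessor)
```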

Customize Training

Edit titanic_ml/config/settings.py to modify:

# Model training settings
CV_FOLDS = 5              # Cross-validation folds
RANDOM_STATE = 42         # Random seed for reproducibility

# Feature engineering
NUMERICAL_FEATURES = ['Age', 'SibSp', 'Parch', 'norm_fare', 'cabin_multiple']
CATEGORICAL_FEATURES = ['Pclass', 'Sex', 'Embarked', 'name_title']
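
To see what these settings control, here is a small sketch of reproducible k-fold splitting: shuffle row indices with the configured seed, then deal them into CV_FOLDS folds. The kfold_indices helper is illustrative; the project itself uses scikit-learn's cross-validation, which these constants parameterize.

```python
import random

CV_FOLDS = 5
RANDOM_STATE = 42

def kfold_indices(n_samples, n_folds=CV_FOLDS, seed=RANDOM_STATE):
    """Shuffle row indices reproducibly, then deal them into n_folds folds.
    Mirrors what KFold(shuffle=True, random_state=RANDOM_STATE) does."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # fixed seed -> same split every run
    return [idx[i::n_folds] for i in range(n_folds)]

folds = kfold_indices(10)
print(folds)
```

Because the seed is fixed, calling kfold_indices twice yields the identical split, which is exactly why RANDOM_STATE matters for reproducibility.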

Project Structure

Titanic-Machine-Learning-from-Disaster/
├── notebooks/                      # Jupyter notebooks
│   ├── Titanic-Machine-Learning-from-Disaster.ipynb
│   └── utils/                      # Notebook utilities
├── titanic_ml/                     # Source code
│   ├── config/                     # Configuration
│   ├── data/                       # Data loading
│   ├── features/                   # Feature engineering
│   ├── models/                     # Training & prediction
│   ├── app/                        # Flask web app
│   └── utils/                      # Utilities
├── tests/                          # Test suite
├── models/                         # Trained models
├── artifacts/                      # Processed data
├── static/                         # Web assets
├── templates/                      # HTML templates
└── docs/                           # Documentation

Testing

For comprehensive testing information, see the Testing Guide, which covers:

- Running unit and integration tests
- Code quality checks (Black, isort, flake8, mypy)
- Coverage reporting
- CI/CD validation

Quick test command:

uv run pytest --cov=titanic_ml --cov-report=html


Using with Docker

Build the Image

docker build -t titanic-survival-predictor .

Run the Container

# Basic run
docker run -p 5000:5000 titanic-survival-predictor

# Run with volume for models
docker run -p 5000:5000 -v $(pwd)/models:/app/models titanic-survival-predictor

# Run in background
docker run -d -p 5000:5000 --name titanic-predictor titanic-survival-predictor

Stop and Remove

docker stop titanic-predictor
docker rm titanic-predictor

See the Deployment Guide for production deployment.


Common Issues

uv not found

Install uv first:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

ModuleNotFoundError

Ensure you're in the project directory and have installed dependencies:

cd Titanic-Machine-Learning-from-Disaster
uv sync

Port 5000 Already in Use

Change the port in app.py:

# Change this line
app.run(host='0.0.0.0', port=5000, debug=True)

# To this
app.run(host='0.0.0.0', port=8000, debug=True)
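
Alternatively, the port can be read from an environment variable so the file never needs editing. This is a sketch; the PORT variable is an assumption, not something app.py currently supports.

```python
import os

# Read the port from the environment, falling back to 5000.
# Usage: PORT=8000 python app.py
port = int(os.environ.get("PORT", "5000"))

# Then pass it to Flask instead of the hard-coded value:
# app.run(host='0.0.0.0', port=port, debug=True)
print(port)
```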

Model Files Not Found

Download pre-trained models or train your own:

# Train models
uv run python scripts/run_training.py

# Models will be saved to the models/ directory
