This project focuses on predicting the survival likelihood of passengers aboard the Titanic using machine learning techniques. The dataset includes information about passengers, such as their age, gender, passenger class, and other features, and the goal is to build a model that predicts whether a given passenger survived.
The primary tasks involved in the project are:
- Exploratory Data Analysis: Analyze and understand the dataset, handle missing values, and explore relationships between different features.
- Data Preprocessing & Feature Engineering: Prepare the data for training machine learning models by handling missing values, encoding categorical variables, and creating new features.
- Model Training: Train and evaluate machine learning models on the preprocessed data to predict passenger survival.
- Prediction Service: Serve the trained model through a Dockerized `/predict` endpoint.
- Batch Prediction: Use the trained model to predict survival for a batch of passengers in the test dataset.
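As a minimal sketch of the exploratory step, the snippet below counts missing values per column and computes the survival rate for each value of a categorical column. It uses only the standard library and a small made-up sample. The column names (`Survived`, `Pclass`, `Sex`, `Age`) follow the widely used Kaggle Titanic CSV and are an assumption about this project's data, as are the helper names `missing_counts` and `survival_rate_by`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the Kaggle Titanic column layout (an assumption);
# the raw CSV in the data directory may differ.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,35
6,0,3,male,
"""


def missing_counts(rows, columns):
    """Count empty cells per column (an empty string marks a missing value)."""
    return {col: sum(1 for row in rows if row[col] == "") for col in columns}


def survival_rate_by(rows, column):
    """Fraction of passengers with Survived == '1' for each value of `column`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        survivors[key] += row["Survived"] == "1"  # True counts as 1
    return {key: survivors[key] / totals[key] for key in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(missing_counts(rows, ["Age", "Sex"]))  # {'Age': 1, 'Sex': 0}
print(survival_rate_by(rows, "Sex"))         # {'male': 0.0, 'female': 1.0}
```

On the real data the same two helpers give a quick view of which columns need imputation and which features appear related to survival.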
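The preprocessing and feature-engineering step could look roughly like the following sketch. It uses median imputation for missing ages and most-frequent-value imputation for the embarkation port. It encodes `Sex` as a binary flag and `Embarked` as one-hot columns, and it derives a `FamilySize` feature (`SibSp + Parch + 1`). The column names, the `preprocess` helper, and the feature choices are illustrative assumptions, not the project's actual pipeline.

```python
from collections import Counter
from statistics import median

# Hypothetical raw records; column names follow the Kaggle Titanic CSV (an assumption).
raw = [
    {"Sex": "male", "Age": "22", "SibSp": "1", "Parch": "0", "Embarked": "S"},
    {"Sex": "female", "Age": "", "SibSp": "0", "Parch": "2", "Embarked": "C"},
    {"Sex": "female", "Age": "30", "SibSp": "0", "Parch": "0", "Embarked": ""},
    {"Sex": "male", "Age": "40", "SibSp": "0", "Parch": "0", "Embarked": "S"},
]

EMBARKED_PORTS = ["C", "Q", "S"]  # categories used for one-hot encoding


def preprocess(records):
    """Impute missing values, encode categoricals, and add a FamilySize feature."""
    # Fill values are computed from the records passed in.
    age_fill = median([float(r["Age"]) for r in records if r["Age"] != ""])
    ports = Counter(r["Embarked"] for r in records if r["Embarked"] != "")
    port_fill = ports.most_common(1)[0][0]  # most frequent port

    processed = []
    for r in records:
        port = r["Embarked"] or port_fill
        row = {
            "IsFemale": 1 if r["Sex"] == "female" else 0,
            "Age": float(r["Age"]) if r["Age"] != "" else age_fill,
            "FamilySize": int(r["SibSp"]) + int(r["Parch"]) + 1,
        }
        for p in EMBARKED_PORTS:
            row[f"Embarked_{p}"] = 1 if port == p else 0
        processed.append(row)
    return processed


processed = preprocess(raw)
print(processed[1])
```

In practice the fill values (median age, most frequent port) would be computed on the training split only and reused when transforming test or request data, so that training and prediction see identically encoded features.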
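The README does not state which model is trained. As a dependency-free illustration of the train-and-evaluate loop, the sketch below fits a logistic regression by batch gradient descent on the mean log loss and reports accuracy on a small held-out set. The features (`[IsFemale, Pclass]`), labels, and hyperparameters (`lr`, `epochs`) are made up for illustration. A library implementation, such as scikit-learn's `LogisticRegression`, would normally replace this hand-written version.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights [bias, w1, ..., wk] by batch gradient descent on mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # Gradient of log loss w.r.t. the score is (predicted prob - label).
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x, threshold=0.5):
    """Return 1 (survived) if the predicted probability reaches the threshold."""
    score = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return int(sigmoid(score) >= threshold)


def accuracy(w, X, y):
    """Fraction of examples whose prediction matches the label."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Toy features [IsFemale, Pclass]; labels are invented for illustration only.
X_train = [[1, 1], [1, 2], [1, 3], [0, 1], [0, 2], [0, 3]]
y_train = [1, 1, 1, 0, 0, 0]
X_val = [[1, 2], [0, 3]]
y_val = [1, 0]

weights = train_logistic_regression(X_train, y_train)
print("validation accuracy:", accuracy(weights, X_val, y_val))
```

Keeping a held-out set separate from the training rows is what makes the reported accuracy an estimate of performance on unseen passengers rather than a measure of how well the model memorized its inputs.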
The repository is organized into the following directories:

- Data Directory: The `data` directory contains the raw and processed datasets.
- Notebooks Directory: The `notebooks` directory contains Jupyter notebooks for the different stages of the project, such as EDA, data preprocessing, and model training.
- Prediction Service Directory: The `prediction_service` directory contains the code for deploying a prediction service with Flask. To run the prediction service, use `cd prediction_service` followed by `python predict.py`. This starts the Flask app, and you can then request predictions by sending POST requests to the `/predict` endpoint.
- Scripts Directory: The `scripts` directory contains Python scripts for specific tasks, such as preprocessing data and running batch predictions. To run the batch prediction script, use `cd scripts` followed by `python batch_predict.py ../data/processed/test.csv`. This creates a CSV file (`predictions.csv`) with a prediction for each passenger in the test dataset.
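A client can call the running prediction service using only the standard library. The sketch below builds a JSON POST request for the `/predict` endpoint with `urllib.request`. The URL (`http://localhost:5000`, Flask's default development port) and the passenger fields in the payload are assumptions. Check `predict.py` for the actual host, port, and expected request format. The helper names `build_predict_request` and `send_predict_request` are hypothetical.

```python
import json
import urllib.request

# Assumption: the Flask app listens on localhost:5000 (Flask's default).
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(passenger, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying one passenger as JSON."""
    body = json.dumps(passenger).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(passenger, url=PREDICT_URL, timeout=5):
    """Send the request to the running service and decode its JSON reply."""
    request = build_predict_request(passenger, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical passenger payload; the real field names depend on predict.py.
request = build_predict_request({"Pclass": 3, "Sex": "male", "Age": 22})
print(request.get_method(), request.full_url)  # POST http://localhost:5000/predict
```

With the service running, `send_predict_request(passenger)` sends the request and returns the decoded response. The shape of that response depends on how `predict.py` formats its output.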
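The batch prediction script could follow the pattern below. It reads the input CSV, applies a model to each row, and writes one `PassengerId,Survived` row per passenger to `predictions.csv`. `predict_passenger` is a hypothetical placeholder rule standing in for the trained model. The output columns are an assumption modeled on the Kaggle Titanic submission format.

```python
import csv


def predict_passenger(row):
    """Hypothetical stand-in for the trained model: predicts survival for women."""
    return 1 if row.get("Sex") == "female" else 0


def batch_predict(input_path, output_path="predictions.csv", model=predict_passenger):
    """Read passengers from input_path and write PassengerId,Survived rows."""
    with open(input_path, newline="") as src:
        rows = list(csv.DictReader(src))
    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["PassengerId", "Survived"])
        writer.writeheader()
        for row in rows:
            writer.writerow({"PassengerId": row["PassengerId"], "Survived": model(row)})
    return len(rows)  # number of predictions written
```

To match the command-line usage of `batch_predict.py`, a script would call `batch_predict(sys.argv[1])` under an `if __name__ == "__main__":` guard. A real model would also need the same preprocessing that was applied during training.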