The purpose of this project was to learn Python and experiment with scikit-learn and its Pipeline, FeatureUnion and other classes using Kaggle's Titanic competition. Achieving a high Kaggle score was not a goal (public leaderboard score achieved: 0.78947). The code uses Python 3.5 and scikit-learn 0.17.1. All data files are located in the data subfolder.
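For readers new to these classes, here is a minimal sketch of how Pipeline and FeatureUnion can be combined with a random forest. It is not the project's actual code: the ColumnSelector helper, the toy data, the column layout and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical helper: keep only the given column indices."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return np.asarray(X)[:, self.columns]


# Toy stand-in for the training data (assumed layout: age, fare, is_female).
X = np.array([
    [22.0, 7.25, 0.0],
    [38.0, 71.28, 1.0],
    [26.0, 7.93, 1.0],
    [35.0, 53.10, 1.0],
    [35.0, 8.05, 0.0],
    [54.0, 51.86, 0.0],
])
y = np.array([0, 1, 1, 1, 0, 0])

# FeatureUnion runs each branch on the same input and stacks the
# resulting columns side by side.
features = FeatureUnion([
    ("scaled_numeric", Pipeline([
        ("select", ColumnSelector([0, 1])),
        ("scale", StandardScaler()),
    ])),
    ("flags", ColumnSelector([2])),
])

# The outer Pipeline chains feature construction and the classifier,
# so a single fit/predict call covers both stages.
model = Pipeline([
    ("features", features),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])

model.fit(X, y)
predictions = model.predict(X)
```

Wrapping the feature steps and the classifier in one Pipeline means the same transformations are applied at fit time and at prediction time without repeating them by hand.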
Run driver_fit.py to fit the RandomForest model, which will be saved to the model subfolder.
Run driver_predict.py to make test set predictions and prepare the Kaggle submission file. It reads the RandomForest model created by driver_fit.py.
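The fit-then-predict workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of the two driver scripts: it uses the standard library's pickle for persistence (the project may use a different serializer), and the file names, toy data and passenger ids are placeholders. The two-column PassengerId,Survived layout is the format the Titanic competition expects.

```python
import csv
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for the prepared train and test feature matrices.
X_train = np.array([[22.0, 0.0], [38.0, 1.0], [26.0, 1.0], [35.0, 0.0]])
y_train = np.array([0, 1, 1, 0])
X_test = np.array([[30.0, 1.0], [40.0, 0.0]])
test_ids = [892, 893]  # placeholder passenger ids

# Hypothetical output locations, kept in a temporary directory here.
work_dir = tempfile.mkdtemp()
model_path = os.path.join(work_dir, "model.pkl")
submission_path = os.path.join(work_dir, "submission.csv")

# Fit step: train the forest and save it to disk.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)
with open(model_path, "wb") as f:
    pickle.dump(forest, f)

# Predict step: load the saved model and predict on the test set.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)
predicted = loaded.predict(X_test)

# Write the submission file with one row per test passenger.
with open(submission_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id, survived in zip(test_ids, predicted):
        writer.writerow([passenger_id, int(survived)])
```

Pickled models should be loaded with the same scikit-learn version that created them, and only from files you trust.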
New features are created based partially on code discussed on this web site