ML Project 1

General info

Repository containing the code for Project 1 of the Machine Learning course at EPFL.

The team (SchroedingerCats) is composed of:

Edoardo Debenedetti (@dedeswim)
Mari Sofie Lerfaldet (@marisofie)
Davide Nanni (@DSAureli)

The project was developed and tested with Python 3.6. The packages needed to run it are listed in requirements.txt and can be installed with:

pip3 install --user --requirement requirements.txt

For visualization during the feature selection and engineering phase, we also used matplotlib, seaborn, sklearn, and pandas; they are not needed to run the models or the final training.

Training and prediction on the provided test data can be run with:

python3 run.py

The data are expected to be in the data folder (relative to the run.py script) and to be named train.csv and test.csv. It is possible to download the data we used from this page.
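
Before running the script, a quick check like the following can confirm that the datasets are where run.py expects them. This is a minimal sketch and not part of the repository: the helper name `missing_data_files` is hypothetical, and it only assumes the layout described above (a data folder next to run.py containing train.csv and test.csv).

```python
from pathlib import Path

# File names taken from the README; the helper itself is hypothetical.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_data_files(project_root="."):
    """Return the expected dataset paths that do not exist under <project_root>/data."""
    data_dir = Path(project_root) / "data"
    return [data_dir / name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing dataset files:", ", ".join(str(p) for p in missing))
    else:
        print("Both datasets found.")
```

Run from the folder that contains run.py, it reports which of the two files, if any, still needs to be downloaded.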

The output of the prediction can be found in the final-test.csv file, located in the same folder as run.py.

Project structure

The project is structured in the following way:

.
├── implementations.py: contains **all the implementations** required by the project
├── notes.md: general notes about the project development
├── README.md: this file :)
├── requirements.txt: contains the packages used to run the project
├── run.py: contains the **final code** to train the model
├── tests.ipynb: a notebook that contains the tests of the required implementations, which can also be used as a usage guide
├── data: contains the datasets (.gitignore'd)
├── notebooks
│   ├── features_log.ipynb: contains our investigations about taking the logarithm of the features
│   ├── features_overview.ipynb: contains the exploratory data analysis phase
│   ├── logistic_regression.ipynb: contains our trials with logistic regression
│   └── ridge_regression.ipynb: contains our trials with ridge regression
└── src
    ├── helpers.py: some helper functions used by different modules
    ├── split.py: contains the function used to split the dataset into training and test sets
    ├── k_fold.py: contains the functions used for cross-validation
    ├── polynomials.py: contains the functions used to get the polynomials
    ├── logistic: contains the functions used to train the logistic regression model
    │   ├── loss.py: contains the function to compute the loss
    │   ├── gradient.py: contains the function to compute the gradient
    │   ├── hessian.py: contains the function to compute the hessian
    │   ├── implementations.py: contains the **logistic regression** implementations required by the project
    │   └── sigmoid.py: contains the function to compute the sigmoid
    └── linear: contains the functions used to train the linear regression model
        ├── gradient.py: contains the function to compute the gradient
        ├── implementations.py: contains the **linear regression** implementations required by the project
        └── loss.py: contains the function to compute the loss function
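
For readers unfamiliar with the cross-validation step that src/k_fold.py is described as handling, the general technique can be sketched as follows. This is an illustrative sketch only, not the code in this repository: the function name `k_fold_indices`, its parameters, and the use of the standard `random` module are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper: the sample indices are shuffled once, cut into k
    nearly equal folds, and each fold serves as the validation set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


# Usage: 10 samples split into 5 folds of 2 validation samples each.
for train_idx, val_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(val_idx))
```

A model would be trained on each `train_idx` and scored on the matching `val_idx`, with the scores averaged to compare hyperparameters.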
