# ML Project 1

## General info

Repository containing the code for the Project 1 of the Machine Learning course at EPFL.

The team (SchroedingerCats) is composed by:

- Edoardo Debenedetti (@dedeswim)
- Mari Sofie Lerfaldet (@marisofie)
- Davide Nanni (@DSAureli)

The project has been developed and tested with Python 3.6, and the packages used to get the project up and running are listed in requirements.txt, and can be installed with:

`pip3 install --user --requirement requirements.txt`

For visualization purposes in the feature selection and engineering phase, we also used `matplotlib`

, `seaborn`

, `sklearn`

, and `pandas`

, but they are not needed to run the models and the final training.

The training and the prediction on the provided test sets can be done running:

`python3 run.py`

Moreover, the data are supposed to be in the `data`

folder (with respect to the `run.py`

script), and are supposed to have the names `train.csv`

and `test.csv`

. It is possible to download the data we used from this page.

The output of the prediction can be found in the `final-test.csv`

file, located in the same folder as `run.py`

.

## Project structure

The project is structured in the following way:

```
.
├── implementations.py: contains **all the implementations** required by the project
├── notes.md: general notes about the project development
├── README.md: this file :)
├── requirements.txt: contains the packages used to run the project
├── run.py: contains the **final code** to train the model
├── tests.ipynb: a notebook that contains the tests of the required implementations, that can be used as guide for usage
├── data: contains the datasets (.gitignore'd)
├── notebooks
│ ├── features_log.ipynb: contains our investigations about taking the logarithm of the features
│ ├── features_overview.ipynb: contains the exploratory data analysis phase
│ ├── logistic_regression.ipynb: contains out trials with logistic regression
│ └── ridge_regression.ipynb: contains our trials with ridge regression
└── src
├── helpers.py: some helper functions used by different modules
├── split.py: contains the function used to split the dataset into training and test sets
├── k_fold.py: contains the functions used for cross-validation
├── polynomials.py: contains the functions used to get the polynom
├── logistic: contains the functions used to train the logistic regression model
│ ├── loss.py: contains the function to compute the loss
│ ├── gradient.py: contains the function to compute the gradient
│ ├── hessian.py: contains the function to compute the hessian
│ ├── implementations.py: contains the **logistic regression** implementations required by the project
│ └── sigmoid.py: contains the function to compute the sigmoid
└── linear: contains the functions used to train the linear regression model
├── gradient.py: contains the function to compute the gradient
├── implementations.py: contains the **linear regression** implementations required by the project
└── loss.py: contains the function to compute the loss function
```