Berkeley Earth Climate Data Time Series Analysis
Time series prediction using Python and scikit-learn
Dependencies
This project uses Python and requires the following packages from pip:
- matplotlib
- numpy
- pandas
- scikit-learn (imported as sklearn)
Preparing the feature sets
Download the data from Berkeley Earth and extract the "GlobalLandTemperaturesByCity.csv" file into the repository's directory. Then run python3 make-feature-sets.py to export the feature set files into a feature-sets directory.
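As a rough illustration of what a feature set for time series prediction can look like, the sketch below turns one city's temperature series into rows of lagged readings. It is not the repository's actual export logic: the column names (dt, AverageTemperature, City), the make_lag_features helper, and the choice of three lags are all assumptions made for the example, and a tiny in-memory table stands in for the real CSV.

```python
import pandas as pd


def make_lag_features(temps: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Hypothetical helper: pair each reading with the n_lags readings before it."""
    frame = pd.DataFrame({"target": temps})
    for lag in range(1, n_lags + 1):
        # shift(lag) moves the series down, so row t holds the value from t - lag
        frame[f"lag_{lag}"] = temps.shift(lag)
    # The first n_lags rows have no complete history, so drop them
    return frame.dropna().reset_index(drop=True)


# Tiny stand-in for the real data; the column names are assumed, not confirmed
records = pd.DataFrame({
    "dt": pd.date_range("2000-01-01", periods=6, freq="MS"),
    "AverageTemperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "City": ["Example"] * 6,
})

city = records[records["City"] == "Example"].sort_values("dt")
features = make_lag_features(
    city["AverageTemperature"].reset_index(drop=True), n_lags=3
)
print(features)
```

With the real file, the stand-in table would presumably be replaced by a pd.read_csv call on "GlobalLandTemperaturesByCity.csv", after handling any missing temperature readings.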
Running the machine learning models
You can then configure and run whichever step you want for each model (e.g. cross-validation, training, or evaluation) in that model's file. The step is generally selected by a string near the top of the file.
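One common way to set up this kind of string-based configuration is a lookup from the string to a function. The sketch below is only an assumption about the general pattern: the MODE variable, the step names, and the placeholder functions are hypothetical and are not taken from the model files.

```python
# Hypothetical configuration string; edit it to choose which step runs
MODE = "evaluate"


def cross_validate() -> str:
    # Placeholder for a penalty search
    return "cross-validating"


def train() -> str:
    # Placeholder for fitting the model
    return "training"


def evaluate() -> str:
    # Placeholder for scoring against the baseline
    return "evaluating"


ACTIONS = {
    "cross-validate": cross_validate,
    "train": train,
    "evaluate": evaluate,
}


def run(mode: str) -> str:
    """Look up the configured step and run it, failing loudly on a typo."""
    if mode not in ACTIONS:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(ACTIONS)}")
    return ACTIONS[mode]()


result = run(MODE)
print(result)
```

Rejecting unknown strings up front means a misspelled mode fails immediately instead of silently doing nothing.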
Here's what each file does:
lasso.py
Cross-validates the lasso regression penalty and trains and evaluates a lasso regression model against an "average of features" baseline predictor
linear.py
Trains and evaluates a linear regression model against an "average of features" baseline predictor
load_feature_sets.py
Contains some useful functions used by the tree models for loading the feature data generated by make_feature_sets.py
make_feature_sets.py
Reads the entire Berkeley Earth data set and writes each country and city's feature set files into a feature-sets directory
ridge.py
Cross-validates the ridge regression penalty and trains and evaluates a ridge regression model against an "average of features" baseline predictor
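The cross-validation and baseline comparison described for lasso.py and ridge.py can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code in those files: the feature layout (three previous readings per row), the penalty grid, the 5-fold split, and the use of mean squared error are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in: each row holds three earlier readings, the target is the next one
n_samples = 200
X = rng.normal(loc=15.0, scale=5.0, size=(n_samples, 3))
true_weights = np.array([0.6, 0.3, 0.1])
y = X @ true_weights + rng.normal(scale=0.5, size=n_samples)

# Hold out the last quarter of the rows for evaluation
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Candidate penalties (assumed grid); each model keeps the best one in .alpha_
alphas = np.logspace(-3, 1, 20)
lasso = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
ridge = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)

# "Average of features" baseline: predict the mean of each row's features
baseline_mse = mean_squared_error(y_test, X_test.mean(axis=1))
lasso_mse = mean_squared_error(y_test, lasso.predict(X_test))
ridge_mse = mean_squared_error(y_test, ridge.predict(X_test))

print(f"lasso    alpha={lasso.alpha_:.4g}  mse={lasso_mse:.3f}")
print(f"ridge    alpha={ridge.alpha_:.4g}  mse={ridge_mse:.3f}")
print(f"baseline mse={baseline_mse:.3f}")
```

For real temperature series, where row order matters, passing cv=TimeSeriesSplit(n_splits=5) from sklearn.model_selection may be a better fit than plain k-fold, since it only validates on rows that come after the training rows.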