Berkeley Earth Climate Data Time Series Analysis

Time series prediction using Python and scikit-learn.

Dependencies

This project uses Python and requires the following packages from pip:

matplotlib
numpy
pandas
scikit-learn
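
Before running any of the scripts, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and it only checks that each package can be found, not which version is installed.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# Note: the scikit-learn distribution is imported under the name "sklearn".
REQUIRED = ["matplotlib", "numpy", "pandas", "sklearn"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be imported in this environment.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name it reports can then be installed with pip.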

Preparing the feature sets

Download the data from Berkeley Earth and extract the "GlobalLandTemperaturesByCity.csv" file into the repository's directory. Then run python3 make-feature-sets.py to export the feature set files into a feature-sets directory.
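
This README does not describe the layout of the exported feature files. As a rough illustration of what a feature set for time series prediction can look like, the sketch below turns a temperature series into rows of lagged values plus a target. Everything in it is an assumption: the function name make_lag_features, the choice of three lags, and the toy values, which stand in for one city's temperature column. The repository's script may build its features differently.

```python
import pandas as pd


def make_lag_features(temps: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build rows holding the previous n_lags values and the current target.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Oldest lag first, so the columns read in chronological order.
    columns = {f"lag_{k}": temps.shift(k) for k in range(n_lags, 0, -1)}
    columns["target"] = temps
    # The first n_lags rows have no complete history, so they are dropped.
    return pd.DataFrame(columns).dropna().reset_index(drop=True)


# Toy monthly series (made-up values) standing in for one city's temperatures.
temps = pd.Series([10.0, 12.0, 15.0, 19.0, 22.0, 24.0])
features = make_lag_features(temps, n_lags=3)
print(features)
```

Each row can then be fed to a regression model, with the lag columns as inputs and the target column as the value to predict.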

Running the machine learning models

You can then configure and run whichever step you want for each model (e.g. cross-validation, training, or evaluation) in that model's file. The step to run is generally selected by a string near the top of the file.

Here's what each file does:

lasso.py: Cross-validates the lasso regression penalty and trains and evaluates a lasso regression model against an "average of features" baseline predictor.
linear.py: Trains and evaluates a linear regression model against an "average of features" baseline predictor.
load_feature_sets.py: Contains some useful functions used by the tree models for loading the feature data generated by make_feature_sets.py.
make_feature_sets.py: Reads the entire Berkeley Earth data set and writes each country and city's feature set files into a feature-sets directory.
ridge.py: Cross-validates the ridge regression penalty and trains and evaluates a ridge regression model against an "average of features" baseline predictor.
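
To make the workflow described for lasso.py concrete, here is a hedged sketch of cross-validating a lasso penalty and comparing the fitted model with an "average of features" baseline. It is not the repository's code. The synthetic seasonal series, the 12 lag features, the 80/20 chronological split, the alpha grid, and the use of LassoCV with TimeSeriesSplit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic monthly temperatures: a yearly cycle plus noise (made-up data).
months = np.arange(240)
series = 15 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, size=240)

# Each row of X holds the previous 12 values; y is the value that follows them.
n_lags = 12
X = np.column_stack(
    [series[k : len(series) - n_lags + k] for k in range(n_lags)]
)
y = series[n_lags:]

# Chronological split, so the model is evaluated only on later data.
split = int(0.8 * len(y))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Cross-validate the penalty strength (alpha) on the training portion.
model = LassoCV(
    alphas=np.logspace(-3, 1, 20),
    cv=TimeSeriesSplit(n_splits=5),
    max_iter=10000,
).fit(X_train, y_train)

# Baseline: predict the plain average of the lag features in each row.
baseline_pred = X_test.mean(axis=1)

model_mse = mean_squared_error(y_test, model.predict(X_test))
baseline_mse = mean_squared_error(y_test, baseline_pred)

print(f"chosen alpha: {model.alpha_:.4f}")
print(f"lasso MSE:    {model_mse:.3f}")
print(f"baseline MSE: {baseline_mse:.3f}")
```

The same pattern should carry over to ridge regression by swapping LassoCV for RidgeCV, and to plain linear regression by dropping the cross-validation step.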
