GlenCrawford / australia_rain_tomorrow_binary_classification_prediction

Binary classification model to predict whether or not it will rain tomorrow, implemented as a neural network in both TensorFlow/Keras and scikit-learn.

Binary classification machine learning model to predict whether it will rain tomorrow in Australia.

This is a neural network binary classifier that predicts, from one day's meteorological observations at a weather station in Australia, whether it will rain there the next day. The model is trained and tested on a dataset containing about 10 years of daily weather observations from numerous Australian weather stations.

There are two separate implementations in this project: one using TensorFlow 2 and Keras, and another using scikit-learn.

The model currently has an accuracy of approximately 87%. Because it rains on far fewer than half of all days, the dataset has many more rows where the target "RainTomorrow" column is "No" than "Yes". This means that even a naive guess can be right about 70% of the time, so the model's accuracy has to be judged against that baseline rather than against 50%. My goal was therefore to get the model accuracy to somewhere around 90%.
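
To make the baseline concrete, here is a minimal sketch of how the accuracy of an "always predict the most common label" guess could be computed. The function name and the 7-to-3 label split are made up for illustration (chosen to mirror the roughly 70% figure above); they are not taken from the project's code or from the real dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most common label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Hypothetical label column: 7 dry days for every 3 rainy ones.
labels = ["No"] * 7 + ["Yes"] * 3
label, accuracy = majority_baseline_accuracy(labels)
print(label, accuracy)  # No 0.7
```

Any model accuracy is only meaningful relative to this number: 87% is an improvement of 17 percentage points over such a baseline, not 37 over a coin flip.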

Here is the structure of the dataset used for training and testing, showing the header and two data rows:

Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
2010-10-20 Sydney 12.9 20.3 0.2 3 10.9 ENE 37 W E 11 26 70 57 1028.8 1025.6 3 1 16.9 19.8 No 0 No
2017-06-25 Brisbane 11 24.2 0 2.2 9.8 ENE 20 SSW NNE 2 7 68 53 1020.5 1017.3 6 3 15.9 22.6 No 0 Yes

The data was sourced from this Kaggle dataset compiled by Joe Young and Adam Young, which was in turn sourced from http://www.bom.gov.au/climate/data and http://www.bom.gov.au/climate/dwo/. This data is available under a Creative Commons (CC) Attribution 3.0 licence. For details on the meaning of each observation, see this page. Copyright Commonwealth of Australia, Bureau of Meteorology.

Requirements

  • Python (developed with version 3.7.4).

  • See dependencies.txt for packages and versions (and below to install).

Data preprocessing

Data preprocessing is done by a combination of Pandas (to drop rows containing NaN values and to map Yes/No strings to 1/0 binary integers), scikit-learn (to scale/normalize numeric features by calculating the z-score of each of their values), and TensorFlow (to apply one-hot encoding to categorical features). The model's input layer is thus a combination of pre-normalized numeric features and one-hot encoded categorical features.
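
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the four-row frame is made up, only a few of the dataset's column names are used, and `pandas.get_dummies` stands in for the TensorFlow one-hot encoding so that the example stays small.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up frame that borrows a few column names from the dataset.
df = pd.DataFrame({
    "MinTemp": [12.9, 11.0, None, 8.0],
    "Humidity3pm": [57.0, 53.0, 40.0, 80.0],
    "WindGustDir": ["ENE", "ENE", "W", "SSW"],
    "RainTomorrow": ["No", "Yes", "No", "Yes"],
})
numeric_columns = ["MinTemp", "Humidity3pm"]
categorical_columns = ["WindGustDir"]

# 1. Drop every row that contains a missing value.
df = df.dropna().reset_index(drop=True)

# 2. Map the Yes/No label strings to 1/0 integers.
df["RainTomorrow"] = df["RainTomorrow"].map({"Yes": 1, "No": 0})

# 3. Replace each numeric value with its z-score: (value - mean) / std.
df[numeric_columns] = StandardScaler().fit_transform(df[numeric_columns])

# 4. One-hot encode the categorical columns (stand-in for the TensorFlow step).
features = pd.get_dummies(
    df[numeric_columns + categorical_columns],
    columns=categorical_columns,
    dtype=int,
)
labels = df["RainTomorrow"]
```

In a real train/test setup the scaler would normally be fitted on the training rows only and then applied to the test rows, so that test statistics do not influence training.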

The following columns were skipped and not used as features for the model; all the rest were used:

  • Date: Not relevant.

  • RainToday: This is just a boolean representation of the numeric column "Rainfall". I experimented with adding this feature to the model, but it had no effect on accuracy.

  • RISK_MM: This is the amount of rain for the following day. It was used to create the label/target column "RainTomorrow", so using it as a feature would leak the answer to the model. It would be the target if the model were doing regression rather than classification.

  • RainTomorrow: Used as the training label/target.
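
The column selection described above can be sketched as follows. The header is the one shown in the dataset sample; the split into categorical and numeric columns is an assumption inferred from the two sample rows (text values versus numbers), and the variable names are illustrative rather than taken from the project.

```python
# Header of the dataset, as shown in the sample rows.
header = (
    "Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine "
    "WindGustDir WindGustSpeed WindDir9am WindDir3pm WindSpeed9am "
    "WindSpeed3pm Humidity9am Humidity3pm Pressure9am Pressure3pm "
    "Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow"
).split()

# Columns that are not fed to the model as features.
skipped_columns = {"Date", "RainToday", "RISK_MM", "RainTomorrow"}
label_column = "RainTomorrow"

# Assumption: these text-valued columns are the categorical features.
categorical_columns = {"Location", "WindGustDir", "WindDir9am", "WindDir3pm"}

feature_columns = [c for c in header if c not in skipped_columns]
categorical_features = [c for c in feature_columns if c in categorical_columns]
numeric_features = [c for c in feature_columns if c not in categorical_columns]

print(len(feature_columns), len(numeric_features), len(categorical_features))
```

Under these assumptions, the numeric list is what would be z-scored and the categorical list is what would be one-hot encoded.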

The output of the model is a single sigmoid-activation neuron, which predicts the target variable "RainTomorrow".
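
As a rough illustration of what a single sigmoid output neuron computes, the sketch below takes a weighted sum of its inputs, squashes it into the range 0 to 1, and thresholds the result. The weights, bias, input values, and the 0.5 threshold are all hypothetical; in the real model the inputs would come from the preceding layer and the weights would be learned during training.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_rain_tomorrow(inputs, weights, bias, threshold=0.5):
    """Return (probability of rain, boolean prediction) for one example."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    probability = sigmoid(z)
    return probability, probability >= threshold


# Hypothetical inputs and parameters, for illustration only.
probability, will_rain = predict_rain_tomorrow(
    inputs=[1.0, -0.5], weights=[2.0, 1.0], bias=0.0
)
print(round(probability, 3), will_rain)
```

The sigmoid output can be read as the predicted probability of "Yes", which is why one neuron is enough for a two-class problem.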

Setup

  • Clone the Git repository.

  • Install the dependencies:

pip install -r dependencies.txt

Run

python -W ignore tensor_flow.py

or

python -W ignore scikit_learn.py

Note that TensorFlow currently has a bug where deprecation warnings are printed when feature columns are used, even though the new feature column API is indeed being used. It has been fixed and will be in a future release of TensorFlow. In the meantime, we will just have to live with the warning output.

Monitoring/logging

After training, run:

$ tensorboard --logdir logs/fit
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.1.0 at http://localhost:6006/ (Press CTRL+C to quit)

Then open the above URL in your browser to view the model in TensorBoard.
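
For context, logs under logs/fit are typically produced by attaching Keras's TensorBoard callback during training. The sketch below shows one common way to do that; the helper names and the timestamped subdirectory layout are assumptions, not necessarily how this project writes its logs.

```python
import datetime
import os


def make_log_dir(root="logs/fit", now=None):
    """Build a per-run log directory such as logs/fit/20200102-030405."""
    now = now or datetime.datetime.now()
    return os.path.join(root, now.strftime("%Y%m%d-%H%M%S"))


def make_tensorboard_callback(log_dir):
    """Create a Keras callback that writes TensorBoard logs to log_dir."""
    import tensorflow as tf  # imported lazily so the path helper works without it

    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)


# Usage (assuming `model`, `x_train`, and `y_train` already exist):
#   callback = make_tensorboard_callback(make_log_dir())
#   model.fit(x_train, y_train, epochs=10, callbacks=[callback])
```

Giving each run its own subdirectory lets TensorBoard show several training runs side by side when it is pointed at logs/fit.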
