Tutorial: introduction to machine learning and neural networks, with an application to earth system modeling
figure-validation-loss-torch.R
- Summer school link: New Advances in Land Carbon Cycle Modeling.
- Download the textbook from Taylor and Francis web page.
- My slightly revised book chapter PDF (same content as chapter 36 of textbook, with slight revisions).
- My video on YouTube.
- My Slides PDF.
- My Quiz.
Prepared for the summer school, New Advances in Land Carbon Cycle Modeling (3rd year, 2020).
- Slides PDF
- Lecture 1: Introduction to machine learning and selected applications
- A supervised learning algorithm will input a labeled training data set, and output what?
- What are three advantages of supervised machine learning, relative to classical programming techniques?
- Lecture 2: Demonstrating overfitting in regression, introduction to R programming
- When splitting data into train/test sets, what is the purpose of the train set?
- When splitting data into train/test sets, what is the purpose of the test set?
- When splitting a train set into subtrain/validation sets, what is the purpose of the subtrain set?
- When splitting a train set into subtrain/validation sets, what is the purpose of the validation set?
- The goal of supervised machine learning is to get a function that yields highly accurate predictions with respect to what data?
- How can you tell if machine learning model predictions are overfitting?
- In 4-fold cross-validation, we randomly assign each observation a fold ID in what range of values?
- In 4-fold cross-validation for subtrain/validation splits, if the validation fold ID is 3, then what observations are used as the subtrain set?
- Machine learning data sets are usually stored in CSV data files, where rows are ___ and columns are ___ and ___ ?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, what data set (all/train/test/subtrain/validation) should you pass as the second argument?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the maxit hyper-parameter result in overfitting?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the size hyper-parameter result in overfitting?
- Lecture 3: Intro to neural networks for image classification using R/keras
- In multi-class classification problems with keras in R, the output/label data must be stored in what data structure?
- To use image data as inputs, you need to use what data structure?
- When using the keras::fit function, what data set (all/train/test/subtrain/validation) should you pass as the first two arguments?
- Using keras::fit with validation_split=0.3 implies what percents of data allocated to subtrain and validation sets?
- When using 4-fold cross-validation for model evaluation, the test set is used for what?
- To determine if classification models have learned any non-trivial patterns in the data, they should be compared with a baseline which ignores all inputs/features and always predicts what value?
Supplementary YouTube screencasts showing R command line use:
- Machine learning and data visualization basics
- Basic neural networks (R keras)
- Number of hidden units is a regularization parameter (R keras)
- Convolutional neural networks (R keras)
New: figure-overfitting-cv-data.R makes many figures showing the difference between train/test/subtrain/validation sets.
figure-overfitting.R also makes a figure (image not shown here), revised to use the “subtrain” and “validation” set names.
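The relationship between the four sets can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code in the R scripts above; the data (40 observation indices), the helper names `assign_fold_ids` and `split_by_fold`, and the choice of held-out folds are all assumptions made for the example.

```python
import random

def assign_fold_ids(n_obs, n_folds, seed=0):
    """Return shuffled fold IDs in 1..n_folds, as balanced as possible."""
    fold_ids = [(i % n_folds) + 1 for i in range(n_obs)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids

def split_by_fold(rows, fold_ids, held_out_fold):
    """Split rows into (kept, held_out) using one held-out fold ID."""
    kept = [r for r, f in zip(rows, fold_ids) if f != held_out_fold]
    held_out = [r for r, f in zip(rows, fold_ids) if f == held_out_fold]
    return kept, held_out

rows = list(range(40))  # hypothetical data: 40 observation indices

# First split: all data -> train (used for learning) and test (used for evaluation).
test_fold_ids = assign_fold_ids(len(rows), n_folds=4, seed=1)
train, test = split_by_fold(rows, test_fold_ids, held_out_fold=1)

# Second split: train -> subtrain (used to compute weight updates) and
# validation (used to choose hyper-parameters).
validation_fold_ids = assign_fold_ids(len(train), n_folds=4, seed=2)
subtrain, validation = split_by_fold(train, validation_fold_ids, held_out_fold=2)

print(len(train), len(test), len(subtrain), len(validation))
```

The test set is set aside before the second split, so it plays no role in learning or in hyper-parameter selection.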
figure-proda-cv-data.R computes geographic and random folds, and plots a map.
It also has batchtools code that computes figure-proda-cv-data-test.csv
figure-proda-cv-data-multitask.R computes figure-proda-cv-data-multitask-test.csv
Those files are read by figure-proda-cv.R, which makes a figure (image not shown here).
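The difference between random and geographic folds can be sketched as follows. This is only an illustration in Python under stated assumptions, not the method used in figure-proda-cv-data.R: the longitudes are made up, and defining geographic folds as contiguous longitude bands is one hypothetical choice among many (the function names `random_folds` and `geographic_folds` are also invented here).

```python
import random

def random_folds(n_obs, n_folds, seed=0):
    """Random folds: each observation gets a fold ID regardless of location."""
    fold_ids = [(i % n_folds) + 1 for i in range(n_obs)]
    random.Random(seed).shuffle(fold_ids)
    return fold_ids

def geographic_folds(longitudes, n_folds):
    """Geographic folds (hypothetical definition): contiguous longitude bands."""
    order = sorted(range(len(longitudes)), key=lambda i: longitudes[i])
    fold_ids = [0] * len(longitudes)
    for rank, i in enumerate(order):
        fold_ids[i] = rank * n_folds // len(longitudes) + 1
    return fold_ids

rng = random.Random(42)
longitudes = [rng.uniform(-125.0, -67.0) for _ in range(20)]  # made-up site longitudes

geo = geographic_folds(longitudes, n_folds=4)
rnd = random_folds(len(longitudes), n_folds=4)

# Each geographic fold covers its own band; random folds are spread everywhere.
for fold in range(1, 5):
    band = [lon for lon, f in zip(longitudes, geo) if f == fold]
    print(fold, round(min(band), 1), round(max(band), 1))
```

With geographic folds the held-out observations come from a region not seen during learning, so the resulting test error measures extrapolation to new locations rather than interpolation among nearby sites.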
mnist.tex contains figure/captions not used in chapter.
figure-overfitting-paper.R makes a figure (image not shown here).
figure-fashion-mnist.R makes a figure (image not shown here).
PRODA Data from Feng Tao downloaded from Google Drive.
Based on Practice%20session/nau_training_proda/nn_clm_cen.py, it seems that:
- inputs are in Practice%20session/nau_training_proda/input_data/EnvInfo4NN_SoilGrids.mat
- outputs are in Practice%20session/nau_training_proda/input_data/ParaMean_V8.4.mat
figure-proda-inputs.R makes a figure (image not shown here).
slides.tex makes slides.pdf
figure-overfitting.R makes various figures that demonstrate overfitting.
figure-test-accuracy-data.R makes figure-test-accuracy-data.rds (4-fold cross-validation estimation of test error using three keras neural network models).
figure-test-accuracy.R plots the test accuracy in 4-fold cross-validation.
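The 4-fold cross-validation loop behind such an estimate can be sketched as below. This is a hedged illustration in Python, not the code in figure-test-accuracy-data.R: a 1-nearest-neighbor rule stands in for the keras neural networks, the data are synthetic, and all function names are assumptions. A featureless baseline is included for comparison.

```python
import random
from collections import Counter

def featureless_predict(train_labels, n_test):
    """Baseline: ignore inputs, always predict the most frequent train label."""
    most_frequent = Counter(train_labels).most_common(1)[0][0]
    return [most_frequent] * n_test

def nearest_neighbor_predict(train_x, train_labels, test_x):
    """Stand-in learner: copy the label of the closest train input."""
    predictions = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        predictions.append(train_labels[nearest])
    return predictions

def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

rng = random.Random(0)
inputs = [rng.random() for _ in range(80)]   # made-up 1-D inputs
labels = [int(x > 0.6) for x in inputs]      # made-up binary labels
fold_ids = [(i % 4) + 1 for i in range(len(inputs))]
rng.shuffle(fold_ids)

results = []
for test_fold in range(1, 5):
    # Observations in the test fold are held out; all others form the train set.
    train = [(x, y) for x, y, f in zip(inputs, labels, fold_ids) if f != test_fold]
    test = [(x, y) for x, y, f in zip(inputs, labels, fold_ids) if f == test_fold]
    train_x, train_y = zip(*train)
    test_x, test_y = zip(*test)
    results.append({
        "test_fold": test_fold,
        "nearest_neighbor": accuracy(
            nearest_neighbor_predict(train_x, train_y, test_x), test_y),
        "featureless": accuracy(
            featureless_predict(train_y, len(test_y)), test_y),
    })

for row in results:
    print(row)
```

Each observation appears in the test set exactly once, so the four accuracy values give a sense of both the mean and the variability of test accuracy for each algorithm.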
download.R downloads data sets.
figure-validation-loss.R plots subtrain/validation loss for three neural network models.
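How subtrain/validation loss curves are computed and used can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code in figure-validation-loss.R: a small polynomial regression model trained by full-batch gradient descent stands in for the neural networks, and the data, step size, and number of epochs are made up.

```python
import random

def features(x, degree):
    """Polynomial features 1, x, x^2, ..., x^degree."""
    return [x ** d for d in range(degree + 1)]

def predict(weights, x):
    return sum(w * f for w, f in zip(weights, features(x, len(weights) - 1)))

def mean_squared_error(weights, xs, ys):
    return sum((predict(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_step(weights, xs, ys, step_size):
    """One full-batch gradient descent step on the mean squared error."""
    grad = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        residual = predict(weights, x) - y
        for j, f in enumerate(features(x, len(weights) - 1)):
            grad[j] += 2.0 * residual * f / len(xs)
    return [w - step_size * g for w, g in zip(weights, grad)]

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [x * x + rng.gauss(0.0, 0.2) for x in xs]  # made-up pattern plus noise
subtrain_x, subtrain_y = xs[:30], ys[:30]
validation_x, validation_y = xs[30:], ys[30:]

weights = [0.0] * 4  # cubic polynomial
subtrain_loss, validation_loss = [], []
for epoch in range(200):
    # Only the subtrain set is used to update the weights.
    weights = gradient_step(weights, subtrain_x, subtrain_y, step_size=0.1)
    subtrain_loss.append(mean_squared_error(weights, subtrain_x, subtrain_y))
    validation_loss.append(mean_squared_error(weights, validation_x, validation_y))

# The validation loss is used to choose the number of epochs.
best_epoch = min(range(len(validation_loss)), key=validation_loss.__getitem__) + 1
print("best epoch:", best_epoch)
```

Plotting both loss lists against the epoch number gives the kind of curve used to diagnose overfitting: the subtrain loss keeps decreasing, and the epoch that minimizes the validation loss is the one selected.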