Tutorial: introduction to machine learning and neural networks, with an application to earth system modeling

ASU Intro to Deep Learning in R talk

figure-validation-loss-torch.R

Prepared for the summer school (4th year, 2021).

Summer school link: New Advances in Land Carbon Cycle Modeling.
Download the textbook from the Taylor and Francis web page.
- Officially published PDF of the part I wrote, Chapter 36: Intro to ML and Neural Nets.
My revised book chapter PDF (same content as Chapter 36 of the textbook, with slight revisions).
My video on YouTube.
My slides PDF.
My quiz.

Prepared for the summer school, New Advances in Land Carbon Cycle Modeling (3rd year, 2020).

Tutorial lectures and questions

Slides PDF
Lecture 1: Introduction to machine learning and selected applications
- A supervised learning algorithm will input a labeled training data set, and output what?
- What are three advantages of supervised machine learning, relative to classical programming techniques?
Lecture 2: Demonstrating overfitting in regression, introduction to R programming
- When splitting data into train/test sets, what is the purpose of the train set?
- When splitting data into train/test sets, what is the purpose of the test set?
- When splitting a train set into subtrain/validation sets, what is the purpose of the subtrain set?
- When splitting a train set into subtrain/validation sets, what is the purpose of the validation set?
- The goal of supervised machine learning is to get a function that yields highly accurate predictions with respect to what data?
- How can you tell if machine learning model predictions are overfitting?
- In 4-fold cross-validation, we randomly assign each observation a fold ID in what range of values?
- In 4-fold cross-validation for subtrain/validation splits, if the validation fold ID is 3, then what observations are used as the subtrain set?
- Machine learning data sets are usually stored in CSV data files, where rows are ___ and columns are ___ and ___ ?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, what data set (all/train/test/subtrain/validation) should you pass as the second argument?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the maxit hyper-parameter result in overfitting?
- When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the size hyper-parameter result in overfitting?
Lecture 3: Intro to neural networks for image classification using R/keras
- In multi-class classification problems with keras in R, the output/label data must be stored in what data structure?
- To use image data as inputs, you need to use what data structure?
- When using the keras::fit function, what data set (all/train/test/subtrain/validation) should you pass as the first two arguments?
- Using keras::fit with validation_split=0.3 implies what percentages of the data are allocated to the subtrain and validation sets?
- When using 4-fold cross-validation for model evaluation, the test set is used for what?
- To determine if classification models have learned any non-trivial patterns in the data, they should be compared with a baseline which ignores all inputs/features and always predicts what value?
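
Several of the questions above concern fold IDs and subtrain/validation splits. The bookkeeping can be sketched as follows. This is a minimal illustration in Python, not the R code used in the lectures, and the function names `assign_fold_ids` and `split_by_fold` are hypothetical.

```python
import random

def assign_fold_ids(n_obs, n_folds=4, seed=0):
    """Return one fold ID per observation, each an integer in 1..n_folds.

    IDs are balanced (fold sizes differ by at most one) and then shuffled,
    so the assignment is random with respect to the order of the data.
    """
    rng = random.Random(seed)
    fold_ids = [(i % n_folds) + 1 for i in range(n_obs)]
    rng.shuffle(fold_ids)
    return fold_ids

def split_by_fold(fold_ids, held_out_fold):
    """Return (kept_indices, held_out_indices) for one held-out fold."""
    kept = [i for i, f in enumerate(fold_ids) if f != held_out_fold]
    held = [i for i, f in enumerate(fold_ids) if f == held_out_fold]
    return kept, held

# Assumed toy setting: a train set of 20 observations, 4 folds.
fold_ids = assign_fold_ids(n_obs=20, n_folds=4)
# Hold out fold 3 as the validation set; all other folds form the subtrain set.
subtrain_idx, validation_idx = split_by_fold(fold_ids, held_out_fold=3)
print(len(subtrain_idx), len(validation_idx))
```

The same `split_by_fold` helper could be applied one level up, to hold out a test fold from the full data set before splitting the remaining train set into subtrain and validation sets.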

Supplementary YouTube screencasts showing R command line use:

24 Feb 2021

New figure-overfitting-cv-data.R makes many figures showing the differences between train/test/subtrain/validation sets.

figure-overfitting.R also makes

Figure revised to use “subtrain” and “validation” sets.

29 Oct 2020

figure-proda-cv-data.R computes geographic and random folds, and plots a map.

It also has batchtools code that computes figure-proda-cv-data-test.csv
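
To illustrate the difference between the two kinds of folds, here is a minimal Python sketch (not the R code in figure-proda-cv-data.R). It assumes, purely for illustration, that geographic folds are contiguous longitude bands; the actual script may define them differently. The function names and the longitude values are hypothetical.

```python
import random

def random_folds(n_obs, n_folds, seed=0):
    """Balanced fold IDs in 1..n_folds, assigned in random order."""
    rng = random.Random(seed)
    ids = [(i % n_folds) + 1 for i in range(n_obs)]
    rng.shuffle(ids)
    return ids

def geographic_folds(longitudes, n_folds):
    """Fold IDs in 1..n_folds, where each fold is a contiguous longitude band.

    Assumption for this sketch: bands hold (nearly) equal numbers of sites,
    ordered from west to east.
    """
    n = len(longitudes)
    order = sorted(range(n), key=lambda i: longitudes[i])
    ids = [0] * n
    for rank, i in enumerate(order):
        ids[i] = rank * n_folds // n + 1
    return ids

# Hypothetical site longitudes in degrees (not taken from the PRODA data).
longitudes = [-120.5, -75.0, 10.2, 100.1, -110.3, -80.4, 15.8, 110.9]
geo = geographic_folds(longitudes, n_folds=4)
rnd = random_folds(len(longitudes), n_folds=4)
print(geo)
print(rnd)
```

With geographic folds, every test site lies in a region that contributed no training sites, so the test error measures extrapolation to new regions. With random folds, test sites usually have nearby training sites.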

figure-proda-cv-data-multitask.R computes figure-proda-cv-data-multitask-test.csv

Those files are read by figure-proda-cv.R, which makes two figures: one labeled “selected for publication” and one labeled “all”.

27 Oct 2020

mnist.tex contains figures/captions not used in the chapter.

22 Oct 2020

figure-overfitting-paper.R makes

21 Oct 2020

figure-fashion-mnist.R makes two figures.

14 Aug 2020

PRODA Data from Feng Tao downloaded from Google Drive.

Based on Practice%20session/nau_training_proda/nn_clm_cen.py it seems that

figure-proda-inputs.R makes

8 July 2020

slides.tex makes slides.pdf

figure-overfitting.R makes various figures that demonstrate overfitting, e.g.

4 July 2020

figure-test-accuracy-data.R makes figure-test-accuracy-data.rds (4-fold cross-validation estimation of test error using three keras neural network models).

figure-test-accuracy.R plots the test accuracy from 4-fold cross-validation.
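
The evaluation procedure can be sketched as follows. This is a minimal Python illustration, not the repository's R/keras code: it swaps in a simple 1-nearest-neighbor rule for the neural networks, uses a made-up one-dimensional toy data set, and compares against a featureless baseline that always predicts the most frequent train label. All function names are hypothetical.

```python
from collections import Counter

def featureless_predict(train_y, n_test):
    """Ignore all inputs; always predict the most frequent train label."""
    most_frequent = Counter(train_y).most_common(1)[0][0]
    return [most_frequent] * n_test

def nearest_neighbor_predict(train_x, train_y, test_x):
    """Predict the label of the closest train input (first one on ties)."""
    preds = []
    for x in test_x:
        j = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
        preds.append(train_y[j])
    return preds

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def cross_validate(x, y, fold_ids):
    """For each fold, train on the other folds and compute test accuracy."""
    results = []
    for test_fold in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != test_fold]
        test = [i for i, f in enumerate(fold_ids) if f == test_fold]
        train_x, train_y = [x[i] for i in train], [y[i] for i in train]
        test_x, test_y = [x[i] for i in test], [y[i] for i in test]
        results.append({
            "test_fold": test_fold,
            "featureless": accuracy(
                featureless_predict(train_y, len(test_y)), test_y),
            "nearest_neighbor": accuracy(
                nearest_neighbor_predict(train_x, train_y, test_x), test_y),
        })
    return results

# Made-up toy data: 16 inputs in [0, 1), label 1 when the input is >= 0.25.
x = [i / 16 for i in range(16)]
y = [int(v >= 0.25) for v in x]
fold_ids = [(i % 4) + 1 for i in range(16)]  # deterministic fold IDs 1..4
results = cross_validate(x, y, fold_ids)
for row in results:
    print(row)
```

A learned model is only interesting if its test accuracy is clearly above the featureless row in most folds.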

download.R downloads data sets.

figure-validation-loss.R plots subtrain/validation loss for three neural network models.
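
One common use of such loss curves is to choose the number of epochs that minimizes the validation loss. A minimal Python sketch of that selection follows. The loss values are made up for illustration only and are not results from this repository, and `best_epoch` is a hypothetical helper name.

```python
def best_epoch(validation_loss):
    """Return the 1-based epoch with minimum validation loss (first on ties)."""
    best_index = min(range(len(validation_loss)),
                     key=validation_loss.__getitem__)
    return best_index + 1

# Made-up loss curves, one value per epoch (illustration only).
subtrain_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18, 0.15]
validation_loss = [0.95, 0.70, 0.58, 0.52, 0.50, 0.53, 0.57, 0.62]

epoch = best_epoch(validation_loss)
# After the selected epoch, subtrain loss keeps decreasing while validation
# loss increases, which is the signature of overfitting.
overfit_epochs = list(range(epoch + 1, len(validation_loss) + 1))
print(epoch, overfit_epochs)
```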
