tdhock / 2020-yiqi-summer-school

Tutorial: introduction to machine learning and neural networks, with an application to earth system modeling

ASU Intro to Deep Learning in R talk

figure-validation-loss-torch.R

Prepared for the 4th year of the summer school, 2021

Tutorial lectures and questions

  • Slides PDF
  • Lecture 1: Introduction to machine learning and selected applications
    • A supervised learning algorithm will input a labeled training data set, and output what?
    • What are three advantages of supervised machine learning, relative to classical programming techniques?
  • Lecture 2: Demonstrating overfitting in regression, introduction to R programming
    • When splitting data into train/test sets, what is the purpose of the train set?
    • When splitting data into train/test sets, what is the purpose of the test set?
    • When splitting a train set into subtrain/validation sets, what is the purpose of the subtrain set?
    • When splitting a train set into subtrain/validation sets, what is the purpose of the validation set?
    • The goal of supervised machine learning is to get a function that yields highly accurate predictions with respect to what data?
    • How can you tell if machine learning model predictions are overfitting?
    • In 4-fold cross-validation, we randomly assign each observation a fold ID in what range of values?
    • In 4-fold cross-validation for subtrain/validation splits, if the validation fold ID is 3, then what observations are used as the subtrain set?
    • Machine learning data sets are usually stored in CSV data files, where rows are ___ and columns are ___ and ___?
    • When using the nnet::nnet function in R to learn a neural network with a single hidden layer, what data set (all/train/test/subtrain/validation) should you pass as the second argument?
    • When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the maxit hyper-parameter result in overfitting?
    • When using the nnet::nnet function in R to learn a neural network with a single hidden layer, do LARGE or SMALL values of the size hyper-parameter result in overfitting?
  • Lecture 3: Intro to neural networks for image classification using R/keras
    • In multi-class classification problems with keras in R, the output/label data must be stored in what data structure?
    • To use image data as inputs, you need to use what data structure?
    • When using the keras::fit function, what data set (all/train/test/subtrain/validation) should you pass as the first two arguments?
    • Using keras::fit with validation_split=0.3 implies what percents of data allocated to subtrain and validation sets?
    • When using 4-fold cross-validation for model evaluation, the test set is used for what?
    • To determine if classification models have learned any non-trivial patterns in the data, they should be compared with a baseline which ignores all inputs/features and always predicts what value?

Supplementary YouTube screencasts showing R command line use:

24 Feb 2021

New figure-overfitting-cv-data.R makes lots of figures showing difference between train/test/subtrain/validation sets:

figure-overfitting-cv-data-test-fold-1.png

figure-overfitting-cv-data-inner-folds-1.png

figure-overfitting-cv-data-inner-folds-1-1.png

figure-overfitting-cv-data-median-mse-1.png

figure-overfitting-cv-data-test-fold-1-pred.png

figure-overfitting-cv-data.png

figure-overfitting.R also makes

figure-overfitting-validation-only.png

Figure from https://raw.githubusercontent.com/mlr-org/mlr3book/main/bookdown/images/nested_resampling.png revised to use the set names “subtrain” and “validation”.

nested_resampling.png
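
The nested splitting shown in these figures can be sketched in a few lines. This is a minimal illustration in Python (the repository scripts are written in R), and the data set size, seeds, and the helper name assign_fold_ids are assumptions made for the example, not taken from figure-overfitting-cv-data.R.

```python
import random

def assign_fold_ids(n_obs, n_folds, seed):
    """Randomly assign each observation a fold ID in 1..n_folds (hypothetical helper)."""
    rng = random.Random(seed)
    # Repeat 1..n_folds so fold sizes differ by at most one, then shuffle.
    fold_ids = [(i % n_folds) + 1 for i in range(n_obs)]
    rng.shuffle(fold_ids)
    return fold_ids

n_obs = 20    # hypothetical data set size
n_folds = 4

# Outer split: one fold is held out as the test set, the rest is the train set.
outer_ids = assign_fold_ids(n_obs, n_folds, seed=1)
test_fold = 1
test_idx = [i for i, f in enumerate(outer_ids) if f == test_fold]
train_idx = [i for i, f in enumerate(outer_ids) if f != test_fold]

# Inner split: only the train set is divided again, into subtrain/validation.
inner_ids = assign_fold_ids(len(train_idx), n_folds, seed=2)
validation_fold = 3
validation_idx = [i for i, f in zip(train_idx, inner_ids) if f == validation_fold]
subtrain_idx = [i for i, f in zip(train_idx, inner_ids) if f != validation_fold]

print("test:", len(test_idx),
      "subtrain:", len(subtrain_idx),
      "validation:", len(validation_idx))
```

The test set is never touched by the inner split, so it remains available for the final evaluation of whatever model the subtrain/validation split selects.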

29 Oct 2020

figure-proda-cv-data.R computes geographic and random folds, and plots a map

figure-proda-cv-data-map.png

It also has batchtools code that computes figure-proda-cv-data-test.csv

figure-proda-cv-data-multitask.R computes figure-proda-cv-data-multitask-test.csv

Those files are read by figure-proda-cv.R which makes

figure-proda-cv-some-out.png (selected for publication)

figure-proda-cv-all-out.png (all)

27 Oct 2020

mnist.tex contains figures and captions not used in the chapter.

22 Oct 2020

figure-overfitting-paper.R makes

figure-overfitting-paper-loss.png

figure-overfitting-paper.png

21 Oct 2020

figure-fashion-mnist.R makes

figure-fashion-mnist-fashion.png and

figure-fashion-mnist-digits.png and

figure-fashion-mnist-one-example.png and

figure-fashion-mnist-fashion-design.png and

figure-fashion-mnist-digits-design.png

14 Aug 2020

PRODA data from Feng Tao, downloaded from Google Drive.

Based on Practice%20session/nau_training_proda/nn_clm_cen.py it seems that

figure-proda-inputs.R makes

figure-proda-inputs.png

8 July 2020

slides.tex makes slides.pdf

figure-overfitting.R makes various figures that demonstrate overfitting, e.g.

figure-overfitting-pred-units=200-maxit=1.png

figure-overfitting-pred-units=200-maxit=10.png

figure-overfitting-pred-units=200-maxit=10000.png

figure-overfitting-data-loss-200.png

4 July 2020

figure-test-accuracy-data.R makes figure-test-accuracy-data.rds (4-fold cross-validation estimation of test error using three keras neural network models).

figure-test-accuracy.R plots the test accuracy in 4-fold cross-validation

figure-test-accuracy-baseline.png

figure-test-accuracy.png

figure-test-accuracy-both.png
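
As a rough illustration of the baseline comparison, the sketch below estimates test accuracy in 4-fold cross-validation for a featureless baseline that always predicts the most frequent label in the train set. It is written in Python rather than the R/keras code in figure-test-accuracy-data.R, and the labels, seed, and function names are hypothetical.

```python
import random
from collections import Counter

def most_frequent(labels):
    """Return the most common label: the featureless baseline prediction."""
    return Counter(labels).most_common(1)[0][0]

def baseline_cv_accuracy(labels, n_folds=4, seed=1):
    """Return one baseline test accuracy per fold (hypothetical helper)."""
    rng = random.Random(seed)
    fold_ids = [(i % n_folds) + 1 for i in range(len(labels))]
    rng.shuffle(fold_ids)
    accuracies = []
    for test_fold in range(1, n_folds + 1):
        train = [y for y, f in zip(labels, fold_ids) if f != test_fold]
        test = [y for y, f in zip(labels, fold_ids) if f == test_fold]
        pred = most_frequent(train)  # ignores all inputs/features
        accuracies.append(sum(y == pred for y in test) / len(test))
    return accuracies

# Hypothetical labels: 60 observations of class "a", 40 of class "b".
labels = ["a"] * 60 + ["b"] * 40
acc = baseline_cv_accuracy(labels)
print([round(a, 2) for a in acc])
```

A learned model is only interesting if its test accuracy in each fold is clearly above these baseline values.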

download.R downloads data sets.

figure-validation-loss.R plots subtrain/validation loss for three neural network models.
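
One way to read such a plot is to pick the number of epochs that minimizes the validation loss. The Python sketch below shows that selection rule. The loss values are made up for illustration and are not results from this repository, and best_epoch is a hypothetical helper name.

```python
# Hypothetical loss values per epoch (not taken from the repository's results).
subtrain_loss = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
validation_loss = [0.92, 0.65, 0.52, 0.48, 0.50, 0.55, 0.61]

def best_epoch(validation_loss):
    """Return the 1-based epoch with the smallest validation loss."""
    best_index = min(range(len(validation_loss)),
                     key=lambda i: validation_loss[i])
    return best_index + 1

epoch = best_epoch(validation_loss)
# Past this epoch the subtrain loss keeps decreasing while the
# validation loss increases, which is the signature of overfitting.
print("selected number of epochs:", epoch)
```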
