ctbrown / CSX460

CSX460 Entire Class

CSX460

This repository contains materials for Practical Machine Learning with R (CSX460) at the University of California, Berkeley. The most recent class is/was Spring 2015.

Course Description

This course provides an introduction to machine learning using R, the open source, statistical programming language. Once a niche set of tools for statisticians, programmers and quants, machine learning (sometimes also called data mining or statistical learning) has spread in popularity to a wide variety of applications and disciplines. This course teaches the fundamentals of machine learning without delving into the theory. The course will teach practical aspects of machine learning so that the students will be able to apply lessons to solve problems using machine learning in their own fields.

Course Learning Objectives

Students of this class will learn:

  • Fundamental concepts in ML
  • The difference between supervised, unsupervised, semi-supervised, and adaptive/reinforcement learning
  • The three prerequisites of ML algorithms/models
    • Loss function
    • Restricted class of functions
    • Search methodology for training
  • How to evaluate and compare ML model performance
  • How to pre-process data and build features
  • How to train ML models for prediction, categorization and recommendations
  • How to apply ML models on new data
  • How to use resampling techniques to calculate model performance
  • What the bootstrap is and how it works
  • What Bagging is and how and why it improves model performance
  • What Boosting is and how and why it improves model performance
  • How to implement/deploy ML models for use by a wider audience
  • How to frame questions to be answered using ML techniques
  • How to collaborate in a group using tools for collaborative/social programming
  • How to generate high-quality graphical and textual results
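
The three prerequisites listed above (loss function, restricted class of functions, search methodology) can be illustrated with a minimal R sketch. It is not taken from the course materials: the synthetic data, the `mse_loss` helper, and the choice of `optim()` with BFGS are all assumptions made for illustration.

```r
# Synthetic data (assumed for illustration): y is roughly linear in x
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# 1. Loss function: mean squared error of the predictions
# 2. Restricted class of functions: straight lines, intercept + slope * x
mse_loss <- function(params, x, y) {
  predictions <- params[1] + params[2] * x
  mean((y - predictions)^2)
}

# 3. Search methodology: numerical minimization of the loss
fit <- optim(par = c(0, 0), fn = mse_loss, x = x, y = y, method = "BFGS")

fit$par           # intercept and slope found by the search
coef(lm(y ~ x))   # closed-form least-squares solution, for comparison
```

Swapping any one of the three pieces (for example, absolute error instead of squared error) generally gives a different model from the same data.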

Intended Audience

  • Anyone who wishes to learn the fundamentals of machine learning
  • Anyone who wants to learn about using R to build, evaluate or deploy machine learning models.
  • Scientists, engineers, business analysts, and researchers who explore and analyze data and wish to present their findings in well-formatted textual and graphical forms
  • Anyone wishing to get hands-on experience building machine learning models

Prerequisites

  • Experience programming in at least one high-level programming language such as BASIC, PASCAL, C, Java, Python, Perl, or Ruby.
  • Familiarity with R such as that gained through the Programming with R course.
  • Basic knowledge of statistics as covered in a first-semester undergraduate statistics course. Some basic statistical techniques will be covered as part of the core elements of machine learning.
  • Personal laptop for completing in-class assignments.

Text/Required Reading

Reading Requirements for the Course

**Applied Predictive Modeling**  
ISBN-13: 978-1461468486 ISBN-10: 1461468485 
Kuhn, Max and Johnson, Kjell
Springer Science+Business Media
2013 

Google Group

There is a Google Group for this class: CSX460

Class Syllabus

Current Term: Spring 2016

This section provides a session-by-session overview of CS-X460 (Practical Machine Learning).

1. Introduction to R, setting up the ML developer's environment

  • Installing R/RStudio
  • Using Git/GitHub
  • Installing packages from CRAN and GitHub
  • Overview of Machine Learning
  • Building First Models

Reading:

  • Chapters 1-2 of Applied Predictive Modeling

Exercise(s):

  • Finish in-class exercises

2. Fundamentals of Machine Learning

  • Supervised, unsupervised, and semi-supervised
  • Regression and classification
  • Measuring model error(s)
  • Machine learning prerequisites
  • Algorithm types
  • Data processing

Reading:

Exercise(s):

3. Linear Regression

  • OLS Regression
  • Data partitioning
  • Model evaluation and tuning
  • Exercises
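
As a rough sketch of how the topics in this session fit together, the following fits an OLS model on a training partition and evaluates it on held-out rows. It is not part of the course materials: the built-in `mtcars` data set, the 70/30 split, and the predictors `wt` and `hp` are assumptions chosen only to keep the example self-contained.

```r
# Data partitioning: hold out roughly 30% of the rows for evaluation
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# OLS regression fitted on the training partition only
model <- lm(mpg ~ wt + hp, data = train)

# Model evaluation: root mean squared error on the held-out rows
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

Measuring error on rows the model never saw gives a more honest estimate of performance than the training error reported by `summary(model)`.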

Reading:

Exercise(s):

4. Logistic Regression

  • Logistic Regression
  • Exercises

Reading:

Exercise(s):

5. Advanced Techniques: Partitioning Methods

  • CART/Regression Trees
  • Clustering
  • K Nearest Neighbors
  • Exercises

Reading:

Exercise(s):

6. Advanced Techniques

  • Bagging
  • Bagged Trees / Random Forests
  • Exercises
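
Bagging can be sketched in a few lines: fit one tree per bootstrap resample of the training data, then average the predictions. The example below is an illustration, not the course's own code. It assumes the `rpart` package is available (it ships with most R installations), and the `mtcars` data, the predictors, `n_models`, and the `minsplit` setting are arbitrary choices.

```r
library(rpart)

set.seed(123)
n <- nrow(mtcars)
test_idx <- sample(n, size = 10)
train <- mtcars[-test_idx, ]
test <- mtcars[test_idx, ]

n_models <- 50  # number of bootstrap resamples (arbitrary choice)

# Each column of preds holds one tree's predictions for the test rows
preds <- sapply(seq_len(n_models), function(i) {
  # Bootstrap: sample training rows with replacement
  boot_idx <- sample(nrow(train), replace = TRUE)
  tree <- rpart(mpg ~ wt + hp + disp, data = train[boot_idx, ],
                control = rpart.control(minsplit = 5))
  predict(tree, newdata = test)
})

# Aggregate: average the predictions across all trees
bagged_pred <- rowMeans(preds)
bagged_rmse <- sqrt(mean((test$mpg - bagged_pred)^2))
bagged_rmse
```

A random forest extends this idea by also considering only a random subset of the predictors at each split.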

Reading:

Exercise(s):

7. Advanced Techniques: Boosting

  • Boosting
  • Neural Networks
  • Support Vector Machines
  • Exercises

Reading:

Exercise(s):

8. Deployment

  • Diving into the data lake
  • Optimization
  • Delivery and Production

Reading:

Exercise(s):
