jason-neal / MLD2019

Machine learning and Databases at CAUP/IA in 2019

We have started!

Course overview

This is an advanced course at CAUP, running during March and April 2019. Lectures take place on Mondays at 14:00 and practical classes on Thursdays at 10:00. Both last two hours, with a short break.

The aim of this course is to give you a good practical grasp of machine learning. I will not spend much time on algorithm details, but rather on how to use the algorithms in Python, and I will try to discuss which methods are useful for which type of scientific question or research goal.

March 4 - Managing data and simple regression
  • Covering git and SQL
  • Introducing machine learning through regression techniques
March 11 - Visualisation and inference methods
  • Visualisation of data, dos and don'ts
  • Classical inference
  • Bayesian inference
  • MCMC
March 18 - Density estimation and model choice
  • Estimating densities, parametric & non-parametric
  • Bias-variance trade-off
  • Cross-validation
  • Classification
March 25 - Dimensional reduction
  • Standardising data
  • Principal Component Analysis
  • Manifold learning
April 8 - Ensemble methods, neural networks, deep learning
  • Local regression methods
  • Random forests, boosting, and other ensemble methods
  • Neural networks & deep learning
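
As a small taste of the first lecture's two topics (SQL and simple regression), here is a minimal sketch using only the Python standard library. The table name, column names, and data values are made up for illustration and are not taken from the course material; in the course itself you would more likely do the fit with a library such as scikit-learn.

```python
import sqlite3

# Hypothetical toy data: (x, y) pairs lying exactly on y = 2x + 1.
rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Store the data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (x REAL, y REAL)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Pull the data back out with a SQL query.
data = conn.execute("SELECT x, y FROM measurements ORDER BY x").fetchall()
conn.close()


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(data)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Because the toy data lie exactly on a straight line, the fit should recover a slope of 2 and an intercept of 1.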

Literature for the course

I expect that you have read through these two documents:

  • A couple of Python & Topcat pointers. This is a very basic document and might not contain much that is new to you. It does include a couple of tasks to try out; you can find the solutions in the ProblemSets/0 - Pyton and Topcat directory.

  • A reminder/intro to relevant math. This contains a summary of some basic facts from linear algebra and probability theory that are useful for this course.

Below you can find some useful books. The titles link to the Amazon page. Where a free version of a book is legally available online, I include a link to it as well.

  • "Elements of Statistical Learning" - Hastie et al. This is a more advanced version of "Introduction to Statistical Learning", with many of the same authors. It is also freely available on the web.

Making a copy of the repository that you can edit

If you want a copy that you can edit, you should fork the repository rather than just clone it. You can follow the instructions below (credit to Alexander Mechev for these) to create a fork of the repository:

Software you need for the course

The course uses Python throughout, so you need a recent version of Python installed. I use Python 3 by default but will try to make all scripts compatible with both Python 2 and Python 3. I recommend installing at least these libraries:

  • numpy - for numerical calculations
  • astropy - because we are astronomers
  • scipy - because we are scientists
  • sklearn - machine learning library (full name scikit-learn)
  • matplotlib - plotting (you can use alternatives of course)
  • pandas - nice handling of data
  • seaborn - nice plots

(the last two are really "nice to have" but if you can install the others then these are easy).
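
To check which of the libraries listed above are available in your environment, a short script along these lines may help. This is only a sketch: the helper name `check_libraries` and the required/optional split are my own choices for illustration, and a few packages may not expose a `__version__` attribute (reported as "unknown" here).

```python
import importlib

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "astropy", "scipy", "sklearn", "matplotlib"]
OPTIONAL = ["pandas", "seaborn"]


def check_libraries(names):
    """Return {name: version string, or None if the import fails}."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
        else:
            status[name] = getattr(module, "__version__", "unknown")
    return status


for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
    for name, version in check_libraries(names).items():
        print(f"{name:12s} ({label}): {version or 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED can usually be installed with your package manager of choice, for example pip or conda.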

Lecture 1 - links and information

The slides are available in the Lectures directory. You can find some files for creating tables in the ProblemSets/MakeTables directory.

License: GNU General Public License v3.0

