amirreza1997 / MachineLearning_Physics

This is to facilitate the “Machine Learning in Physics” course that I am teaching at Sharif University of Technology for the winter-19 semester. For more information, see the course page at

Home Page: http://sharif.edu/~sraeisi/ML


Machine Learning in Physics

This is to facilitate the “Machine Learning in Physics” course that I am teaching at Sharif University of Technology for the winter-19 semester. For more information, see the course page at

http://sharif.edu/~sraeisi/ML

Requirements:

  • Decent understanding of programming in Python and of the following libraries:

    • NumPy

    • Pandas

    • Plotting and graphical presentation tools in Python

  • Git and GitHub (if you are not familiar with them, let me know.)

  • Basic understanding of quantum mechanics and statistics.

  • Basic understanding of machine learning

Marking Scheme

This is a tentative plan and we may change it as we move on.

  • Course Project: 40%

  • Assignments: 30%

  • In-class exercises: 10%

  • Final exam (set for Thursday, June 20th, 9AM): 30%

These add up to 110%, which includes a 10% bonus.
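As a quick check of the arithmetic in the marking scheme, here is a minimal sketch. The names `WEIGHTS` and `total_grade` are hypothetical, and the plain weighted sum is an assumption: the scheme does not say how marks above 100% are treated.

```python
# Weights from the tentative marking scheme, in percent of the final grade.
WEIGHTS = {
    "project": 40,
    "assignments": 30,
    "in_class": 10,
    "final_exam": 30,
}


def total_grade(scores):
    """Combine per-component scores (each between 0 and 1) into a percentage.

    Assumption: components are combined as a plain weighted sum. How the
    bonus is handled above 100% is not specified in the marking scheme.
    """
    for name, score in scores.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown component: {name}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name} must be between 0 and 1")
    # Components missing from `scores` count as zero.
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


# Maximum achievable total, including the bonus.
max_total = sum(WEIGHTS.values())

# Hypothetical student: 90% on the project, 80% on assignments,
# full marks on in-class exercises, 70% on the final exam.
example = total_grade(
    {"project": 0.9, "assignments": 0.8, "in_class": 1.0, "final_exam": 0.7}
)
print(max_total, round(example, 2))
```

A perfect score on every component gives the full 110%.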

Course Projects

This is a group project and counts towards 40% of the final grade.

The idea is that each group decides on a project at the beginning of the course and applies everything that we cover to that project. Here are some of the expectations for the course project:

  • Initial proposal: A clear statement of the problem and a preliminary assessment of why ML could help answer it. (Due Feb 28th)

  • Data collection/generation and preparation: (Due March 15th, extended to March 20th)

    • Create a folder for this part
    • Have a description (readme file) for the data
    • Describe your data: Where it comes from, the different features and their physical significance, your target value(s)
    • Create a notebook and implement the following in different sections:
      • Clean up the data (remove the missing data and convert everything to numerical values)
      • Scale your data
      • Analysis of features and target (e.g. histograms)
      • Feature selection (Try different techniques and assess how well they work on your data)
      • Feature extraction (Try different techniques and assess how well they work on your data)
  • Application of the basic ML techniques: (Due April 15th)

    • A table of assessment (Will give an example later.)

    • Investigation of variance and bias of the techniques investigated.

    • Learning and validation curves

  • Application of NN and setting the hyperparameters (Due April 30th)

  • Oral presentation (See me to set up a time; it should be before June 24th.)

  • Written term paper (It should be submitted by July 5th.)
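The data-preparation steps listed above (clean-up, scaling, feature selection) can be sketched as follows. This is a dependency-free illustration, not the course's reference implementation: the helper names and the toy data are hypothetical, and in a real notebook you would more likely use pandas and scikit-learn (for example `StandardScaler`). Note that the variance filter runs before scaling, because every non-constant column has unit variance after standardization.

```python
import math
from statistics import mean, pstdev, pvariance


def is_missing(value):
    """Treat None, empty strings and NaN as missing data."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def clean(rows):
    """Drop rows with missing entries and convert the rest to floats."""
    return [[float(v) for v in row] for row in rows
            if not any(is_missing(v) for v in row)]


def variance_threshold(rows, threshold=0.0):
    """Return the indices of columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, mus, sigmas)]
            for row in rows]


# Hypothetical raw data: two informative features and one constant column.
raw = [
    ["1.0", "10", "1"],
    ["2.0", None, "1"],   # dropped: missing value
    ["3.0", "30", "1"],
    ["5.0", "50", "1"],
]

cleaned = clean(raw)
kept = variance_threshold(cleaned)          # the constant column is removed
selected = [[row[j] for j in kept] for row in cleaned]
scaled = standardize(selected)
print(kept)
print(scaled)
```

Each helper does one step, so the steps map directly onto separate notebook sections.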

Some notes:

  • Make sure you include citations to all the resources you use!
  • You should submit your work as a group rather than separate individual submissions.
  • Scripts, notebooks and figures without a description will not count toward your grade.
  • Your code should include enough comments and information to be easily followed.
  • It is essential that all group members contribute (make commits) to their repositories; this is the only way I can make sure that everyone participated in the project.

Assignments

Each entry lists the deadline or submission link, followed by the solution where available.

  • Assignment 1: Submit it here (Solution 1)
  • Assignment 2: Deadline extended to March 22nd (Solution 2)
  • Assignment 3: Deadline April 18th
  • Assignment 4: Deadline May 9th
  • Assignment 5: Deadline May 26th

Reading Materials

  • Mehta, Pankaj, et al. "A high-bias, low-variance introduction to machine learning for physicists." Physics Reports (2019).

  • Nielsen, Michael A. "Neural Networks and Deep Learning." Vol. 25. San Francisco, CA, USA: Determination Press (2015). (Available online)

  • Chollet, François. "Deep Learning with Python." (2018).

Table of contents

The course material is posted here, and you can use either Google Colab or MyBinder to work with these Jupyter notebooks.

Each topic below lists the contents of its lectures and the accompanying notebook(s).

Basics of machine learning (2 notebooks on Colab)

  • Lecture 1: Introduction To Machine Learning
    • Notation
    • Regression, logistic regression and classification
  • Lecture 1: Noise
    • ML beyond simple examples

Basic Techniques (2 notebooks on Colab)

  • Lecture 2: Basic Techniques
    • Overview of some of the most common techniques
  • Lecture 2: Kernels

Model Selection (3 notebooks on Colab)

  • Lecture 3: Concepts from Statistical Learning
    • Variance and bias
    • Learning and validation curves
  • Bayesian inference
  • Lecture 3: Model Complexity
    • Practical model selection with scikit-learn
  • Lecture 3: Model Evaluation
    • Confusion matrix
    • Recall, precision, f-score
    • Precision-recall and ROC curve, ROC_AUC

Data Preparation (2 notebooks on Colab)

  • Lecture 4: Data Preparation
    • Standardization
    • Clean-up: NaN and outliers
    • Feature selection: feature importance, variance threshold
  • Lecture 4: Data Reduction
    • Feature extraction: PCA, manifold learning

Ensemble Techniques (1 notebook on Colab)

  • Lecture 5: Ensemble Techniques
    • Aggregation, stacking, bagging, boosting

Neural Networks (5 notebooks on Colab)

  • Feedforward
    • Model geometry and formulation
    • Universality
    • Non-linearity: activation function
  • Back propagation
    • Details
    • Initialization
    • Optimizations
    • Batch and epoch
    • A couple of examples
  • Practical implementations
    • TensorFlow and Keras

Convolutional Neural Networks (4 notebooks on Colab)

  • Basic idea of convolution
  • Simple implementation of a convnet with Keras
  • Well-known models
  • Some examples

Recurrent Neural Network (2 notebooks on Colab)

  • Basic idea
  • Example
  • Note

Reinforcement Learning

  • Basic idea and details
  • Notebook(s): Example(s)
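To illustrate the model-evaluation topics of Lecture 3 (confusion matrix, recall, precision, f-score), here is a minimal sketch for binary labels. The function names and the example labels are hypothetical and are not taken from the course notebooks. scikit-learn offers equivalent tools in `sklearn.metrics` (`confusion_matrix`, `precision_score`, `recall_score`, `f1_score`), which are the better choice in practice.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1   # correctly predicted positive
        elif truth == 0 and pred == 1:
            fp += 1   # negative sample predicted as positive
        elif truth == 1 and pred == 0:
            fn += 1   # positive sample that was missed
        else:
            tn += 1   # correctly predicted negative
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and the F1 score from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical true labels and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(counts, precision, recall, f1)
```

Precision is the fraction of predicted positives that are correct, recall is the fraction of true positives that are found, and F1 is their harmonic mean.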

Cheat sheets and guides

See the files in the CheatSheet folder.

  • Jupyter: Jupyter provides an interactive environment for programming. We will mostly be using the Python 3 kernel.
  • Git and GitHub: Git provides a strong infrastructure for version control. GitHub is a web-based hosting service for version control that also provides services for collaboration.
  • Python: The programming language that we will mostly be using in this course.
  • NumPy: A Python library that provides strong and efficient tools for manipulating high-dimensional arrays.
  • SciPy: A Python library, built on NumPy, for mathematical and scientific computing.
  • Pandas_basics, Pandas 2, Importing data: Pandas is a Python library, built on NumPy, that provides efficient tools for handling and analyzing data.
  • Matplotlib, Seaborn: Two of the most common Python libraries for visualization.
  • Scikit-Learn: A Python library that provides a nice and fairly efficient implementation of most machine learning techniques and ideas.
  • Keras: A Python library that provides a high-level and easy-to-use interface for TensorFlow and some other deep learning libraries.
