UA-ast502-2020 / classnotebook

AST 502, Spring 2020, University of Arizona:

Data Mining and Machine Learning in Astronomy

Xiaohui Fan

Location

  • When: 3pm - 3:50pm, Monday and Wednesday
  • Where: Steward 208 (or N305 if needed)

Class Materials

Reference textbook

Ivezić, Connolly, VanderPlas & Gray: Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data (Princeton University Press, 2019)

Class Description

This is a graduate-level elective course that aims to provide an interface between astronomical data analysis problems and modern statistical methods. Modern astronomy and astrophysics are undergoing a revolution, with dramatic increases in both the volume and complexity of astronomical data. The last decade saw the emergence of many terabyte-level sky surveys across the electromagnetic spectrum; in the next decade, data volumes will enter the petabyte regime, with an ever stronger time-domain component. These new data sets represent a quantum leap in our ability to make new astronomical discoveries, but they also present significant challenges to the standard analysis tools normally employed in astronomy.

The goal of this course is to bridge the gap between modern large data surveys and the data analysis tools taught in standard graduate courses. The course will start with a brief review of the modern statistical framework relevant to large-scale data analysis, including probability and statistical distributions, and classical and Bayesian statistical inference. It will then cover the main topics of the course, data mining and machine learning, including density estimation, clustering analysis, dimensionality reduction, regression and model fitting, classification, and time series analysis. Another key component of the course is an introduction to commonly used data mining and machine learning tools in Python-based packages, which will be used to solve data problems throughout the course.

Class Schedule

  1. 0115 - Introduction
  2. 0122 - Tools
  3. 0127 - Statistics refresher
  4. 0129 - No class
  5. 0203 - Palmer: maximum likelihood estimate (4.2 - 4.5)
  6. 0205 - Zeljko Ivezic: LSST overview (guest lecture)
  7. 0210 - Chen: non-parametric modeling (4.8 - 4.9)
  8. 0212 - RS and Lo: Bayesian parameter estimation (5.3, 5.6)
  9. 0217 - Tang: Bayesian model selection (5.4, 5.7)
  10. 0219 - Peter Behroozi: Big Data and the Universe Machine (guest lecture)
  11. 0224 - Xu: MCMC (5.8)
  12. 0226 - Pearce and Rodozenski: PCA (7.1 - 7.3)
  13. 0302 - CK Chan: Big Data Challenges in EHT (guest lecture)
  14. 0304 - No class
  15. 0316 - Stephanie Juneau and Robert Nikutta: Science Platforms and Data Lab (guest lecture)
  16. 0318 - Stephanie Juneau and Robert Nikutta: Science Platforms and Data Lab (guest lecture)
  17. 0323 - Scott: Dimensionality: Manifold learning and ICA (7.5, 7.6)
  18. 0325 - Woodrum: Regression: linear models (8.1 - 8.5)
  19. 0330 - White: Regression: nonlinear models (8.7 - 8.10)
  20. 0401 - Tom Matheson: ANTARES (guest lecture)
  21. 0406 - Chamberlain: Classification: Generative (9.3)
  22. 0408 - Fan and Hayati: Classification: SVM (9.5, 9.6)
  23. 0413 - Liang: Classification: trees and forests (9.7)
  24. 0415 - Purdy: Deep learning and neural networks (9.8)
  25. 0420 - Jones: Time series: basic models (10.1, 10.2)
  26. 0422 - Wolfe: Time series: periodic (10.3)
  27. 0427 - Zhang: Time series: localized and stochastic (10.4, 10.5)
  28. 0429 - Ann Zabludoff: Frontier Science with LSST (guest lecture)
  29. 0504 - Project reports
  30. 0506 - Project reports

The course combines instructor lectures, student-led seminars, and guest lectures. After the first section of introductory material, students will lead the discussion and demonstration of most topics, and guest lectures will introduce key current and future big data projects in astronomy. The class will conclude with final projects that apply data mining and machine learning tools to your own research data.
