TAlcohol / ist718_notes

Lecture notes for Big Data Analytics (IST 718)

  • Professor: Daniel Acuna https://acuna.io
  • Scribes: Lizhen Liang and Yimin Xiao

Contents

  1. Introduction to Data Science. Linear algebra, calculus, statistics; Python, Jupyter notebook
  2. Python Programming. NumPy, pandas, Matplotlib
  3. Introduction to Hadoop, MapReduce, and Apache Spark
  4. Introduction to Spark DataFrames and Spark ML
  5. A Statistical Perspective on Machine Learning. Introduction to probability; maximum likelihood estimation; mean squared error estimation; gradient descent
  6. Assessing Model Accuracy. Confusion matrix, bias–variance tradeoff, model selection: training, validation, and testing
  7. Case 1: Sentiment Analysis of Twitter. Supervised learning, logistic regression, regularized logistic regression, elastic net regularization, model interpretation
  8. Case 2: A Recommendation System for Courses. Unsupervised learning, nearest neighbors, dimensionality reduction (Principal Component Analysis, PCA), clustering (k-means)
  9. Case 3: Predicting Credit Scores with Bagging and Boosting. "Wisdom of the crowd", bagging, random forests, gradient boosting, feature importance
  10. Case 4: Object Recognition with Deep Learning. Neural networks, multilayer perceptron (MLP), backpropagation for MLPs; computation graphs, stochastic and mini-batch gradient descent, loss functions, model definition, convolutional and recurrent networks, and other topics
