GA Data Science NYC #23
Team
- Daniel Demoray, producer
- Ruben Naeff, instructor (owner repo)
- Antoine Grant, expert in residence
- egroup & Slack channel, Google Drive folder, everyone
Please do not hesitate to contact any of us!
Logistics
- June 25 - September 10, 2015
- Tuesdays and Thursdays 6.30-9.30
- GA West, 10 East 21st St, room 4A (4th floor)
- Office hours: TBD
Please fill out an exit ticket after each class
Please see Cloning the repo for how to download the latest course notes to your laptop.
Resources
I. Data Exploration (Analytics)
- 01: INTRODUCTION TO DATA SCIENCE
- Slides
- Setting Up Your Environment
- Data Science at the Command Line including exercises
-
- Slides
- SQL Exercises
- Python Exercises
- Data Exploration in Python including exercises
- 04: VISUALIZATIONS AND MORE DATA GATHERING
- Slides
- Web scraping optional demo
- Twitter API optional demo
- How To Present Your Insights
- Visualizations including exercises
- Anscombe's Quartet illustrating the need for visualizations
- Assignment #1: Data Exploration
II. Supervised Learning
- 05: INTRODUCTION TO MACHINE LEARNING
- Slides
- kNN Classification Iris dataset
- kNN implementation optional exercise
Regression models
-
- Presentations Assignment #1
- Slides
- Introduction to numpy optional
- Linear Algebra recap optional
- Linear Regression statsmodels, patsy, seaborn; salaries, house prices
- 3D plot in Python example as reference
- 07: POLYNOMIAL REGRESSION & REGULARIZATION
- Slides
- Regularization polynomials, make_pipeline, Ridge, Lasso
- 08: REGRESSION & TEXT PROCESSING
- One slide
- Text Processing Amazon movie reviews, CountVectorizer (demo)
- Guest Speaker: Amy Roberts, CEO & Founder of Healthy Bytes
- Assignment #2: Linear Regression Salary Prediction
III. Supervised Learning: Classification
-
- Slides
- Logistic Regression Iris dataset, precision/recall, decision boundaries; exercises
- Insult Classification exercise
- Area Under the ROC Curve optional deep dive
- Non-linear decision boundaries optional
-
- Slides Exit tickets review
- Review Assignment #2 and leaderboard
- Final project announcement guidelines, deadlines, sample projects
- Guest Speaker: Rohit Acharya, Chief Data Scientist at First Access
-
- Slides
- 20 Newsgroups CountVectorizer, TfidfVectorizer, MultinomialNB (demo)
- Naive Bayes implementation exercise
- Statistics & Probability recap optional basic recap
- Bayesian coin flips optional deep dive
- 12: ENSEMBLE LEARNING & RANDOM FORESTS
- Slides
- Random Forests in sklearn DecisionTree, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
- Drawing trees in sklearn optional Graphviz, pydot
- Plotting decision boundaries optional ExtraTreesClassifier, AdaBoost
- Random Forests implementation with notebook demonstration optional deep dive
-
- Slides
- Recognizing hand-written digits demo
- Plotting hyperplanes and support vectors demo
- Plotting different SVM kernels optional demo of non-linear kernels
- Separating mushrooms exercise
- Guest Speaker: Sandy Griffith, Biostatistician and Technical Lead at Flatiron Health
- Flask demonstration
-
- Slides Classification Review
- Comparison of all classifiers exercise
- Guest Speaker: Bob Filbin, Chief Data Scientist at Crisis Text Line
- Assignment #3: Classification Kaggle competition, data exploration and demo solution
IV. Unsupervised Learning
-
- Slides
- Clustering irises sklearn, simple demo
- Clustering text sklearn, 20-newsgroups
- Clustering tags in Stack Overflow Jaccard distance, exercise
- KMeans implementation: notebook and code exercise
-
- Presentations data explorations final project
- Slides
- PCA demo demo of the math
- SVD demo demo of the math
- Clustering House Legislatures demo PCA, polarizing politics
- Facial Recognition demo PCA, SVM and exercise
- Latent Semantic Analysis demo SVD, text clustering
-
- Slides
- Recommending Beers demo of several recommendation methods
- Who To Follow exercise item-based collaborative filtering
- Guest Speaker: George Kailas, CEO at Instadat
- Extracurricular: predicting student responses
V. Various
- 19: GUEST LECTURE - HASHING, A/B TESTING, CONJUGATE PRIORS
- Guest Lecturer: Robert Doherty, Lead Data Science at Outbrain
- Streaming Data Algorithms: Part 1 slides
- Bayesian A/B Headline Testing: Part 2 slides
- The Beta Distribution
- Instant Headline Testing
- Amazon Resellers LAB incl exercises
-
- Slides
- Videos: Neurons and the brain, Digit recognition and Autonomous driving by Andrew Ng
- Python implementation and demonstrating notebook optional deep dive
- Restricted Boltzmann Machines optional demo unsupervised neural nets in sklearn
- Code Reviews work-in-progress
-
- Slides
- Multiprocessing in Python local parallel computing
- mrjob optional mapreduce framework in python
- Guest Speaker: Ethics by Monica Bulger, Researcher at Data & Society
- 23: FINAL PROJECT PRESENTATIONS