p-ai-org/p-workshops

This repository is forked and adapted from a four-part workshop developed by AashitaK at Harvey Mudd College.

Introduction:

The workshop series focuses on the practical side of machine learning. We will work in Python using real-world datasets from Kaggle, a machine learning platform well suited to a “learn-by-doing” approach. The series is aimed at complete beginners to machine learning who are already familiar with Python, but it is designed adaptively, so you will still be challenged if you have some experience with machine learning tools.

Topics covered in the guided sessions and hands-on exercises:

Session 1: Setup: Python and GitHub

Installing software: Python, Jupyter, Git
Navigation and other basic commands in the terminal
Working with the p-ai-org GitHub organization
Python:
- variables
- data types and structures (numbers, strings, lists)
- control flow (if, while, for)
- functions
- importing packages
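As an illustration, the Session 1 Python topics fit into a few lines: variables of different types, a list and a string, if/for/while control flow, a small function, and an import from the standard library. The variable names and scores are made up for the example.

```python
import math  # importing a module from the standard library

# Variables holding different data types
count = 3
name = "p-ai"             # a string
scores = [88, 92, 79]     # a list of numbers

# Control flow: a for loop with an if/else inside
passed = []
for s in scores:
    if s >= 80:
        passed.append(s)
    else:
        print(f"{s} is below the cutoff")

# A while loop that counts down to zero
n = count
while n > 0:
    n -= 1

# A function with a default argument
def mean(values, digits=2):
    """Return the arithmetic mean of a list of numbers, rounded to `digits` places."""
    return round(sum(values) / len(values), digits)

print(name.upper(), passed, mean(scores), math.sqrt(16))
# prints: P-AI [88, 92] 86.33 4.0
```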

Session 2: Data analysis: NumPy and pandas (tentative topic list; likely to be trimmed)

Pandas dataframes as the data structure for datasets
Reading CSV files into dataframes
Slicing and indexing dataframes using boolean conditions as well as the iloc and loc indexers
Statistical summary and exploration of dataframes
Detecting and filling missing values in the dataframes
Regular expressions for data extraction
Feature engineering, e.g. deriving new features from existing columns
Basic statistical plots using matplotlib and seaborn
Correlation among features
Basic operations such as dropping rows/columns, setting index, replacing values of a column using a dictionary, etc.
Split-apply-combine operations by grouping rows of a dataframe
Encoding categorical variables
Concatenating and merging dataframes
More operations such as sorting rows, creating a dataframe from scratch, etc.
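Several of the Session 2 operations can be sketched on a tiny table. The CSV text, column names and the `groups` lookup table are hypothetical stand-ins for a Kaggle dataset, and filling missing ages with the median is only one possible choice. Plotting is left out to keep the example dependent on pandas alone.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real Kaggle file
csv_text = """name,sex,age,fare,embarked
"Smith, Mr. John",male,22,7.25,S
"Brown, Mrs. Ann",female,38,71.28,C
"Lee, Miss. Mary",female,,7.92,S
"Khan, Mr. Ali",male,35,53.10,S
"""
df = pd.read_csv(io.StringIO(csv_text))  # for a file on disk, pass its path instead

# Statistical summary of the numeric columns
print(df.describe())

# Detect missing values, then fill them (median is one common choice)
n_missing = df["age"].isna().sum()
df["age"] = df["age"].fillna(df["age"].median())

# Regular expression: capture the title between the comma and the period
df["title"] = df["name"].str.extract(r",\s*([^.]+)\.", expand=False)

# Replace column values using a dictionary, and engineer a new feature
df["sex"] = df["sex"].map({"male": 0, "female": 1})
df["is_adult"] = df["age"] >= 18

# Boolean conditions, label-based loc (inclusive end) and position-based iloc
expensive = df[df["fare"] > 50]
first_two_names = df.loc[0:1, "name"]
first_row = df.iloc[0]

# Correlation between two numeric features
age_fare_corr = df["age"].corr(df["fare"])

# Split-apply-combine: mean fare per port of embarkation
mean_fare = df.groupby("embarked")["fare"].mean()

# One-hot encode a categorical column
df = pd.get_dummies(df, columns=["embarked"], prefix="port")

# Merge with a second (hypothetical) table, sort by fare, and concatenate
groups = pd.DataFrame({"title": ["Mr", "Mrs", "Miss"], "group": ["man", "woman", "woman"]})
df = df.merge(groups, on="title", how="left").sort_values("fare", ascending=False)
stacked = pd.concat([df.head(2), df.tail(2)], ignore_index=True)
print(mean_fare)
```

Note that loc slices by label and includes the end label, so `df.loc[0:1]` returns two rows, while iloc follows Python's exclusive slicing.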

Session 3: Model Building, Tuning and Validation using scikit-learn (tentative topic list; will definitely be trimmed)

Overfitting and underfitting of models
Regression algorithms
- Linear Regression
- Polynomial Regression
- Ridge Regression
- Lasso Regression
Model Validation
Tuning the regularization parameter
Evaluation metrics for regression - R-squared and Root Mean-Squared Error (RMSE)
Normalization and scaling of features
Classification algorithms
- Logistic Regression
- Decision Trees
- k-Nearest Neighbors
- Support Vector Machines
- Random Forests
Evaluation metrics for classification
- Classification accuracy
- Confusion matrix
- Decision Threshold
- Precision and Recall
- F1 score
- Area Under ROC curve
Dimensionality reduction (Optional)
- Principal Component Analysis (PCA)
k-fold Cross-validation
Max-voting (majority-vote) ensemble classifiers
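To make a few of these topics concrete, here is a sketch using scikit-learn on synthetic data from make_regression and make_classification (an assumption for self-containment; the workshop itself uses Kaggle datasets). It tunes Ridge's alpha with a cross-validated grid search over scaled features, reports R-squared and RMSE, evaluates logistic regression at an explicit decision threshold with the classification metrics listed above, and scores a hard-voting ensemble with 5-fold cross-validation. The alpha grid, threshold, and model settings are arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             mean_squared_error, precision_score, r2_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# --- Regression: tune Ridge's regularization strength with 5-fold CV ---
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling inside the pipeline means each CV fold is scaled using only its own training part
ridge = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(ridge, {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}, cv=5)
search.fit(X_train, y_train)
pred = search.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = square root of MSE
print("best alpha:", search.best_params_["ridge__alpha"], "R^2:", round(r2, 3), "RMSE:", round(rmse, 2))

# --- Classification: logistic regression and its evaluation metrics ---
Xc, yc = make_classification(n_samples=300, n_features=8, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(Xc_train, yc_train)
proba = clf.predict_proba(Xc_test)[:, 1]  # estimated probability of class 1

threshold = 0.5  # decision threshold; raising it typically trades recall for precision
yc_pred = (proba >= threshold).astype(int)
print("accuracy :", accuracy_score(yc_test, yc_pred))
print("confusion matrix:\n", confusion_matrix(yc_test, yc_pred))
print("precision:", precision_score(yc_test, yc_pred))
print("recall   :", recall_score(yc_test, yc_pred))
print("F1       :", f1_score(yc_test, yc_pred))
print("ROC AUC  :", roc_auc_score(yc_test, proba))  # uses scores, not hard labels

# --- Hard (majority) voting over three classifiers, scored by 5-fold stratified CV ---
voter = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression())),
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ],
    voting="hard",
)
cv_scores = cross_val_score(voter, Xc, yc, cv=5)
print("voting CV accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```

RMSE is computed as the square root of mean_squared_error so the sketch does not depend on version-specific RMSE helpers. Putting StandardScaler inside each pipeline keeps the scaling step from seeing validation or test data.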

P-ai:

Andrew Chen, Alex Ker, Corrine Donnay, Chanha Kim, Hannah Zucker

Original Team:

Instructor: Aashita Kesarwani
TAs: Rex Asabor, Ben Langton and Qualan Woodard
