ocpodariu / udacity-mlnd

Machine Learning Engineer Nanodegree projects

Machine Learning Engineer Nanodegree

Projects developed as part of Udacity's Machine Learning Engineer Nanodegree.

Projects

0. Titanic Survival Exploration

  • View Jupyter Notebook or Go to project directory
  • Explored the dataset to identify which features best predict a passenger's survival
  • Used those features to create decision functions to predict the survival of the passengers
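
A decision function of this kind can be sketched as a few hand-written rules. The rules, the sample passengers, and the helper names `predict_survival` and `accuracy` below are illustrative assumptions, not the ones chosen in the notebook.

```python
def predict_survival(passenger):
    """Return 1 (survived) or 0 (did not survive) from hand-written rules.

    The rules are hypothetical and only show the shape of a decision function.
    """
    if passenger["Sex"] == "female":
        return 1
    # Age may be missing; a missing age is treated as "not a young child".
    age = passenger.get("Age")
    if age is not None and age < 10:
        return 1
    return 0


def accuracy(truth, predictions):
    """Fraction of predictions that match the true outcomes."""
    correct = sum(t == p for t, p in zip(truth, predictions))
    return correct / len(truth)


# Made-up passengers and outcomes, for illustration only.
passengers = [
    {"Sex": "female", "Age": 29, "Pclass": 1},
    {"Sex": "male", "Age": 4, "Pclass": 3},
    {"Sex": "male", "Age": 40, "Pclass": 2},
    {"Sex": "male", "Age": None, "Pclass": 3},
]
outcomes = [1, 1, 0, 1]

predictions = [predict_survival(p) for p in passengers]
print(predictions, accuracy(outcomes, predictions))
```

Comparing the accuracy of successive rule sets is one way to judge which features carry the most predictive signal.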

1. Predicting Boston Housing Prices

  • View Jupyter Notebook or Go to project directory
  • Measured the linear correlation between features and selling price using Pearson's r
  • Observed the effect of different training and testing splits on model performance
  • Used learning and complexity curves to detect underfitting and overfitting
  • Combined Grid Search with cross-validation to find the optimal maximum depth for a decision tree regressor
  • Final model obtained an R^2 score of 0.77
  • Discussed the model's applicability in a real-world scenario
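
The two metrics named above, Pearson's r and the R^2 score, can be written out directly from their definitions. This is a minimal standard-library sketch; the feature values, prices, and predictions are made up, and the function names are assumptions rather than the notebook's own code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up data: number of rooms against selling price (in thousands).
rooms = [4, 5, 6, 7, 8]
prices = [200, 250, 310, 340, 400]
predicted = [210, 250, 300, 350, 390]  # hypothetical model output

r = pearson_r(rooms, prices)
r2 = r2_score(prices, predicted)
print(f"Pearson's r = {r:.3f}, R^2 = {r2:.3f}")
```

An R^2 of 1.0 means the predictions match the targets exactly, while 0.0 means the model does no better than always predicting the mean price.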

2. SMS Spam Classification

  • View Jupyter Notebook or Go to project directory
  • Transformed the SMS messages using the Bag-of-Words model
  • Generated features based on the frequency of each word
  • Classified SMS messages as spam or not spam with a Naive Bayes model
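
The steps above (word counts as features, then a Naive Bayes classifier) can be sketched with the standard library alone. The sketch assumes a multinomial model with add-one (Laplace) smoothing and whitespace tokenization; the training messages and all function names are made up for illustration and are not taken from the notebook.

```python
import math
from collections import Counter


def tokenize(message):
    """Lowercase the message and split on whitespace (a simplification)."""
    return message.lower().split()


def train_naive_bayes(messages, labels):
    """Count word frequencies per class: the Bag-of-Words representation."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for message, label in zip(messages, labels):
        word_counts[label].update(tokenize(message))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary


def classify(message, model):
    """Return the class with the highest log-posterior for the message."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        log_prob = math.log(class_counts[label] / total_messages)
        denominator = sum(counts.values()) + len(vocabulary)
        for word in tokenize(message):
            if word in vocabulary:  # words never seen in training are skipped
                log_prob += math.log((counts[word] + 1) / denominator)
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Made-up training data.
messages = [
    "win a free prize now",
    "free cash win now",
    "are we meeting for lunch",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = train_naive_bayes(messages, labels)
print(classify("free prize", model))
print(classify("lunch tomorrow", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.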

3. Finding Donors for CharityML

  • View Jupyter Notebook or Go to project directory
  • Evaluated the performance of different supervised algorithms in identifying individuals making more than $50,000 a year
  • Preprocessed the data by scaling numerical features, one-hot encoding categorical features, and applying logarithmic transformations to features with skewed distributions
  • Built a pipeline to quickly evaluate the performance of different algorithms
  • Analyzed AdaBoost's performance in relation to the maximum number of estimators
  • Identified the top 5 most important features and analyzed the effects of feature selection on AdaBoost's performance
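
The three preprocessing steps mentioned above can be illustrated in plain Python. This is a sketch under stated assumptions: the log transform is taken as log(x + 1) so that zeros stay defined, scaling is min-max to [0, 1], and the sample values and function names are hypothetical rather than copied from the notebook.

```python
import math


def log_transform(values):
    """Compress a skewed, non-negative feature with log(x + 1)."""
    return [math.log(v + 1) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 columns, one per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Made-up records: one heavily skewed numeric feature, one categorical feature.
capital_gain = [0, 0, 5000, 99999]
workclass = ["Private", "State-gov", "Private", "Self-emp"]

scaled_gain = min_max_scale(log_transform(capital_gain))
categories, encoded = one_hot(workclass)

print([round(v, 3) for v in scaled_gain])
print(categories)
print(encoded)
```

Without the log step, the value 5000 would scale to about 0.05 and be almost indistinguishable from zero; after it, the same value lands well inside the range.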

4. Creating Customer Segments

  • View Jupyter Notebook or Go to project directory
  • Applied PCA to identify customer spending patterns and to reduce the dimensionality of the data
  • Compared K-means and a Gaussian Mixture Model to decide which is more suitable for grouping customers into segments
  • Performed a silhouette analysis to determine the optimal number of components for the Gaussian Mixture Model
  • Designed an A/B test to measure the effect of a delivery service change on each customer segment
  • Trained a classifier on customer segment data to label new customers based on their estimated spending
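
The silhouette analysis mentioned above scores a clustering by how close each point is to its own cluster compared with the nearest other cluster. The sketch below computes the mean silhouette coefficient with Euclidean distance using only the standard library; the points and the candidate cluster assignments are made up and stand in for the output of a clustering model fitted with different numbers of components.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; expects at least two distinct labels."""
    clusters = sorted(set(labels))
    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        same = [
            math.dist(point, other)
            for j, (other, other_label) in enumerate(zip(points, labels))
            if other_label == label and j != i
        ]
        if not same:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        a = sum(same) / len(same)  # mean distance within the point's cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(point, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters
            if c != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up 2-D data with two visibly separate groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical assignments for 2 and 3 components.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The number of components with the highest mean score is the one the analysis favors; scores near 1 indicate compact, well-separated clusters.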

Languages

HTML 65.8%, Jupyter Notebook 33.9%, Python 0.4%