mshumayl / ml-analytics-portfolio

A collection of various ML and analytics mini projects.

Shumayl Asmawi - ML/Analytics Portfolio

To view this as a webpage, click here.

About me

My name is Shumayl Asmawi, and I am a Systems Engineer at HP Inc, where I develop data-driven software systems in the manufacturing sector.

This mini portfolio was created in 2021 and may not accurately reflect my current skills. You can visit my GitHub profile and my blog to view my latest work.

About the portfolio

The depth and scope of each project vary. The projects cover (but are not limited to):

  • Data cleaning
  • Data preprocessing
  • Exploratory data analysis
  • Feature engineering
  • Data visualization
  • ML/artificial neural network model training
  • Hyperparameter tuning
  • Predictive analytics
  • Model evaluation

Mini projects

Click any project title to view the code on GitHub. If GitHub fails to load any of the .ipynb files, you can view them with nbviewer by clicking here.

Traffic prediction

  • Implemented an ensemble of gradient boosting, random forests, and linear regression to predict traffic congestion.

Stochastic Modelling of Semiconductor QA Results Using Sensor Data

  • Implemented an XGBoost model to predict the pass/fail yield for in-house line testing of over 1500 production entities using signals from over 500 sensors and process measurement points.
  • Achieved a ROC-AUC of 0.876.

Receiver Operating Characteristic (ROC) Curve
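
For readers unfamiliar with the ROC-AUC metric quoted above, here is a minimal sketch of how it can be computed from scratch. It is not the notebook's code (the project itself uses XGBoost), and the labels and scores below are made up for illustration. The sketch uses the pairwise interpretation: ROC-AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = pass, 0 = fail) and model scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, y_score)
print(f"ROC-AUC: {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a library implementation would normally be used on a real dataset.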

Malaysia Vaccination Dashboard

Vaccination Dashboard

EDA and Prediction of Solar Power Output

  • Conducted an exploratory data analysis (EDA) to discover key analytics and diagnostics for a dataset of two solar power plants.
  • Constructed a simple linear regression model to predict the output of the solar power plants, with a root mean squared error (RMSE) of around 700 kW.

Variance Between Actual Values and Predicted Values

Feature Correlation Matrix

AC Power Output Throughout the Day
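
As a rough illustration of the approach described for this project, the sketch below fits a one-feature least-squares line and reports its RMSE. It is not the notebook's code: the feature name `irradiation`, the data values, and the choice of a single predictor are all assumptions made for the example.

```python
import math


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical readings, for illustration only (not taken from the dataset).
irradiation = [0.0, 0.2, 0.4, 0.6, 0.8]
power_kw = [0.0, 210.0, 390.0, 610.0, 790.0]

slope, intercept = fit_simple_linear_regression(irradiation, power_kw)
predicted = [slope * x + intercept for x in irradiation]
error = rmse(power_kw, predicted)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, RMSE={error:.2f} kW")
```

Because RMSE is expressed in the units of the target, it can be read directly as a typical prediction error in kW.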

Kaggle TPS June 2021 Competition

  • Designed a multilayer perceptron (MLP) neural network model and an XGBoost model to predict the category of 100,000 different products given over 70 different mystery features of over 200,000 existing products for Kaggle's June Tabular Playground Competition.
  • Scored a multiclass log loss of 1.77852 (lower is better; the competition winner scored 1.74370) with predictions made by the XGBoost model.

MLP Model Architecture
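
To make the competition metric concrete, here is a minimal sketch of multiclass log loss. It is not taken from the notebook, and the three-class labels and probabilities are invented for the example. The metric averages the negative log of the probability that the model assigned to each sample's true class, so confident wrong predictions are penalized heavily.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probs:  list of per-class probability rows, one row per sample.
    eps:    clipping bound that avoids log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical three-class predictions, for illustration only.
y_true = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
loss = multiclass_log_loss(y_true, probs)
print(f"log loss: {loss:.4f}")
```

A useful reference point: a model that predicts a uniform distribution over k classes scores exactly ln(k), so any useful model should score below that.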

Predicting House Prices with Neural Networks using Keras and Tensorflow

  • Designed a multilayer perceptron (MLP) neural network model with tf.keras to predict house prices from location and various other attributes of over 20,000 houses, achieving a variance score of 0.767.

MLP Model Architecture

Coordinate Plot of House Prices

Variance Between Actual Values and Predicted Values

Correlation Between Living Space and House Price

Exploring Trends and Patterns of TED Talks (EDA)

  • Conducted a comprehensive exploratory data analysis with prompts from a lecture by Kevin Markham on a dataset of over 2000 recorded TED Talks to visualize underlying trends and patterns regarding the popularity, sentiments, and ratings of the TED events.

Views Analytics

Engagement Analytics

Sentiment Analytics

Number of TED Talks per Year

Predicting Classification on a Dataset of Unknown Features with K-Nearest Neighbors Algorithm

  • Implemented the K-Nearest Neighbors algorithm to predict the category of an object based on 10 mystery features of 1000 different objects with a precision of 83%.

KNN Decision Boundary
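
The K-Nearest Neighbors idea used in this project can be sketched in a few lines of plain Python. This is an illustration only, not the notebook's code: the two-feature toy points, the labels, and the choice of k = 3 with Euclidean distance are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    top_labels = [label for _, label in neighbors[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical two-feature training set with two well-separated classes.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # query near the class-0 cluster
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # query near the class-1 cluster
```

Since KNN relies on raw distances, features on very different scales are usually standardized first so that no single feature dominates the vote.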

Predicting Loan Repayment with Decision Trees and Random Forest Algorithms

  • Trained a decision tree model and a random forest model to predict loan repayment from an applicant's past information. The decision tree and random forest models achieved accuracies of 75% and 78%, respectively.

Decision Tree Model

Predicting Flower Type with Support Vector Machine Algorithm

  • Implemented the Support Vector Machine (SVM) algorithm to classify flowers into 3 species from sepal and petal dimensions, with an average accuracy of 95%.

SVM Model Performance

Multivariate Plot

Features Correlation

Predicting Ad Click with Logistic Regression Model

  • Created a logistic regression model, trained on 1000 rows of user data, that predicts whether a person will click an advertisement with 93% accuracy.

Model Performance

Multivariate Plot

Features Correlation

Languages

  • Jupyter Notebook: 85.3%
  • HTML: 14.7%
  • Python: 0.0%