KAR-NG

KAR's repositories

Cucumber_Multi-Env_LatinSquare_Field_Experiment

A multi-environment Latin square trial analysed with ANOVA, two-way ANOVA, a fully random model, a mixed-effects model, and Tukey's test.

Language: R

Maize_Soil_Nutrient_CRD_Glasshouse_Experiment-

A completely randomised design (CRD) with 8 treatments and 3 harvests, analysed with the Shapiro-Wilk test, Q-Q plot, Levene’s test, Kruskal-Wallis test, and Dunn’s post-hoc test.

Language: R

Oats_Variety-Fertilizer_SplitPlot_Field_Experiment

A factorial split-plot design analysed with the Shapiro-Wilk test, Levene’s test, Q-Q plot, CI plot, a mixed-effects model, ANOVA, and Tukey's test.

Language: R

Bike-Share_Big_Data_Analysis

12 datasets (3.7 million observations, 13 variables) were cleaned and manipulated to produce 6 graphs, a dynamic map, and summary statistics supporting the goal of converting casual riders into members.

Language: R

Brisbane_Real_Estate_Sales_2020

320k observations and 11 variables were cleaned and manipulated for EDA and mapping (choropleth, cluster, and point maps) to find a new home for a Brisbane family.

Language: R

Houston_Avocado_Prices_EDA_-_Forecast

18k observations and 14 variables were cleaned and manipulated for EDA, assumption tests (PP, WO, and Ljung-Box), and forecasting (ETS and ARIMA) of avocado prices in the US and Houston.

Language: R

Marketing_Analytics

Solved 9 business tasks using 18 graphs and 10 statistical methods, including dummy data partitioning (RMSE and R²), stepwise model selection, multicollinearity checks (correlation, VIF), multiple linear regression (MLR), and a GLM for logistic regression.

Language: R

Recommendation_of_Crop_Classes_by_Predictive_Model

Built an ML API that recommends crop classes with 99.5% accuracy. Trained 13 models, including discriminant analyses, KNN, SVMs, Naive Bayes, a decision tree, random forest (RF), and boosted RF.

Language: R

ResortHotel_versus_CityHotel

119k observations and 32 variables were cleaned and manipulated to create 14 distinct graphs and statistical tables for an extensive EDA comparing resort and city hotels.

Language: R

Human-Resource-Data-Mining

Completed 5 analytical tasks using VAT-validated Gower-PAM clustering, correspondence analysis (CA), an asymmetric biplot, multiple correspondence analysis (MCA), the chi-squared test, regression, and predictive classification models (KNN, SVM, and random forest).

Language: R

Life-Expectancy-Statistical-Analysis-WHO-

Statistically answered 8 research questions using multiple factor analysis (MFA), principal component analysis (PCA), multiple linear regression, Welch's t-test, the Wilcoxon signed-rank test, and longitudinal multilevel mixed-effects modelling with time trajectories.

Language: HTML

Analysis-of-Titanic-Mortality

Data manipulation, imputation, feature engineering, and machine learning algorithms (k-nearest neighbours, random forest, and extreme gradient boosting) were applied to clean the dataset. The final, fully cleaned dataset was then used for data visualisation to understand mortality trends in the tragedy.

Language: HTML

Credit-Card-Market-Segmentation

Of 5 clustering algorithms, the VEV model from Mclust performed best and detected 8 distinct groups of users. The data were cleaned, standardized, and feature-selected; PCA biplots, ggplot graphics, radar plots, and parallel coordinate plots were used for EDA.

Language: R

Dirty-Data-Challenge-

Cleaned, manipulated, transformed, and joined 4 messy datasets.

Language: R

ecar

Language: HTML

Food-Poison-Survey-Analysis-using-Multiple-Correspondence-Analysis

This project applies multiple correspondence analysis (MCA), using scree plots, variable plots, individual plots, biplots, squared cosine (cos2), and contribution statistics (contrib), to detect trends in the multivariate food poisoning survey dataset and identify the food most likely to have caused the poisoning. MCA is one of the principal component methods, which belong to the "unsupervised" branch of machine learning.

Loan-EDA-and-Machine-Learning-Prediction

Solved 7 business tasks and identified statistically important variables related to loan applications. Many plots were produced during EDA and machine learning. Models built include logistic regression, a decision tree, bootstrap aggregating (bagging), random forest, and fine-tuned extreme gradient boosting.

Language: R

Cucumber_Multi-Env_LatinSquare_Field_Experiment

Maize_Soil_Nutrient_CRD_Glasshouse_Experiment-

Oats_Variety-Fertilizer_SplitPlot_Field_Experiment

Bike-Share_Big_Data_Analysis

Brisbane_Real_Estate_Sales_2020

Houston_Avocado_Prices_EDA_-_Forecast

KAR-NG

Marketing_Analytics

Recommendation_of_Crop_Classes_by_Predictive_Model

ResortHotel_versus_CityHotel

Human-Resource-Data-Mining

Life-Expectancy-Statistical-Analysis-WHO-

Analysis-of-Titanic-Mortality

Credit-Card-Market-Segmentation

Dirty-Data-Challenge-

ecar

Food-Poison-Survey-Analysis-using-Multiple-Correspondence-Analysis

Loan-EDA-and-Machine-Learning-Prediction

nasa

pima

Predicting-House-Prices-in-Boston_UniqueVersion

regression

Sales-of-Summer-Clothes-in-E-commerce-

SimpleTalkDemo_R

soil

student

Student-Retention-Rate-of-AUS-Universities

superstore.sales