KAR's repositories
Cucumber_Multi-Env_LatinSquare_Field_Experiment
A multi-environment, Latin Square-designed trial analysed using ANOVA, two-way ANOVA, a fully random model, a mixed-effect model, and Tukey's test.
Maize_Soil_Nutrient_CRD_Glasshouse_Experiment-
A completely randomised design (CRD) with 8 treatments and 3 harvests, analysed using the Shapiro-Wilk test, Q-Q plots, Levene's test, the Kruskal-Wallis test, and Dunn's post-hoc test.
Oats_Variety-Fertilizer_SplitPlot_Field_Experiment
A factorial split-plot design analysed using the Shapiro-Wilk test, Levene's test, Q-Q plots, CI plots, a mixed-effect model, ANOVA, and Tukey's test.
Bike-Share_Big_Data_Analysis
12 datasets (3.7 million observations, 13 variables) were cleaned and manipulated to produce 6 graphs, a dynamic map, and summary statistics aimed at converting casual riders into members.
Brisbane_Real_Estate_Sales_2020
320k observations and 11 variables were cleaned and manipulated for EDA and mapping (choropleth, cluster, and point maps) to find a new home for a Brisbane family.
Houston_Avocado_Prices_EDA_-_Forecast
18k observations and 14 variables were cleaned and manipulated for EDA, assumption tests, the PP, WO, and Ljung-Box tests, and forecasting (ETS and ARIMA) of avocado prices in the US and Houston.
Marketing_Analytics
Solved 9 business tasks using 18 graphs and 10 statistical methods, including dummy data partitioning (RMSE and R²), stepwise model selection, multicollinearity checks (correlation, VIF), MLR, and a GLM for logistic regression.
Recommendation_of_Crop_Classes_by_Predictive_Model
Built an ML API that recommends crop classes with 99.5% accuracy; trained 13 models, including discriminant analyses, KNN, SVMs, Naive Bayes, a decision tree, random forest (RF), and boosted RF.
ResortHotel_versus_CityHotel
119k observations and 32 variables were cleaned and manipulated to create 14 distinct graphs and statistical tables for an extensive EDA.
Human-Resource-Data-Mining
Completed 5 analytical tasks using VAT-validated Gower-PAM clustering, Correspondence Analysis (CA), an asymmetric biplot, Multiple Correspondence Analysis (MCA), the chi-squared test, regression, and predictive classification models (KNN, SVM, and random forest).
Life-Expectancy-Statistical-Analysis-WHO-
Statistically answered 8 research questions using Multiple Factor Analysis (MFA), Principal Component Analysis (PCA), Multiple Linear Regression, Welch's t-test, Wilcoxon signed-rank test, and Longitudinal Multilevel Mixed-effect Modeling with time trajectories.
Analysis-of-Titanic-Mortality
Data manipulation, imputation, feature engineering, and machine learning algorithms (k-nearest neighbour, random forest, and extreme gradient boosting) were applied to clean the dataset. The final, fully cleaned dataset was used for data visualisation to understand trends in the tragedy.
Credit-Card-Market-Segmentation
Of 5 clustering algorithms compared, the VEV model from Mclust performed best and detected 8 distinct groups of users. The data were cleaned, standardized, and feature-selected; PCA biplots, ggplot graphics, radar plots, and parallel coordinate plots were used for EDA.
Dirty-Data-Challenge-
Cleaned, manipulated, transformed, and joined 4 messy datasets.
Food-Poison-Survey-Analysis-using-Multiple-Correspondence-Analysis
This project applies multiple correspondence analysis (MCA), with scree plots, variable plots, individual plots, biplots, squared cosine (cos2), and contribution (contrib) statistics, to detect trends in a multivariate food-poisoning survey dataset and identify the food most likely to have caused the poisoning. MCA is a principal component method, and principal component methods belong to the "unsupervised" branch of machine learning.
Loan-EDA-and-Machine-Learning-Prediction
Solved 7 business tasks and identified statistically important variables related to loan applications. Many plots were produced during EDA and machine learning. Models built include logistic regression, a decision tree, bootstrap aggregating (bagging), random forest, and fine-tuned extreme gradient boosting.
Predicting-House-Prices-in-Boston_UniqueVersion
Extracted statistical relationships between house prices and many factors, and deployed to production a random forest model (R² of 90%) that outperformed MLR, Lasso, PLS, KNN, and DT.
regression
regressionbook
Sales-of-Summer-Clothes-in-E-commerce-
Solved 9 analysis tasks and identified the most important variables driving the success of clothing sales, using 22 plots, multiple linear regression, and random forest.
SimpleTalkDemo_R
Demo data and R script for a Simple Talk article.
superstore.sales