Keywords: H1N1 Pandemic, Classification, Machine Learning
This report analyzes an imbalanced H1N1 vaccination dataset, using various aspects of participants' background information to predict whether each participant received the H1N1 flu vaccine. The classifiers employed are SVM, logistic regression, decision tree, random forest, neural network, and boosting. For each model, the hyperparameters are tuned with K-fold cross-validation to maximize the ROC-AUC and PR-AUC scores. Most classifiers reach an ROC-AUC of approximately 0.89, while random forest, neural network, and boosting achieve a higher PR-AUC of 0.76. After weighing performance metrics, interpretability, and time complexity, XGBoost, GBM, and random forest are judged the most appropriate classifiers for this dataset.
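ROC-AUC and PR-AUC respond differently to class imbalance, which is why similar ROC-AUC values can coexist with different PR-AUC values. The following is a minimal pure-Python sketch, not the report's code. The function names `roc_auc` and `average_precision` are hypothetical, and PR-AUC is summarized here as average precision, which is one common convention. On toy data with a single positive ranked second among eleven samples, ROC-AUC is 0.9 but average precision is only 0.5.

```python
from typing import Sequence

# Hypothetical helpers for illustration; not the report's implementation.


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as half a win).
    Pairwise O(P * N) version, suitable for illustration only."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC summarized as average precision: the mean of the precision
    values observed at the rank of each positive. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


if __name__ == "__main__":
    # Toy imbalanced data: 1 positive among 11 samples, ranked second.
    y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    s = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print(roc_auc(y, s))            # 0.9
    print(average_precision(y, s))  # 0.5
```

In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute these quantities with proper tie handling. `GridSearchCV` with `scoring="roc_auc"` or `scoring="average_precision"` and an integer `cv` averages such scores over the K validation folds for each hyperparameter candidate.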
MICE, Correlation Map, Histogram, Feature Interactions, Chi-square Feature Selection, Min-Max Scaler, SVM, Logistic Regression, Decision Tree, Random Forest, Neural Network, Gradient Boosting, XGBoost
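Min-max scaling and chi-square feature selection appear together in the list above. One plausible reason to pair them is that the chi-square statistic, in the form used by scikit-learn's `chi2`, requires non-negative feature values, and min-max scaling to [0, 1] guarantees this. The report does not state this reason here, so treat it as an assumption. The following is a minimal pure-Python sketch with hypothetical helper names (`min_max_scale`, `chi2_score`, `select_k_best`). It scales each column and scores it as the sum over classes of (observed - expected)^2 / expected. Here, observed is the per-class sum of the feature, and expected is the class share times the feature's total sum. It then keeps the k highest-scoring columns.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration; not the report's implementation.


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Rescale a feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def chi2_score(feature: Sequence[float], y: Sequence[int]) -> float:
    """Chi-square statistic for one non-negative feature against class labels:
    observed = per-class sum of the feature, expected = class share * total sum."""
    n = len(y)
    total = sum(feature)
    stat = 0.0
    for c in set(y):
        observed = sum(f for f, label in zip(feature, y) if label == c)
        expected = total * sum(1 for label in y if label == c) / n
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


def select_k_best(columns: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Min-max scale each column, score it with chi2_score, and return the
    indices of the k highest-scoring columns (best first)."""
    scores = [chi2_score(min_max_scale(col), y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    informative = [10, 9, 1, 2]  # high for class 1, low for class 0
    noise = [5, 1, 5, 1]         # same spread in both classes
    print(select_k_best([informative, noise], y, k=1))  # [0]
```

In a real pipeline, the scaler's minimum and maximum should be fitted on the training folds only. Placing scikit-learn's `MinMaxScaler` and `SelectKBest(chi2, k=...)` inside a `Pipeline` achieves this during cross-validation.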