mxdara / PCA_Housing_Occupancy_Estimation

PCA for Housing Occupancy Estimation Dataset

Occupancy Detection using PCA and Classification Algorithms

Abstract

Principal Component Analysis (PCA) is a cornerstone method in data science, especially for large datasets. Its primary objective is to reduce dimensionality by transforming a set of correlated variables into a smaller set of uncorrelated ones, the principal components. In this study, PCA is applied to the Occupancy Detection dataset, where the goal is to determine whether a space is occupied or unoccupied. We further evaluate three widely used classification algorithms: Extra Trees (ET), K-Nearest Neighbors (KNN), and Random Forest (RF). Each is evaluated on both the original dataset and its PCA-transformed counterpart. Each algorithm is tuned to its best-performing hyperparameters, and performance is benchmarked using the F1 score, confusion matrices, and ROC curves. Notably, ET performed best among the classifiers compared within the PyCaret library. To understand and explain the model's predictions, we analyze its interpretability with explainable AI techniques based on Shapley values. Overall, the results show that the algorithms distinguish well between the two occupancy classes, with F1 scores close to 1.
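The PCA-then-classify pipeline described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the notebook's code: the data are synthetic, the five columns only loosely mimic indoor-sensor readings (temperature, humidity, light, CO2, humidity ratio), KNN stands in for the three classifiers (ET and RF are omitted for brevity), and PCA, KNN, and the F1 score are written out by hand to show the mechanics rather than calling scikit-learn or PyCaret. Hyperparameter tuning, ROC curves, and the Shapley-value analysis are not covered.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA via SVD of the mean-centered data.

    Returns the feature means, the top principal axes (one per row) and the
    fraction of total variance each retained axis explains.
    """
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = s**2 / (X.shape[0] - 1)  # variance along each principal axis
    return mean, vt[:n_components], variance[:n_components] / variance.sum()


def pca_transform(X, mean, components):
    """Project X onto the retained principal axes (uncorrelated scores)."""
    return (X - mean) @ components.T


def knn_predict(X_train, y_train, X_query, k=5):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for the positive (occupied) class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)


# Synthetic stand-in for the sensor columns (values are made up, not the
# real dataset): occupancy (y=1) raises temperature, light and CO2, and the
# humidity ratio is built to be strongly correlated with humidity.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
humidity = 27 + rng.normal(0, 3, n)
X = np.column_stack([
    20 + 1.0 * y + rng.normal(0, 0.5, n),        # temperature
    humidity,                                     # relative humidity
    50 + 400 * y + rng.normal(0, 40, n),          # light
    450 + 300 * y + rng.normal(0, 50, n),         # CO2
    0.00015 * humidity + rng.normal(0, 1e-4, n),  # humidity ratio
])

# Hold out 25% for testing; standardize with training statistics only so
# that large-scale columns (light, CO2) do not dominate the components.
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
X_train, X_test = (X[train] - mu) / sd, (X[test] - mu) / sd

# Keep 2 of the 5 components, then classify in the reduced space.
mean, components, evr = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)
y_pred = knn_predict(Z_train, y[train], Z_test, k=5)

score = f1(y[test], y_pred)
print("explained variance ratio:", np.round(evr, 3))
print("F1 on held-out data:", round(float(score), 3))
```

Standardizing before PCA matters in this sketch: without it, the light and CO2 columns, whose raw variances are orders of magnitude larger than the others, would dominate the first component. In practice, `sklearn.decomposition.PCA` and `sklearn.neighbors.KNeighborsClassifier` provide the same two steps.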

Index Terms

  • PCA
  • Occupancy Detection
  • Extra Trees classifier
  • KNN
  • Random Forest

Languages

Jupyter Notebook 100.0%