This project uses health data and machine learning algorithms to predict whether an individual is a smoker or a non-smoker. The goal is to derive insights from quantitative health measurements that can support informed health consultations, targeted interventions, and proactive treatment.
Link: https://www.kaggle.com/code/junaidullhassan/smoker-status-using-bio-signals-acc-79
- Python
- Jupyter Notebook
- Scikit-learn
- Seaborn
- NumPy
- Pandas
- Matplotlib
Smoking habits analysis, also known as tobacco consumption characterization, means evaluating health data to infer smoking status and intensity. Organizations and healthcare providers rely on such insights to drive personalized cessation programs, monitor recovery progress, and raise public awareness of the hazards of smoking.
- Data preparation and cleansing to remove inconsistent or irrelevant records and preserve quality variables.
- Feature extraction and selection to pinpoint salient biochemical markers or indicators, rendering predictive models more robust and efficient.
- Training and validating machine learning models, including Logistic Regression, Decision Trees, Random Forests, and Gradient Boosting, to distinguish smokers from non-smokers.
- Performance evaluation using metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC).
- Continuous model monitoring and updating to reflect recent scientific findings and shifting population behavior trends.

Mastering smoking habits analysis arms health professionals with the right tools to foster healthier lives, combat nicotine addiction, and advocate for cleaner environments.
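
The workflow listed above (preparation, feature selection, training, evaluation) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses synthetic data from `make_classification` as a stand-in for the health dataset, and the choices of `StandardScaler`, `SelectKBest` with `k=10`, and the model hyperparameters are assumptions made only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the health data (assumption: the real features
# would come from the Kaggle CSV files instead).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=8, random_state=0)

# Hold out 20% of the rows for evaluation, keeping the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The four model families named in the list above.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Scale features, keep the 10 highest-scoring ones, then fit the model.
    pipeline = make_pipeline(StandardScaler(),
                             SelectKBest(f_classif, k=10),
                             model)
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    y_score = pipeline.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Wrapping scaling and feature selection in the same pipeline as the model means both are fitted on the training split only, which avoids leaking information from the held-out rows into the evaluation.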
Dataset Details
The dataset is available at: https://www.kaggle.com/competitions/playground-series-s3e24/data
This dataset contains health data describing individual patients attending a clinic. Among its variables, blood pressure readings, cholesterol levels, and glucose readings serve as potential indicators of smoking habits. Other factors, such as age, sex, race, and regional location, should be accounted for to reduce false positives and false negatives in our analysis.
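
A first look at how the variables relate to smoking status might check the class balance and each numeric feature's correlation with the label. The sketch below is illustrative only: `summarize_target` is a hypothetical helper, the target column name `smoking` and its 0/1 encoding are assumptions, and the small `demo` frame contains made-up values rather than rows from the dataset.

```python
import pandas as pd


def summarize_target(df, target="smoking"):
    """Return class proportions and feature/target correlations.

    Assumes `target` is a numeric 0/1 column (hypothetical name).
    """
    # Share of each class, e.g. non-smokers (0) versus smokers (1).
    class_balance = df[target].value_counts(normalize=True).sort_index()
    # Pearson correlation of every other numeric column with the target,
    # ordered by absolute strength.
    numeric = df.select_dtypes(include="number").drop(columns=[target])
    correlations = numeric.corrwith(df[target]).sort_values(
        key=lambda s: s.abs(), ascending=False)
    return class_balance, correlations


# Made-up values for illustration; not taken from the real dataset.
demo = pd.DataFrame({
    "age": [25, 40, 55, 35, 60, 30],
    "cholesterol": [180, 210, 240, 190, 250, 200],
    "smoking": [0, 1, 1, 0, 1, 0],
})

balance, corr = summarize_target(demo)
print(balance)
print(corr)
```

Correlation with the label is only a rough screening step; it does not show that a variable causes or reliably signals smoking.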