Predicting Smoking Habits with Health Data and Machine Learning

In this project, we use health data and machine learning algorithms to predict whether an individual is a smoker or a non-smoker. The goal is to derive insights from quantitative health measurements that can support informed health consultations, targeted interventions, and proactive treatment.

Notebook at My Kaggle Profile

Link: https://www.kaggle.com/code/junaidullhassan/smoker-status-using-bio-signals-acc-79

Key Technologies & Libraries Used

Python
Jupyter Notebook
Scikit-learn
Seaborn
NumPy
Pandas
Matplotlib

Overview of Smoking Habits Analysis

Smoking habits analysis, also known as tobacco consumption characterization, refers to evaluating health data to infer smoking status and intensity. Organizations and healthcare providers rely on such insights to drive personalized cessation programs, monitor recovery progress, and raise public awareness of the hazards of smoking.

The core components of smoking habits analysis are:

Data preparation and cleansing to remove inconsistent or irrelevant records and preserve quality variables.
Feature extraction and selection to pinpoint salient biochemical markers or indicators, rendering predictive models more robust and efficient.
Training and validating machine learning models, including Logistic Regression, Decision Trees, Random Forests, and Gradient Boosting, to distinguish smokers from non-smokers.
Performance evaluation using metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC).
Continuous model monitoring and updating to reflect recent scientific findings and shifting population behavior trends.

Mastering smoking habits analysis arms health professionals with the right tools to foster healthier lives, combat nicotine addiction, and advocate for cleaner environments.
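
The model training and evaluation steps listed above can be sketched roughly as follows. This is an illustration only, not the notebook's actual code: it uses synthetic data from scikit-learn's make_classification as a stand-in for the health dataset, and the helper name compare_models, the 80/20 split, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, random_state=0):
    """Fit four classifiers and return hold-out metrics for each (hypothetical helper)."""
    # Stratified split keeps the smoker/non-smoker ratio similar in both parts
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        # Logistic regression benefits from scaled inputs, so it gets a scaler
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=random_state),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)               # hard class labels
        proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
            "auc": roc_auc_score(y_test, proba),
        }
    return results


# Synthetic stand-in for the health data (assumption: numeric features, binary target)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
results = compare_models(X, y)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

To try this on the real data, replace X and y with the feature matrix and the smoker label from the dataset. AUC is computed from predicted probabilities rather than hard labels, which is why predict_proba is called separately.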

Dataset Details

The dataset is available from the Kaggle competition page: https://www.kaggle.com/competitions/playground-series-s3e24/data

This dataset contains health data describing individual patients attending a clinic. Among its variables, blood pressure readings, cholesterol levels, and glucose readings serve as potential indicators of smoking habits. Other factors, such as age, sex, race, and regional location, must be accounted for to limit false positives and false negatives in the analysis.
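
A minimal sketch of loading and cleaning such a table with pandas is shown below. The column names (systolic, cholesterol, fasting blood sugar, smoking) and the small inline sample are assumptions made for illustration and should be checked against the actual competition files. The helper load_and_clean is hypothetical, and median imputation is just one simple choice for handling missing values.

```python
import io

import pandas as pd


def load_and_clean(source, target="smoking"):
    """Read a CSV, drop duplicate and unlabeled rows, and impute numeric gaps (hypothetical helper)."""
    df = (
        pd.read_csv(source)
        .drop_duplicates()          # remove exact duplicate records
        .dropna(subset=[target])    # rows without a label cannot be used for training
        .copy()
    )
    # Fill missing numeric feature values with the column median
    numeric_cols = [c for c in df.select_dtypes(include="number").columns if c != target]
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    df[target] = df[target].astype(int)
    return df.reset_index(drop=True)


# Tiny in-memory sample standing in for the real file (assumed column names)
sample_csv = io.StringIO(
    "age,systolic,cholesterol,fasting blood sugar,smoking\n"
    "40,120,190,95,0\n"
    "40,120,190,95,0\n"
    "55,135,,110,1\n"
    "35,118,172,90,\n"
    "60,142,230,105,1\n"
)
clean = load_and_clean(sample_csv)
print(clean)
print(clean["smoking"].value_counts())  # class balance of the target
```

With the real data, pass the path of the downloaded training file instead of the in-memory sample.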

About

Employ a machine learning (ML) model to predict whether patients smoke or not based on relevant health data, such as blood pressure, cholesterol levels, and other pertinent physiological metrics.

Apache License 2.0