Obesity-Risk-Analysis-and-Prediction

In obesity risk analysis and prediction, historical data is used to build predictive models that estimate an individual's likelihood of developing weight problems from key lifestyle factors. These factors include dietary habits, physical activity, substance use, hydration, mode of transportation, screen time, and calorie-intake monitoring.

Machine learning techniques let analysts identify the strongest relationships among these determinants and build systems that predict obesity risk. Because preventive healthcare is a priority, such analyses can support tailored guidance and informed decisions that reduce adverse health effects. Addressing these concerns improves individual well-being and also reduces the strain on medical resources, benefiting society at large.

Notebook on My Kaggle Profile

To view the notebook on Kaggle, use this link: https://www.kaggle.com/code/junaidullhassan/obesity-risk-prediction-gradientboosting-xgboost

Key Technologies & Libraries Used

Python
Jupyter Notebook
scikit-learn
seaborn
NumPy
Pandas
matplotlib
XGBoost

Chosen Machine Learning Estimators

Benefits of XGBoost:

Fast training, because split finding during tree construction is parallelized across multiple CPU cores.
Built-in regularization (L1 and L2 penalties on leaf weights, plus a minimum split gain, gamma) helps control overfitting by keeping trees simpler and less sensitive to small fluctuations in the data.
Handles missing values natively by learning a default branch direction for missing entries at each split, so they do not have to be imputed manually.
Accepts custom objective functions and evaluation metrics, giving more flexibility for complex problems (a custom objective must supply the gradient and Hessian of the loss).
Uses a second-order approximation of the loss (both gradients and Hessians) when building each tree, which often improves optimization compared with first-order gradient boosting.
Ships a cross-validation utility (xgb.cv) and supports early stopping on a validation set. Cross-validation is not performed automatically during training, but these tools reduce manual tuning.
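
The second-order optimization and regularization points above can be made concrete with the leaf-weight and split-gain formulas from the XGBoost paper (Chen and Guestrin, 2016). The following is a minimal plain-Python sketch, not code from the xgboost library. The helper names are hypothetical, and `reg_lambda` and `gamma` are only meant to mirror the role of the XGBoost parameters with those names: an L2 penalty on leaf weights and a minimum loss reduction per split.

```python
import math


def logistic_grad_hess(y_true, raw_score):
    """Gradient and Hessian of binary log loss w.r.t. the raw score (margin).

    These two per-sample quantities are what a custom boosting objective supplies.
    """
    p = 1.0 / (1.0 + math.exp(-raw_score))  # predicted probability
    return p - y_true, p * (1.0 - p)


def leaf_weight(pairs, reg_lambda):
    """Optimal leaf value under the second-order approximation: -G / (H + lambda)."""
    g = sum(gr for gr, _ in pairs)
    h = sum(he for _, he in pairs)
    return -g / (h + reg_lambda)


def split_gain(left, right, reg_lambda, gamma):
    """Gain from splitting one node into `left` and `right`.

    `left` and `right` are lists of (gradient, hessian) pairs. A larger
    reg_lambda shrinks the gain; gamma is subtracted as a per-split penalty,
    so a result <= 0 means the split is not worth making.
    """
    def score(pairs):
        g = sum(gr for gr, _ in pairs)
        h = sum(he for _, he in pairs)
        return g * g / (h + reg_lambda)

    return 0.5 * (score(left) + score(right) - score(left + right)) - gamma


# Usage: four samples, all starting from a raw score of 0 (probability 0.5).
labels = [1, 1, 0, 0]
pairs = [logistic_grad_hess(y, 0.0) for y in labels]
print(split_gain(pairs[:2], pairs[2:], reg_lambda=1.0, gamma=0.0))
print(leaf_weight(pairs[:2], reg_lambda=1.0))
```

With `reg_lambda=1.0`, the split that separates the two classes has a gain of about 0.67. Raising `reg_lambda` or `gamma` lowers that gain, which is how the regularization terms discourage splits that barely help.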

Benefits of Gradient Boosting Machine (GBM):

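Rests on a well-studied framework (gradient descent in function space), which makes its behavior well understood and the method widely applicable.
Builds the model stagewise, adding one tree at a time to correct the errors of the current ensemble, so the contribution of each stage can be inspected.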
Can scale beyond one machine through third-party implementations that distribute training across multiple processors or servers (scikit-learn's own GradientBoostingClassifier trains on a single core).
Supports various kinds of loss functions suitable for regression, binary, and multi-class classification tasks.
Exposes feature importances that reveal which features have the most influence on the predictions.
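
The stagewise construction described above can be sketched in a few lines. This is a minimal plain-Python illustration, not scikit-learn's implementation. It assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps), and the function names and toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree and return (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every threshold leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def gradient_boost(xs, ys, n_stages=20, learning_rate=0.3):
    """Stagewise boosting for squared error; returns (base, stumps, per-stage MSE)."""
    base = sum(ys) / len(ys)  # stage 0: constant model
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_stages):
        # For squared error, the negative gradient is just the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Add the new tree, shrunk by the learning rate, to the ensemble.
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history


def predict(base, stumps, x, learning_rate=0.3):
    """Sum the constant and every shrunk stump for a single input x."""
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.1]
base, stumps, history = gradient_boost(xs, ys)
for stage, mse in enumerate(history[:5], start=1):
    print(f"stage {stage}: training MSE = {mse:.4f}")
```

The per-stage `history` shows how each added tree changes the training error. In scikit-learn, `GradientBoostingClassifier.staged_predict` offers a similar view by yielding the ensemble's predictions after each stage.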

About Project Goals

In this project, we aim to identify the key factors that increase obesity risk using the collected data, and to build a reliable machine learning tool that predicts an individual's obesity risk from historical profiles. By examining the factors that most influence obesity risk, we hope to provide insights that support awareness and prevention. At the same time, training accurate models on comprehensive data sets yields personalized risk estimates that can help users make informed choices about healthier living.

About Dataset

To get started, download the dataset from the Kaggle competition page: https://www.kaggle.com/competitions/playground-series-s4e1/data

FAVC: Frequent intake of high-caloric food items
FCVC: Frequency of vegetable consumption
NCP: Number of main meals per day
CAEC: Frequency of eating between meals (snacking)
SMOKE: Tobacco smoking habit
CH2O: Daily water intake
SCC: Whether calorie intake is monitored
FAF: Frequency of physical activity
TUE: Time spent using electronic devices
CALC: Frequency of alcohol consumption
MTRANS: Mode of transportation usually used
NObeyesdad: Target variable, the obesity risk class (weight category)
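
Several of these fields (FAVC, CAEC, SMOKE, SCC, CALC) are stored as text answers rather than numbers, so they need encoding before a model can use them. A minimal pure-Python sketch follows. The answer labels are an assumption, based on the public UCI obesity-levels dataset that uses the same column names, so check them against the downloaded CSV. `encode_row` and the sample record are hypothetical.

```python
# Assumed answer labels (as in the public UCI obesity-levels data with these
# column names): frequency answers "no"/"Sometimes"/"Frequently"/"Always",
# binary answers "no"/"yes". Verify against the actual CSV before use.
FREQUENCY_ORDER = {"no": 0, "Sometimes": 1, "Frequently": 2, "Always": 3}
YES_NO = {"no": 0, "yes": 1}

ORDINAL_COLUMNS = ("CAEC", "CALC")         # ordered frequency answers
BINARY_COLUMNS = ("FAVC", "SMOKE", "SCC")  # yes/no answers


def encode_row(row):
    """Return a copy of one record with categorical answers mapped to integers.

    Hypothetical helper: numeric columns (FCVC, NCP, CH2O, FAF, TUE) and
    MTRANS are copied unchanged. MTRANS has no natural order, so it would
    need one-hot encoding in a separate step.
    """
    encoded = dict(row)
    for col in ORDINAL_COLUMNS:
        encoded[col] = FREQUENCY_ORDER[row[col]]
    for col in BINARY_COLUMNS:
        encoded[col] = YES_NO[row[col]]
    return encoded


# Hypothetical sample record, for illustration only.
sample = {"FAVC": "yes", "FCVC": 2.0, "NCP": 3.0, "CAEC": "Sometimes",
          "SMOKE": "no", "CH2O": 2.0, "SCC": "no", "FAF": 1.0, "TUE": 0.5,
          "CALC": "Frequently", "MTRANS": "Walking"}
print(encode_row(sample))
```

An ordered integer mapping suits tree ensembles such as GBM and XGBoost, because their splits depend only on the order of a feature's values. With pandas, the same dictionaries can be passed to `Series.map` to encode a whole column at once.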

junaidulhassan / Obesity-Risk-Analysis-and-Prediction