noturlee / Sales-DataAnalysis

This project aims to predict product sales based on advertising expenditures, focusing on 'TV advertising'. Machine learning techniques are employed to analyze and interpret data, enabling businesses to optimize advertising strategies and maximize sales potential.

Sales Prediction Model

Table of Contents

  1. Overview
  2. Models Used
  3. Data Preprocessing
  4. Data Loading
  5. Exploratory Data Analysis (EDA)
  6. Model Training and Evaluation
  7. Interpretation of Report
  8. Data Visualization
  9. Findings
  10. Output
  11. Conclusion

1. Overview

This project aims to predict product sales based on advertising expenditures, focusing on 'TV advertising'. Machine learning techniques are employed to analyze and interpret data, enabling businesses to optimize advertising strategies and maximize sales potential.

2. Models Used

  • Random Forest Regression: Utilized for its ability to handle complex relationships and provide robust predictions.
  • Linear Regression: Employed as a baseline for comparison due to its simplicity and interpretability.

3. Data Preprocessing

Data preprocessing involves:

  • Handling missing values (if any).
  • Encoding categorical variables (if applicable).
  • Scaling numerical features.
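
The steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data frame is a tiny made-up sample using the column names described in this README, median imputation is one assumed way of handling missing values, and StandardScaler is one assumed choice of scaler. No categorical encoding is shown because the described columns are all numeric.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up frame with the README's column names; values are illustrative only.
df = pd.DataFrame({
    "TV": [230.1, 44.5, np.nan, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, 12.9],
})

feature_cols = ["TV", "Radio", "Newspaper"]

# Handle missing values: fill each gap with that column's median (an assumed strategy).
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())

# Scale numerical features to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[feature_cols]), columns=feature_cols)

print(scaled.round(3))
```

In a train/test workflow the scaler would normally be fitted on the training split only and then applied to the test split. Scaling mainly matters for distance-based or regularized models; tree ensembles such as Random Forest are insensitive to it.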

4. Data Loading

The advertising dataset is loaded from a CSV file with four columns: 'TV', 'Radio', and 'Newspaper' advertising expenditures, plus the target 'Sales'.
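
A minimal loading sketch is shown below. The file name advertising.csv is hypothetical (the README does not name the file), and the inline rows are made-up stand-ins so the snippet runs without the real dataset.

```python
import io
import pandas as pd

# Stand-in for the real file: a few made-up rows in the column layout described above.
# With the actual dataset this would be pd.read_csv("advertising.csv") (hypothetical name).
csv_text = """TV,Radio,Newspaper,Sales
230.1,37.8,69.2,22.1
44.5,39.3,45.1,10.4
17.2,45.9,69.3,9.3
151.5,41.3,58.5,18.5
"""

df = pd.read_csv(io.StringIO(csv_text))

# Quick sanity checks on shape, dtypes, and the first rows.
print(df.shape)
print(df.dtypes)
print(df.head())
```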

5. Exploratory Data Analysis (EDA)

EDA is performed to:

  • Visualize relationships between features and target variable ('Sales').
  • Identify correlations and distributions of features.
  • Detect outliers or anomalies in the data.
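
The numeric part of these EDA steps might look like the sketch below. It runs on synthetic data generated inside the snippet (the generating formula is an assumption chosen only so that TV dominates), and the 1.5 x IQR rule is one common, assumed way to flag outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the generating formula is an assumption for illustration.
rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
sales = 7.0 + 0.05 * tv + 0.1 * radio + rng.normal(0, 1.5, n)
df = pd.DataFrame({"TV": tv, "Radio": radio, "Newspaper": newspaper, "Sales": sales})

# Correlation of each feature with the target, strongest first.
corr_with_sales = df.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(corr_with_sales)

# Distribution summary (count, mean, std, quartiles) per column.
print(df.describe())

# Flag outliers with the 1.5 * IQR rule and count them per column.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
outlier_mask = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()
print(outlier_counts)
```

For the visual side, seaborn's pairplot or a pandas scatter matrix would show the same relationships graphically.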

6. Model Training and Evaluation

6.1. Model Training

  1. Random Forest Regression:

    • GridSearchCV used to optimize hyperparameters.
    • Best model selected by cross-validated negative MSE score (higher, i.e. closer to zero, is better).
  2. Linear Regression:

    • Simple model trained as a baseline for comparison.
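
The training procedure in 6.1 can be sketched roughly as below. The hyperparameter names match those reported later in this README; the candidate values, the 3-fold split, the 80/20 train/test split, and the synthetic single-feature data are all assumptions made so the snippet is self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic single-feature data standing in for TV spend vs. Sales (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 1.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate values are illustrative; only the parameter names come from the README.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Grid search scored by negative MSE, so the highest score is the lowest error.
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
best_rf = search.best_estimator_

# Linear Regression baseline trained on the same split.
baseline = LinearRegression().fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best negative MSE:", search.best_score_)
```

scikit-learn maximizes scores, which is why MSE is negated here; the sign can be flipped to read the value as an ordinary MSE.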

6.2. Model Evaluation

Evaluation metrics computed include:

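  • Mean Squared Error (MSE): Average of the squared prediction errors. Lower is better, and large errors are penalized more heavily.
  • Mean Absolute Error (MAE): Average absolute prediction error, expressed in the same units as 'Sales'.
  • R-squared (R2): Proportion of the variance in 'Sales' explained by the model. Values closer to 1 indicate a better fit.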
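
As a small worked example of these three metrics, the snippet below computes them with scikit-learn on four made-up values (not project data), chosen so the arithmetic is easy to check by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values, for illustration only.
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 12.0, 14.0, 18.0])

# Errors are -1, 0, 1, 2: squared errors sum to 6, absolute errors sum to 4.
mse = mean_squared_error(y_true, y_pred)   # 6 / 4 = 1.5
mae = mean_absolute_error(y_true, y_pred)  # 4 / 4 = 1.0
r2 = r2_score(y_true, y_pred)              # 1 - 6 / 56.75

print(f"MSE: {mse:.4f}")
print(f"MAE: {mae:.4f}")
print(f"R2:  {r2:.4f}")
```

Here 56.75 is the total sum of squares of y_true around its mean of 14.25, so R2 comes out at about 0.894.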

7. Interpretation of Report

  • Comparison of model performance based on evaluation metrics.
  • Analysis of coefficients (for Linear Regression) and feature importances (for Random Forest) to interpret relationships between 'TV advertising' and 'Sales'.

Output Interpretation and Explanation

  • Random Forest Model:

    • Best Parameters: {'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
    • Best Negative MSE Score: -1.6148 (cross-validated; corresponds to an MSE of 1.6148)
    • Evaluation Metrics on Test Set (Random Forest):
      • Mean Squared Error (MSE): 1.4591
      • Mean Absolute Error (MAE): 0.9170
      • R-squared (R2): 0.9528
  • Linear Regression Model:

    • Coefficient: 0.0555
    • Intercept: 7.0071
    • Evaluation Metrics on Test Set (Linear Regression):
      • Mean Squared Error (MSE): 6.1011
      • R-squared (R2): 0.8026
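
To make the Linear Regression output concrete, the reported intercept and coefficient define the line Sales = 7.0071 + 0.0555 x TV. The sketch below evaluates that line for a few spend levels. It assumes the single coefficient belongs to the 'TV' feature (the README reports only one) and that inputs use the dataset's own units; predict_sales is a hypothetical helper name.

```python
# Fitted values reported above for the Linear Regression model.
INTERCEPT = 7.0071
TV_COEFFICIENT = 0.0555  # assumed to be the coefficient of the 'TV' feature


def predict_sales(tv_spend: float) -> float:
    """Return the baseline model's predicted sales for a given TV spend."""
    return INTERCEPT + TV_COEFFICIENT * tv_spend


for spend in (0, 100, 200):
    print(f"TV spend {spend:>3}: predicted sales {predict_sales(spend):.4f}")
# TV spend   0: predicted sales 7.0071
# TV spend 100: predicted sales 12.5571
# TV spend 200: predicted sales 18.1071
```

Read this way, each additional unit of TV spend adds about 0.0555 units of predicted sales, and the intercept is the predicted sales with no TV spend.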

Explanation in Non-Technical Terms

  • Model Comparison: The Random Forest model predicts sales from TV advertising more accurately than the Linear Regression model. On the test set its MSE is 1.4591 versus 6.1011, and its R-squared is 0.9528 versus 0.8026. The likely reason is that Random Forest can capture nonlinear relationships in the data that a straight line cannot.

  • Interpretation: For businesses, these models provide insights into how changes in advertising spending (specifically on TV) can impact sales. They help optimize advertising budgets by predicting potential sales outcomes with different strategies.

  • Conclusion: Based on these results, businesses can use the Random Forest model to make more reliable predictions about the effectiveness of their advertising campaigns, thereby maximizing their sales potential.

8. Data Visualization

  • Visual representations include scatter plots, pair plots, and bar plots to illustrate relationships and distributions.
  • Plots of model predictions vs. actual sales to assess performance visually.
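
A predicted-vs-actual plot of the kind mentioned above could be produced along these lines. The data are synthetic, the output file name predicted_vs_actual.png is hypothetical, and the styling choices are assumptions rather than a reproduction of the project's figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic actual values and noisy predictions, for illustration only.
rng = np.random.default_rng(1)
actual = rng.uniform(5, 25, 50)
predicted = actual + rng.normal(0, 1.0, 50)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(actual, predicted, alpha=0.7, label="Predictions")

# Diagonal reference line: points on it are predicted perfectly.
lims = [min(actual.min(), predicted.min()), max(actual.max(), predicted.max())]
ax.plot(lims, lims, linestyle="--", color="gray", label="Perfect prediction")

ax.set_xlabel("Actual Sales")
ax.set_ylabel("Predicted Sales")
ax.set_title("Predicted vs. actual sales (synthetic data)")
ax.legend()
fig.savefig("predicted_vs_actual.png", dpi=100)
```

The closer the points sit to the dashed diagonal, the better the model; systematic curvature away from it would suggest a relationship the model is missing.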

9. Findings

9.1. Data Exploration

  • A strong positive correlation is observed between 'TV advertising' and 'Sales'.
  • 'TV' expenditure shows the highest influence on 'Sales' compared with 'Radio' and 'Newspaper'.

9.2. Model Performance

  • Random Forest outperforms Linear Regression in terms of predictive accuracy.
  • Its lower test MSE (1.4591 vs. 6.1011) and higher R-squared (0.9528 vs. 0.8026) indicate that Random Forest captures the relationship more effectively.

10. Output

  • Predicted sales values for new data points using both Random Forest and Linear Regression models.

11. Conclusion

  • Random Forest is recommended for predicting sales based on 'TV advertising' due to its superior performance.
  • Insights gained can guide advertising strategies to optimize spending and maximize sales.

License: MIT License

