Predicting Daily Power Consumption: A Supervised Regression Approach

1. Introduction

This project explores daily power consumption patterns over several years to predict future usage accurately. Using machine learning models, we aim to assist energy providers in balancing energy distribution, optimizing usage, and forecasting demand. Our supervised regression approach evaluates various models, focusing on the prediction accuracy of future daily power consumption.

2. Project Overview

Project Type: Supervised Regression (Time Series Prediction)
Dataset: Power Consumption Dataset from UCI Machine Learning Repository
Goal: Predict daily power consumption and evaluate model performance
Techniques: Random Forest, XGBoost, and other regression algorithms
Evaluation Metric: Root Mean Squared Error (RMSE)

3. Problem Statement

Accurate power consumption prediction helps energy providers optimize energy generation and distribution, reducing waste. Due to the variability in daily consumption influenced by multiple factors, building robust predictive models is essential for reliable energy forecasting.

4. Project Objectives

Build a Predictive Model: Use historical data to forecast daily power consumption.
Compare Models: Evaluate the performance of different supervised regression models.
Optimize RMSE: Ensure the best model achieves an RMSE below 450 kW.
Analyze Trends: Compare predicted consumption trends with actual data.

5. Skills Demonstrated / Technologies Used

Data Analysis & Visualization: Using Python libraries such as matplotlib for visual insights
Machine Learning: Random Forest, XGBoost, and model optimization
Model Evaluation: RMSE as the primary metric
Documentation: Clear documentation using Jupyter Notebooks and GitHub

6. Dataset Overview

The dataset includes daily power consumption over several years. Each entry records:

Date: Date of measurement
Power Consumption: Daily usage in kW
Other Features: Year, semester, quarter, day of the week, week of the year, and month

The dataset is provided in two files:

df_train.csv (Training Data)
df_test.csv (Testing Data)

Dataset Source Link

7. Project Methodology

Data Preprocessing

Data Inspection: Load and review the dataset.
Missing Values: Check and handle missing values.
Feature Engineering: Convert categorical variables to numerical formats.
Normalization: Standardize data where needed.

Model Selection

Random Forest Regression: To leverage feature importance in predictions.
XGBoost: Known for effective boosting and generalization.

Model Training & Evaluation

Training: Models were trained on the training set.
Evaluation Metric: RMSE used to assess performance.
Optimization: Hyperparameters were tuned to reduce RMSE below 450 kW.

Trend Analysis

Visualization: Plots of actual vs. predicted values were generated.
Trend Similarity: Visual checks confirmed poor prediction accuracy in following consumption trends.

8. Results

Best Model: Random Forest with an RMSE of 435.02 kW.
Trend Analysis: Predictions didn't follow similar patterns to actual data, demonstrating poor reliable model performance.

9. Conclusion

This project explored predicting daily power consumption using supervised regression models. By leveraging historical consumption data, we aimed to build an accurate model for energy forecasting. The Random Forest model achieved an RMSE of 435.02 kW, indicating a relatively low prediction error.

However, upon analyzing the trend similarity between predicted and actual values, the correlation coefficient was below 0.9, leading to a "No" result for trend similarity. This outcome suggests that while the model's predictions are reasonably close in terms of error, they do not follow the exact trend of actual consumption. This discrepancy may be due to the lack of external factors such as weather or seasonal events in the dataset, which could improve the model's ability to capture fluctuations in consumption patterns.

Overall, the project demonstrates the potential of machine learning in energy forecasting but also highlights areas for further improvement, particularly in incorporating additional external variables to enhance trend alignment and predictive accuracy.

10. Future Recommendations

Include External Variables: Integrate weather, holiday, and household data for more accurate predictions.
Refine Models: Test neural networks or advanced time series models for further improvement.
Advanced Time Series Analysis: Implement models like ARIMA or LSTM to capture complex trends.

yanny-alt / Energy-Consumption-Prediction-Using-Machine-Learning