Forecasting-intermittent-demand-a-comparative-approach

Problem description

Sustainable use of drinking water is essential to society. Water resources are under increasing pressure from growing demand and climate change. Dutch water utilities are looking to raise awareness and find potential ways of saving drinking water. In doing so, Seita is working on a pattern recognition and classification toolkit for monitoring systems that will allow real-time automated analysis. Distinctive in this work is the use of Intermittent Demand data. ”When a product has many periods of zero demand (also known as irregular demand), it is said to have intermittent demand (ID). In these cases, the demand is often small and highly variable in scale and the many zero values in ID time-series usually render usual forecasting methods difficult to apply.

Data

The original dataset used in this analysis describes the water consumption from a customer building. Data was collected from 4 July 2019 to 31 December 2020. Value is expressed in litres and it refers to the quantity of water measured up to that date.

Experimental setup

In order to come up with to the final results, the following approach was adopted. A series of 8 forecasters were implemented, where both Machine Learning and Statistical models were used. Then models’ predictions were compared among themselves to evaluate their outcomes. Eventually models performances were investigated against different forecast horizons and different times of the day (including anomalies), by selecting some specific scenarios. These particular scenarios were selected to have the following characteristics:

a Forecast Horizon equal to 48 hours, which means we want to predict each time two days in the future;
2 time moments of the day, which are noon and midnight, as we want to see how it differs if we start predicting values during the day or during the night;
3 different moments of the demand, which are a moment of low demand of a normal week, a moment of demand just before the beginning of the anomaly and a moment of demand in a week into the anomaly.

| | | | |

Results

The following table shows the final results. The accuracy of the models was evaluated by taking into account the Weighted Absolute Percentage Error (WAPE), and in particular, it was defined as 1-WAPE. Moreover, all cells were filled by the green or the red colour according to a threshold value previously defined, to give a visual idea of which models perform better concerning each scenario. In particular, if the model accuracy was greater or equal than 70%, then that model was considered as a good model and the cell was filled by green, otherwise by red. Eventually, concerning the scenarios, the bold font was used to highlight those values representing the best model accuracy for that specific scenario.

Some findings:

Best model: KNeighbors Regressor
Worst model: Naive Forecaster
Best scenario: “Low demand in a normal week, at noon!”
Worst scenario: both right before and right after the beginning of the anomaly
Scenario 3: performances drop tremendously
Scenario 4: very poor KNeighbors Regressor performance while surprisingly good AutoETS and Exponential Smoothing performances
Scenario 5: unexpected results as 3 out of 4 models reaches the threshold value

Pietroobbiso / Forecasting-intermittent-demand-a-comparative-approach

Forecasting-intermittent-demand-a-comparative-approach

Problem description

Data

Experimental setup

Results

About

Languages