philipperemy / n-beats

Keras/Pytorch implementation of N-BEATS: Neural basis expansion analysis for interpretable time series forecasting.

GenericBlock Module Issues

jagraves21 opened this issue

I believe I have found a couple of issues with the PyTorch implementation of the GenericBlock module.

1. From reading the original paper, I do not think that the $\theta^b$ and $\theta^f$ outputs should have ReLU activations (see here). I know the paper says the FC layers have ReLU non-linearities, but equation (1) in the paper says $\theta^b$ and $\theta^f$ are produced by a LINEAR function (maybe the FC in Figure 1 is a typo).

[Screenshot: equation (1) from the paper]

2. I am almost certain that if the $g^b$ and $g^f$ functions (which produce the backcast and forecast outputs) are implemented with a torch.nn.Linear layer, there should NOT be a bias (see here). The $\theta^b$ and $\theta^f$ vectors are expansion coefficients used to scale the learnable $v^b$ and $v^f$ basis vectors (see the sketch at the end of this post).

[Screenshot: the basis expansion equations from the paper]

If I misunderstood anything, please let me know.
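For concreteness, here is a minimal sketch of a generic block with both proposed fixes applied; the class name, hidden width, and default sizes are illustrative, not this repo's exact API:

```python
import torch
from torch import nn


class GenericBlock(nn.Module):
    """Sketch of a generic N-BEATS block with both proposed fixes:
    (1) the theta projections are purely linear (no ReLU), per equation (1);
    (2) g_b / g_f have bias=False, so the backcast/forecast are true linear
        combinations of the learned basis vectors."""

    def __init__(self, backcast_length=30, forecast_length=5, hidden=256, theta_dim=4):
        super().__init__()
        # Four FC layers with ReLU non-linearity, as described in the paper.
        self.fc = nn.Sequential(
            nn.Linear(backcast_length, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Fix 1: no ReLU on the theta outputs.
        self.theta_b = nn.Linear(hidden, theta_dim)
        self.theta_f = nn.Linear(hidden, theta_dim)
        # Fix 2: no bias, so the outputs stay in the span of the basis vectors.
        self.g_b = nn.Linear(theta_dim, backcast_length, bias=False)
        self.g_f = nn.Linear(theta_dim, forecast_length, bias=False)

    def forward(self, x):
        h = self.fc(x)
        return self.g_b(self.theta_b(h)), self.g_f(self.theta_f(h))
```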

Regarding point 1: I came here to write the same thing. I also looked at the implementation by the original authors, and they do not have a ReLU there. Regarding point 2: I believe there should be a bias. The paper is self-contradictory: in Section 3.3, the authors describe using a bias:
[Screenshot: Section 3.3 of the paper]
This is also reflected in the original authors' code (see: the repo by ElementAI). Caution: that repo was created by the original authors, but it is probably not the code they initially wrote, as it is PyTorch and in the paper they mention using TensorFlow.
In the case of the GenericBlock, theta = Linear(hidden_4) followed by output = Linear_with_bias(theta), as described in the paper, is equivalent (if you consider 3.3) to output = Linear_with_bias(hidden_4), as in the ElementAI implementation.
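A quick numeric check of that equivalence (sizes here are arbitrary): a bias-free Linear followed by a Linear with bias collapses into a single Linear with bias.

```python
import torch
from torch import nn

torch.manual_seed(0)
h4 = torch.randn(3, 16)                     # a batch of hidden_4 activations

theta_layer = nn.Linear(16, 4, bias=False)  # theta = Linear(hidden_4)
basis_layer = nn.Linear(4, 10, bias=True)   # output = Linear_with_bias(theta)

# Fold the two layers into one: W = W_basis @ W_theta, b = b_basis.
combined = nn.Linear(16, 10, bias=True)
with torch.no_grad():
    combined.weight.copy_(basis_layer.weight @ theta_layer.weight)
    combined.bias.copy_(basis_layer.bias)

print(torch.allclose(basis_layer(theta_layer(h4)), combined(h4), atol=1e-6))  # True
```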
Additionally, the ElementAI implementation uses a Linear layer with bias to obtain theta for all block types. This is not explained in the paper at all, so I have no idea which way that one is supposed to be.

@chedatomasz I'm pretty sure the $g^{b}$ and $g^{f}$ blocks should not have a bias. In order for $g^{b}$ and $g^{f}$ to be linear projections, as stated by the authors, the bias needs to be 0: the operation $V^{f}\theta^{f}$ can be viewed as a linear combination in which the columns of $V^{f}$ serve as basis vectors and the entries of $\theta^{f}$ are the coefficients.
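In the paper's notation, the forecast expansion is

$$\hat{y}_{\ell} = g^{f}\!\left(\theta^{f}_{\ell}\right) = V^{f}\theta^{f}_{\ell} = \sum_{i=1}^{\dim\left(\theta^{f}_{\ell}\right)} \theta^{f}_{\ell,i}\, v^{f}_{i},$$

with no additive term: a nonzero bias would shift the output off the subspace spanned by the $v^{f}_{i}$.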

I'm pretty convinced that there are bugs in both this implementation and the ElementAI implementation of the trend and seasonality blocks. I don't think they construct the weight matrices as described in the paper.

I agree that if $g^{b}$ and $g^{f}$ are to truly be linear projections, the bias would need to be zero. And I agree that on page three they call it a linear projection and provide equations for a linear projection. But on page four, in the fragment I quoted, they define it again, this time as a fully connected layer with a bias.

Regarding forming the matrices: for the generic basis, the matrices are present as parameters of the Linear layers, hidden by the framework. For seasonality and trend, the matrices are formed explicitly in seasonality_model and trend_model.

Based on the implementation in this repo, the following are the backcast and forecast basis vectors given a backcast length of 30, a forecast length of 5, and a theta dimension of 4:

[Plot "incorrect_trend": the trend backcast and forecast basis vectors produced by this repo]

The original paper does not say how $g^{b}$ is constructed for the trend or seasonality blocks; the description covers only $g^{f}$. It turns out that the forecast vectors in the $g^{f}$ trend block form a basis for all polynomials of degree $n = \dim(\theta) - 1$. This is why I believe the backcast vectors should be constructed in the same way, and they should be as follows:

[Plot "correct_trend": the proposed backcast and forecast trend basis vectors]
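For reference, a minimal NumPy sketch of the polynomial (Vandermonde-style) construction, with the backcast basis built the same way as the forecast one. The normalization and orientation of the backcast time axis are assumptions here; the flipped axis is exactly the open question discussed below.

```python
import numpy as np

backcast_length, forecast_length, theta_dim = 30, 5, 4

# Forecast time axis: t = 0, 1/H, ..., (H-1)/H, as in the paper.
t_f = np.arange(forecast_length) / forecast_length
# Assumed backcast axis, normalized over its own length in the same way.
t_b = np.arange(backcast_length) / backcast_length

# Row j holds t**j: the rows span all polynomials of degree <= theta_dim - 1.
T_f = np.stack([t_f ** j for j in range(theta_dim)])  # (theta_dim, forecast_length)
T_b = np.stack([t_b ** j for j in range(theta_dim)])  # (theta_dim, backcast_length)

# forecast = theta_f @ T_f and backcast = theta_b @ T_b
```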

Regarding this one, I just opened another issue. I believe this reflects the code in ElementAI's implementation, where they flip the time axis for the backcast. I will email the original authors to confirm what their intentions were.

Point 1: fixed in 0e44180. The Keras implementation did not seem to have this issue.
Point 2: so we should leave the Linear layer with bias=True?
Point 3: cf. the other issue raised.

I'll close this. Let me know if it LGTY!