GenericBlock Module Issues

Question

GenericBlock Module Issues

jagraves21 opened this issue 3 years ago · comments

I believe I have found a couple of issues with the PyTorch implementation of the GenericBlock module.

1. From reading the original paper, I do not think that the $\theta^b$ and $\theta^f$ outputs should have ReLu activations (see here). I know the paper says the FC layers have ReLu non-linearity, but equation (1) in the paper says $\theta^b$ and $\theta^f$ are produced with a Linear function (maybe the FC in Figure 1 is a typo).

2. I am almost certain that if the $g^b$ and $g^f$ functions (which produce the backcast and forecast outputs) are implemented using a torch.nn.Linear layer, there should NOT be a bias (see here). The $\theta^b$ and $\theta^f$ vectors are expansion coefficients that are used to scale the learnable $v^b$ and $v^f$ basis vectors.

If I misunderstood anything, please let me know.

Tomasz Cheda · Answer 1 · Fri May 28 2021 05:39:15 GMT+0800 (China Standard Time)

Ad 1. I came here to write the same thing, I also looked at the implementation by the original authors and they do not have ReLu there. Ad 2, I believe there should be bias. The paper is self-contradictory: In 3.3, the authors describe using bias:

This is also reflected in the original authors' code (see: repo by ElementAI). Caution: that repo has been created by the original authors, but is probably not the code they initially wrote, as it's PyTorch and in the paper they talk about using TensorFlow.
In case of the GenericBlock, theta=Linear(hidden_4) and output = Linear_with_bias(theta), as described in the paper if you consided 3.3 is equivalent to output=Linear_with_bias(hidden_4), as in the ElementAI implementation.
Additionally, the ElementAI implementation uses linear with bias to obtain theta for all block types - this is not explained in the paper at all, so I have no idea which way that one is supposed to be.

Jeffrey Graves · Answer 2 · Fri May 28 2021 08:22:11 GMT+0800 (China Standard Time)

@chedatomasz I'm pretty sure the $g^{b}$ and $g^{f}$ blocks should not have a bias. In order for $g^{b}$ and $g^{f}$ to be linear projections, as stated by the authors, the bias needs to be 0: the operation $V^{f}\theta^{f}$ can be viewed as a linear combination where the columns of $V^{f}$ serve as basis vectors and the columns of $\theta^{f}$ are the coefficients.

I'm pretty convinced that there are bugs in both this implementation and the ElementAI implementation of the trend and seasonality blocks. I don't think they construct the weight matrices as described in the paper.

Tomasz Cheda · Answer 3 · Fri May 28 2021 08:58:57 GMT+0800 (China Standard Time)

I agree that if $g^{b}$ and $g^{f}$ are to truly be linear projections, the bias would need to be zero. And I agree that on page three they call it a linear projection, and provide equations for a linear projection. But on page four, in the fragment I quoted, they define it again, this time as fully connected with bias.

Regarding forming the matrices: For generic basis, the matrices are present as parameters of the Linear layers, hidden by the framework. For seasonality and trend, the matrices are formed explicitly in seasonality_model and trend_model

Jeffrey Graves · Answer 4 · Fri May 28 2021 10:23:13 GMT+0800 (China Standard Time)

Based on the implementation in this repo, the following are the backcast and forecast basis vectors given a backcast length of 30, a forecast length of 5, and a theta dimension of 4:

The original paper does not say how $g^{b}$ is constructed for the trend or seasonality blocks. The description is only for the $g^{f}$ block. It turns out that the forecast vectors in the $g_{f}$ trend block form a basis for all $n=\theta-1$ degree polynomials. This is why I believe the backcast vectors should be constructed in the same way, and they should be as follows:

Tomasz Cheda · Answer 5 · Fri May 28 2021 10:35:09 GMT+0800 (China Standard Time)

Regarding this one, I just added another issue. I believe this reflects the code in ElementAI's implementation, where they have flipped time axis for backcast. I will email the original authors to confirm what their intentions were.

Philippe Rémy · Answer 6 · Wed Aug 04 2021 16:33:20 GMT+0800 (China Standard Time)

Point 1. fixed: 0e44180. The keras impl did not seem to have this issue.
Point 2. so we should leave the linear with bias=True?
Point 3. cf. the other issue raised.

Philippe Rémy · Answer 7 · Wed Aug 04 2021 21:10:10 GMT+0800 (China Standard Time)

Point3: be40395

Philippe Rémy · Answer 8 · Fri Aug 06 2021 16:11:59 GMT+0800 (China Standard Time)

I'll close this. Let me know if it LGTY!