GenericBlock Module Issues
jagraves21 opened this issue · comments
I believe I have found a couple of issues with the PyTorch implementation of the GenericBlock module.
1. From reading the original paper, I do not think that the FC layers have ReLU non-linearity, but equation (1) in the paper says Linear function (maybe the FC in Figure 1 is a typo).
2. I am almost certain that if the torch.nn.Linear
layer, there should NOT be a bias (see here). The
If I misunderstood anything, please let me know.
Ad 1. I came here to write the same thing. I also looked at the implementation by the original authors, and they do not have ReLU there.
Ad 2. I believe there should be a bias. The paper is self-contradictory: in 3.3, the authors describe using bias:
This is also reflected in the original authors' code (see: repo by ElementAI). Caution: that repo was created by the original authors, but it is probably not the code they initially wrote, since it is PyTorch and the paper talks about using TensorFlow.
In the case of the GenericBlock, theta = Linear(hidden_4) and output = Linear_with_bias(theta), as described in the paper if you consider 3.3. With no non-linearity between the two layers, this is equivalent to output = Linear_with_bias(hidden_4), as in the ElementAI implementation.
Additionally, the ElementAI implementation uses a Linear layer with bias to obtain theta for all block types. This is not explained in the paper at all, so I have no idea which way that one is supposed to be.
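To make the equivalence argument for the GenericBlock concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). The sizes and weight values are made up for illustration and are not taken from either implementation; `W_theta`, `W_out`, and `b_out` are hypothetical names for the bias-free theta layer and the biased output layer.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A (m x k) and B (k x n), both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Hypothetical sizes: hidden width 3, theta dimension 2, output length 2.
W_theta = [[1.0, 2.0, 0.0],
           [0.0, -1.0, 3.0]]   # theta = W_theta @ hidden   (no bias)
W_out = [[2.0, 1.0],
         [0.5, -1.0]]          # output = W_out @ theta + b_out
b_out = [0.1, -0.2]

hidden = [1.0, 2.0, 3.0]       # stands in for the output of the last hidden layer

# Two-step version: compute theta, then apply the biased output layer.
theta = matvec(W_theta, hidden)
two_step = [y + b for y, b in zip(matvec(W_out, theta), b_out)]

# Collapsed version: a single affine map with weight W_out @ W_theta, same bias.
W_merged = matmul(W_out, W_theta)
one_step = [y + b for y, b in zip(matvec(W_merged, hidden), b_out)]

print(two_step, one_step)
```

Both paths give the same output because no non-linearity sits between the two layers. One caveat: the merged weight has rank at most the theta dimension, so the two formulations are only equally expressive when theta is not a bottleneck.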
@chedatomasz I'm pretty sure the
I'm pretty convinced that there are bugs in both this implementation and the ElementAI implementation of the trend and seasonality blocks. I don't think they construct the weight matrices as described in the paper.
I agree that if
Regarding forming the matrices: for the generic basis, the matrices are present as the parameters of the Linear layers, hidden by the framework. For seasonality and trend, the matrices are formed explicitly in seasonality_model and trend_model.
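As a rough illustration of what "formed explicitly" means, here is a sketch of the forecast-side trend and seasonality bases as I read them from the paper (polynomials and Fourier terms on the grid t = [0, 1, ..., H-1] / H). It is not the code from seasonality_model or trend_model in either repo; the function names and arguments are hypothetical, and the backcast grid is left out on purpose since its time axis is discussed separately.

```python
import math


def trend_basis(horizon, n_terms):
    """Rows are t**i for i = 0..n_terms-1, with t = [0, 1, ..., H-1] / H."""
    t = [step / horizon for step in range(horizon)]
    return [[ti ** i for ti in t] for i in range(n_terms)]


def seasonality_basis(horizon, max_harmonic):
    """Rows are cos(2*pi*k*t) for k = 0..max_harmonic,
    followed by sin(2*pi*k*t) for k = 1..max_harmonic."""
    t = [step / horizon for step in range(horizon)]
    cos_rows = [[math.cos(2 * math.pi * k * ti) for ti in t]
                for k in range(max_harmonic + 1)]
    sin_rows = [[math.sin(2 * math.pi * k * ti) for ti in t]
                for k in range(1, max_harmonic + 1)]
    return cos_rows + sin_rows


def expand(theta, basis):
    """Forecast = sum_i theta[i] * basis[i], evaluated per time step."""
    return [sum(th * row[j] for th, row in zip(theta, basis))
            for j in range(len(basis[0]))]


# Forecast length 5 and theta dimension 4, matching the sizes mentioned in this thread.
T = trend_basis(horizon=5, n_terms=4)
# Assumed example coefficients: a level of 1.0 plus a slope of 0.5.
forecast = expand([1.0, 0.5, 0.0, 0.0], T)
print(forecast)
```

The point of the sketch is only that theta multiplies a fixed, non-learned matrix in these blocks, whereas in the generic block the corresponding matrix is a learned Linear weight.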
Based on the implementation in this repo, the following are the backcast and forecast basis vectors given a backcast length of 30, a forecast length of 5, and a theta dimension of 4:
The original paper does not say how
Regarding this one, I just added another issue. I believe this reflects the code in ElementAI's implementation, where they have flipped the time axis for the backcast. I will email the original authors to confirm what their intentions were.
Point 1: fixed in 0e44180. The Keras implementation did not seem to have this issue.
Point 2: so we should leave the Linear layer with bias=True?
Point 3: cf. the other issue raised.
Point 3: be40395
I'll close this. Let me know if it LGTY!