jiweil / Sequence-Models-on-Stanford-Treebank


BLSTM Forward/Backward

mshahriarinia opened this issue · comments

Why does the counter go from 1 to T in both the left-to-right and right-to-left LSTMs? From my understanding, LTR should run from 1 to T and RTL from T to 1.

Left to right:
x_t=parameter.vect(:,batch.Word(:,t));

Right to left:
x_t=parameter.vect(:,batch.Word_r(:,t));

batch.Word_r is obtained by reversing batch.Word, which is identical to iterating from T down to 1, as you said.

There is a tricky part in the case where the sequences in the same minibatch are not of the same length. Building a separate batch.Word_r to keep track of the tokens makes things easier.
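As a toy sanity check (hypothetical variable names, and equal-length sequences for simplicity), reading the flipped batch at column t is the same as reading the original batch at column T - t + 1:

```matlab
% Toy check: for equal-length rows, flipping the batch left-right and
% reading column t matches reading the original at column T - t + 1.
Word   = [9 3 4 2; 7 8 2 5];
Word_r = fliplr(Word);
T = size(Word, 2);
t = 2;
isequal(Word_r(:, t), Word(:, T - t + 1))   % ans = 1 (true)
```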

I'm getting a bit confused. Imagine the two sequences in one minibatch are [9 3 4 2] and [7 8 2], with 0 as the padding symbol. In a normal BLSTM with no batching, we would have:

9 3 4 2
2 4 3 9

We have one LSTM for each direction, both going from left to right: the second one reads the reversed sequence, and we concatenate its items one by one with those of the first.

Now, if both are in one minibatch, which of the following would be the proper forward/backward combination to concatenate?

9 3 4 2
7 8 2 0
---
2 4 3 9
0 2 8 7

or this one?

9 3 4 2
7 8 2 0
---
2 4 3 9
2 8 7 0

We have one LSTM going from left to right. We reverse the entire sequence and use another LSTM, also going from left to right (which is identical to going from right to left over the original sequence).

Suppose we have two examples in the batch:

9 3 4 2
7 8 2 0

After reversing each row, we have:

2 4 3 9
0 2 8 7
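To make this concrete, here is a toy runnable sketch (my own illustration, not the repo's Forward.m; a plain tanh recurrence stands in for the LSTM cell) in which a single t = 1:T counter drives both directions:

```matlab
% Toy sketch: one counter drives both LSTMs, because the backward
% direction simply reads the flipped batch. Padded steps (index 0)
% contribute a zero embedding and leave the state unchanged.
Word   = [9 3 4 2; 7 8 2 0];   % forward batch, option 1 above
Word_r = fliplr(Word);         % [2 4 3 9; 0 2 8 7]
[n, T] = size(Word);
d = 5;  V = 10;
E = randn(d, V);               % toy embedding table
h_f = zeros(d, n);  h_b = zeros(d, n);
for t = 1:T                    % one counter for both LSTMs
    m_f = repmat(double(Word(:, t)'   ~= 0), d, 1);   % pad masks
    m_b = repmat(double(Word_r(:, t)' ~= 0), d, 1);
    x_f = E(:, max(Word(:, t),   1)) .* m_f;   % zero vector at pads
    x_b = E(:, max(Word_r(:, t), 1)) .* m_b;
    h_f = m_f .* tanh(h_f + x_f) + (1 - m_f) .* h_f;  % hold state
    h_b = m_b .* tanh(h_b + x_b) + (1 - m_b) .* h_b;  % at padding
end
H = [h_f; h_b];                % 2d x n: one BLSTM vector per example
```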

So this is where the trouble comes in! In the shorter sequence (7 8 2), you go through padding before reaching the first real token, so the cell state changes just from observing padding.

Oh, I see your point. That issue has been taken care of. You can take a close look at the code (see the function readdata).
Suppose we have two examples:

1 2 3
1 2

They would be transformed into batch.word:

1 2 3
0 1 2

The positions holding 0 are masked and handled during the forward and backward passes.

When we reverse each of the examples, we have:

3 2 1
2 1

and they would be transformed into batch.r_word:

3 2 1
0 2 1

The positions holding 0 are similarly taken care of.
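A minimal sketch of that transformation might look like this (hypothetical code, not the actual readdata; it assumes 0 is the padding index and left-pads both batches):

```matlab
% Sketch: reverse each example individually, then left-pad both
% batches with the padding index 0.
seqs = {[1 2 3], [1 2]};
T = max(cellfun(@numel, seqs));
n = numel(seqs);
word   = zeros(n, T);
r_word = zeros(n, T);
for i = 1:n
    s = seqs{i};
    L = numel(s);
    word(i,   T-L+1:T) = s;           % word   = [1 2 3; 0 1 2]
    r_word(i, T-L+1:T) = fliplr(s);   % r_word = [3 2 1; 0 2 1]
end
```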

So you still cycle the cell's memory through padding before reaching the very first item in the sequence, right?

You can add me on Skype (bdlijiwei1) if you think a quick chat would make it easier.

Thanks for the comments. For future readers, here is the line you mentioned: https://github.com/jiweil/Sequence-Models-on-Stanford-Treebank/blob/master/Bi_LSTM/Forward.m#L42
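And here is a hedged sketch of the masking idea behind that line (my own illustration; the repo's exact mechanics may differ): with left-padded batches, zeroing the state columns of examples whose current token is padding keeps padding from ever influencing the cell's memory.

```matlab
% Hedged illustration of masking, not the repo's exact code. With
% left-padding, zeroing the states wherever the current token is the
% padding index 0 means every real token starts from a clean state.
d = 5;  n = 2;
h_t = randn(d, n);                  % hidden states after some step
c_t = randn(d, n);                  % cell states after some step
tokens = [3; 0];                    % example 2 is still at padding
M = repmat(double(tokens' ~= 0), d, 1);
h_t = h_t .* M;                     % column 2 (the padded example)
c_t = c_t .* M;                     % is reset to zero
```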