Question about multiplying word embeddings by the target distribution
deweihu96 opened this issue · comments
Hi Diego,
In section 3, you mentioned that the word embeddings are multiplied by each target distribution. It's a little ambiguous. Do you mean a matrix-matrix product?
For example, if there are L words in the input and the embedding size is D, the input shape is [L, D], and let's say the masking matrix's shape is [L, T]. Are you going to multiply each row of the input with each column of the masking matrix? Then the output shape would be [D, T]. Thanks in advance: )
-Best,
Dewei
Hi,
It is basically a scalar times a vector. For example, if you have an input of shape [L, D] and a mask of shape [L, T], you do T-1 elementwise products of the form [L, D] * [L, 1]. That gives you T-1 matrices of shape [L, D], which you then feed into the separate classifiers.
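The operation above can be sketched with NumPy broadcasting. This is a minimal illustration, not the paper's actual code; the sizes L, D, T and the random values are placeholders, and the choice of which T-1 columns to use is assumed for the example:

```python
import numpy as np

# Hypothetical sizes: L words, embedding dim D, T target distributions.
L, D, T = 5, 4, 3

rng = np.random.default_rng(0)
embeddings = rng.random((L, D))  # word embeddings, shape [L, D]
mask = rng.random((L, T))        # per-word weight for each target, shape [L, T]

# For each of T-1 targets, scale every word vector by its scalar mask
# weight: [L, D] * [L, 1] broadcasts the scalar across the embedding dim.
weighted = [embeddings * mask[:, t:t + 1] for t in range(T - 1)]

# Each product keeps the embedding shape [L, D] — one matrix per classifier.
print([w.shape for w in weighted])
```

Each word vector is scaled as a whole, so no matrix-matrix product (and no [D, T] output) is involved.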
Thanks for your reply, Diego. That makes sense!