Question about multiplying word embeddings by the target distribution
deweihu96 opened this issue · comments
Hi Diego,
In section 3, you mentioned that the word embeddings are multiplied by each target distribution. It's a little ambiguous. Do you mean a matrix-matrix product?
For example, if there are L words in the input and the embedding size is D, the input shape is [L, D], and let's say the masking matrix's shape is [L, T]. Are you going to multiply each row of the input with each column of the masking matrix? Then the output shape would be [D, T]. Thanks in advance: )
-Best,
Dewei
Hi,
It is basically a scalar times a vector. For example, if you have an input of shape [L, D] and a mask of shape [L, T], you do T-1 elementwise products of the form [L, D] * [L, 1]. That gives you T-1 matrices of shape [L, D], which you then feed into the separate classifiers.
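The operation above can be sketched with NumPy broadcasting. This is a minimal illustration, not the paper's actual code; the sizes L, D, T and the random values are placeholders, and the choice of which T-1 columns to use is assumed for the example:

```python
import numpy as np

# Hypothetical sizes: L words, embedding dim D, T target distributions.
L, D, T = 5, 4, 3

rng = np.random.default_rng(0)
embeddings = rng.random((L, D))  # word embeddings, shape [L, D]
mask = rng.random((L, T))        # per-word weight for each target, shape [L, T]

# For each of T-1 targets, scale every word vector by its scalar mask
# weight: [L, D] * [L, 1] broadcasts the scalar across the embedding dim.
weighted = [embeddings * mask[:, t:t + 1] for t in range(T - 1)]

# Each product keeps the embedding shape [L, D] — one matrix per classifier.
print([w.shape for w in weighted])
```

Each word vector is scaled as a whole, so no matrix-matrix product (and no [D, T] output) is involved.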
Thanks for your reply, Diego. That makes sense!