lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Why No Softmax?

kingnobro opened this issue

In class DALLE(nn.Module), there is a member called to_logits:

self.to_logits = nn.Sequential(
    nn.LayerNorm(dim),
    nn.Linear(dim, self.total_tokens),
)

Why is there no Softmax after nn.Linear? In the paper Attention Is All You Need, a softmax function follows the final linear layer.
Without a softmax, the values in the logits can be very large. So in generate_images, when logits containing a very large value are passed to gumbel_sample, I would expect the uniform noise to have no influence on the sampling result.
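One way to check this concern is a small experiment. The sketch below assumes that gumbel_sample adds Gumbel noise, -log(-log(u)) with u uniform, to the logits and takes the argmax; that is the standard Gumbel-max trick and may not match the repository's code exactly. The helper names (softmax, gumbel_argmax, empirical_frequencies) are made up for this illustration, and plain Python is used so it runs without PyTorch. If the assumption holds, the sampled frequencies should follow softmax(logits) even without an explicit softmax layer, and adding a large constant to every logit should not change them, since only the differences between logits matter.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_noise(rng):
    # Gumbel(0, 1) noise from a uniform sample; clamp to avoid log(0).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_argmax(logits, rng, temperature=1.0):
    # Assumed behaviour of gumbel_sample: argmax of (logits / T + Gumbel noise).
    noisy = [x / temperature + gumbel_noise(rng) for x in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)


def empirical_frequencies(logits, n_samples=20000, seed=0):
    # Draw many samples and report how often each index was picked.
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n_samples):
        counts[gumbel_argmax(logits, rng)] += 1
    return [c / n_samples for c in counts]


logits = [2.0, 1.0, 0.0]
shifted = [x + 1000.0 for x in logits]  # same differences, much larger values

print("softmax(logits):   ", softmax(logits))
print("sampled (raw):     ", empirical_frequencies(logits))
print("sampled (shifted): ", empirical_frequencies(shifted))
```

Under that assumption, a single logit that is far larger than all the others does make the sample nearly deterministic, but a softmax over the same logits would assign that token nearly all the probability as well.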