lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch


About OpenAI DALL-E

Asthestarsfalll opened this issue

Hi, this is fantastic work!
I'm wondering what the network settings of the original OpenAI DALL-E are, especially for the Transformer it uses.
I can't figure out the meaning of the parameters `stable`, `causal`, `rotary_emb`, and `shared_attn_ids`.
All I know from the paper is that the image transformer uses 64 attention layers, each with 62 attention heads and a per-head state size of 64, which implies a model dimension of 62 × 64 = 3968.
I also notice that you mention sparse attention, but I can't find any details about it in the blog post.
If I want to train a DALL-E matching the original OpenAI version, how should I choose the parameters?
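For concreteness, here is roughly how I'd map the paper's numbers onto this repo's `DALLE` class, going by the README. This is just a sketch: the dVAE settings below are placeholders, and I'm not sure this exact set of kwargs matches the latest version.

```python
from dalle_pytorch import DiscreteVAE, DALLE

# placeholder dVAE -- per the paper, the dVAE maps 256x256 images to a
# 32x32 grid of tokens drawn from an 8192-entry codebook
vae = DiscreteVAE(
    image_size = 256,
    num_layers = 3,
    num_tokens = 8192,
    codebook_dim = 512,
    hidden_dim = 64
)

# paper-scale transformer: 64 layers, 62 heads, 64 dims per head,
# so dim = 62 * 64 = 3968 (~12B parameters, far beyond a single GPU)
dalle = DALLE(
    dim = 3968,
    vae = vae,
    num_text_tokens = 16384,  # the paper's BPE text vocabulary size
    text_seq_len = 256,       # the paper caps captions at 256 BPE tokens
    depth = 64,
    heads = 62,
    dim_head = 64,
    reversible = True,        # trades compute for memory at this depth
    attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')  # the sparse mix from the README
)
```

Is my understanding of those numbers correct?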
And are there any pretrained models or checkpoints for it?
Thanks in advance for your help!