kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

YouTube video walk-through of this codebase

gordicaleksa opened this issue

Hi @kuprel!

First of all, awesome work; you made my job that much easier. :)

I created a YouTube video where I do a deep dive/walk-through of this repo.

Maybe someone finds it useful:
https://youtu.be/x_8uHX5KngE

Hopefully it's okay to share it here in the form of an issue; do let me know!

Wow, this is great! I just added your video to the README. You're right that the clamping is unnecessary; it originally served to avoid a cryptic CUDA runtime error. Later I implemented a more precise solution that limits the BART decoder to 2**14 tokens to match the VQGAN. I'm not sure why there's a mismatch in vocabulary counts. Also, I didn't realize those are shared weights. There's probably a simpler solution here. Great video!
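For context, the clamping forced sampled image token indices into the VQGAN's range after sampling, while the more precise fix is to restrict the decoder's logits to the first 2**14 entries before sampling. Here is a minimal sketch of that idea; the function name and the default vocabulary count are illustrative assumptions, not the repo's actual code:

```python
import torch

def sample_image_token(logits: torch.Tensor, vqgan_vocab_count: int = 2 ** 14) -> torch.Tensor:
    """Hypothetical sketch: keep only the logits for valid VQGAN codebook
    entries so every sampled token id is guaranteed to be < 2**14,
    instead of clamping the sampled indices afterwards."""
    # Drop logits for token ids the VQGAN codebook cannot represent.
    logits = logits[..., :vqgan_vocab_count]
    probs = torch.softmax(logits, dim=-1)
    # Sample one token id per sequence position.
    return torch.multinomial(probs.reshape(-1, vqgan_vocab_count), num_samples=1)
```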

I checked whether the embedding weights in the BART decoder are the same as the embedding weights in the VQGAN detokenizer, and it turns out they are different: the BART decoder in DALL·E Mega embeds to 2048 dimensions, while the VQGAN embeds to 256 dimensions.

[Screenshot attached: 2022-07-31, 12:25 PM]
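One way to verify this is to load the model and compare the two embedding matrices directly. The sketch below is only illustrative: the constructor arguments and the attribute paths (decoder.embed_tokens, detokenizer.embedding) are assumptions about the repo layout rather than verified names.

```python
import torch
from min_dalle import MinDalle  # package from this repo

# Hypothetical constructor arguments and attribute paths; real names may differ.
model = MinDalle(is_mega=True, is_reusable=True)

decoder_embed = model.decoder.embed_tokens.weight   # assumed roughly (vocab_count, 2048) for Mega
vqgan_embed = model.detokenizer.embedding.weight    # assumed roughly (2**14, 256)

print("decoder embedding:", tuple(decoder_embed.shape))
print("vqgan embedding:  ", tuple(vqgan_embed.shape))

# With different embedding widths (2048 vs 256) the tensors cannot be shared;
# comparing storage pointers makes the separation explicit.
print("same storage:", decoder_embed.data_ptr() == vqgan_embed.data_ptr())
```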