kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch

YouTube video walk-through of this codebase

gordicaleksa opened this issue

Hi @kuprel!

First of all, awesome work; you made my job that much easier. :)

I created a YouTube video where I do a deep dive/walk-through of this repo.

Maybe someone finds it useful:
https://youtu.be/x_8uHX5KngE

Hopefully it's okay to share it here in the form of an issue; do let me know!

Wow, this is great! I just added your video to the README. You're right that the clamping is unnecessary; it originally served to avoid a cryptic CUDA runtime error. Later I implemented a more precise solution that limits the BART decoder to 2**14 tokens to match the VQGAN. I'm not sure why there's a mismatch in vocabulary counts. Also, I didn't realize those are shared weights. There's probably a simpler solution here. Great video!
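For context, the clamping forced sampled image token indices into the VQGAN's range after sampling, while the more precise fix is to restrict the decoder's logits to the first 2**14 entries before sampling. Here is a minimal sketch of that idea; the function name and the default vocabulary count are illustrative assumptions, not the repo's actual code:

```python
import torch

def sample_image_token(logits: torch.Tensor, vqgan_vocab_count: int = 2 ** 14) -> torch.Tensor:
    """Hypothetical sketch: keep only the logits for valid VQGAN codebook
    entries so every sampled token id is guaranteed to be < 2**14,
    instead of clamping the sampled indices afterwards."""
    # Drop logits for token ids the VQGAN codebook cannot represent.
    logits = logits[..., :vqgan_vocab_count]
    probs = torch.softmax(logits, dim=-1)
    # Sample one token id per sequence position.
    return torch.multinomial(probs.reshape(-1, vqgan_vocab_count), num_samples=1)
```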

I checked whether the embedding weights in the BART decoder are the same as the embedding weights in the VQGAN detokenizer, and it turns out they are different: the BART decoder in DALL·E Mega embeds to 2048 dimensions, while the VQGAN embeds to 256 dimensions.

[Screenshot attached: 2022-07-31, 12:25 PM]
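One way to verify this is to load the model and compare the two embedding matrices directly. The sketch below is only illustrative: the constructor arguments and the attribute paths (decoder.embed_tokens, detokenizer.embedding) are assumptions about the repo layout rather than verified names.

```python
import torch
from min_dalle import MinDalle  # package from this repo

# Hypothetical constructor arguments and attribute paths; real names may differ.
model = MinDalle(is_mega=True, is_reusable=True)

decoder_embed = model.decoder.embed_tokens.weight   # assumed roughly (vocab_count, 2048) for Mega
vqgan_embed = model.detokenizer.embedding.weight    # assumed roughly (2**14, 256)

print("decoder embedding:", tuple(decoder_embed.shape))
print("vqgan embedding:  ", tuple(vqgan_embed.shape))

# With different embedding widths (2048 vs 256) the tensors cannot be shared;
# comparing storage pointers makes the separation explicit.
print("same storage:", decoder_embed.data_ptr() == vqgan_embed.data_ptr())
```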