Resize embeddings so they are divisible by 64
acforvs opened this issue · comments
Vlad commented
Hi, thanks for open sourcing the project!
Currently, the embedding size for StarCoder is 49152, but after one token is added it grows to 49153, which is odd and therefore cannot be sharded evenly across any conventional number of GPUs (such as 4 or 8).
I wonder whether it would be correct to add 7, 15, or 63 filler tokens (e.g. <filler_token_i>, as done here: https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/train_wizardcoder.py#L194) so that the vocabulary size becomes divisible again and the model can be sharded.
Do you have any suggestions about whether this seems reasonable? Thanks!
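The arithmetic behind the 7/15/63 choice can be sketched in plain Python (a minimal illustration; the helper names are hypothetical and not from the linked script):

```python
def filler_tokens_needed(vocab_size: int, multiple: int = 64) -> int:
    """Number of filler tokens needed to round vocab_size up to a multiple."""
    return (-vocab_size) % multiple

def make_filler_tokens(vocab_size: int, multiple: int = 64) -> list[str]:
    """Generate placeholder token strings, mirroring the <filler_token_i> idea."""
    n = filler_tokens_needed(vocab_size, multiple)
    return [f"<filler_token_{i}>" for i in range(n)]

# StarCoder's vocab after adding one special token is 49153,
# so 63 fillers bring it to 49216 = 64 * 769.
print(filler_tokens_needed(49153))  # 63
```

After extending the tokenizer with these tokens, the model's embedding matrix would be resized accordingly (e.g. via `model.resize_token_embeddings(len(tokenizer))` in Hugging Face Transformers); recent Transformers versions also accept a `pad_to_multiple_of` argument there, which achieves the same effect without explicit filler tokens.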
ChiYeung Law commented
I think this is reasonable.