nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath


Resize embeddings so they are divisible by 64

acforvs opened this issue · comments


Hi, thanks for open sourcing the project!

Currently, the embedding size for StarCoder is 49152, but after one token is added it grows to 49153, which makes it impossible to shard the model evenly across any conventional number of GPUs (like 4 or 8).

I wonder whether it would be a correct option to add 7/15/63 filler tokens like <filler_token_i> here https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/src/train_wizardcoder.py#L194, so that the embedding size becomes divisible by 8/16/64 and the model can be sharded.

Do you have any suggestions about whether this seems reasonable? Thanks!
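A minimal sketch of the padding arithmetic behind this proposal (pure Python; the tokenizer/model calls shown in comments assume the usual Hugging Face `transformers` setup used in `train_wizardcoder.py`, and the exact token names are hypothetical):

```python
def filler_tokens_needed(vocab_size: int, multiple: int = 64) -> int:
    """Number of dummy tokens needed to pad vocab_size up to a multiple."""
    return (-vocab_size) % multiple

# StarCoder's vocabulary is 49152; adding one special token gives 49153.
num_fillers = filler_tokens_needed(49152 + 1)
filler_tokens = [f"<filler_token_{i}>" for i in range(num_fillers)]

# With a Hugging Face tokenizer/model (assumed setup), the fillers would be
# registered roughly like this:
#   tokenizer.add_special_tokens({"additional_special_tokens": filler_tokens})
#   model.resize_token_embeddings(len(tokenizer))
print(num_fillers, 49153 + num_fillers)  # 63 fillers pad the vocab to 49216
```

Here 49216 = 64 × 769, so the padded embedding matrix splits evenly across 2, 4, 8, 16, 32, or 64 GPUs. Note that recent versions of `transformers` also accept a `pad_to_multiple_of` argument in `resize_token_embeddings`, which pads the embedding matrix without registering named filler tokens (assumption: depends on the installed version).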

I think this is reasonable.