We need JSON configuration files to properly configure models when downloading them.
ArEnSc opened this issue
If we take a look at this Hugging Face repo, there are several parameters that need to be copied from the model description into the JSON.
https://huggingface.co/BlinkDL/rwkv-4-pile-1b5
This checkpoint has a context length of 4096:
https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220929-ctx4096.pth
while this one has a context length of 1024:
https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/blob/main/RWKV-4-Pile-1B5-20220903-8040.pth
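As the two URLs above show, BlinkDL's newer checkpoints embed the context length in the filename as a `ctxNNNN` suffix, while older ones omit it. A minimal sketch of pulling that number out of a checkpoint URL (the fallback default of 1024 is an assumption for files without the suffix):

```python
import re

def infer_ctx_len(checkpoint_url: str, default: int = 1024) -> int:
    """Infer context length from a checkpoint filename.

    Filenames like RWKV-4-Pile-1B5-20220929-ctx4096.pth embed the
    context length as a 'ctxNNNN' suffix; older checkpoints omit it,
    so we fall back to a default (assumed to be 1024 here).
    """
    match = re.search(r"ctx(\d+)\.pth$", checkpoint_url)
    return int(match.group(1)) if match else default

print(infer_ctx_len(
    "https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/"
    "RWKV-4-Pile-1B5-20220929-ctx4096.pth"))  # 4096
```

This would only cover context length, though; the remaining parameters still have to come from the repo description, which is why a full JSON config is needed.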
This would allow us to configure the model without any programming, given a Hugging Face URL.
Here's a sample using the link above.
{
"_name_or_path": "RWKV-4-Pile-1B5",
"url": "https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220929-ctx4096.pth",
"eos_token_id": 0,
"pad_token_id": 1,
"d_model": 2048, # n_embd
"is_encoder_decoder": false,
"num_decoder_layers": 24, # Number of layers !
"vocab_size": 50276, # TODO Verify: 50253? 50276? 50277?
"n_positions": 4096 # Context Length !
}
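To make the shape of the file concrete, here is a minimal sketch of loading and validating such a config (the required key set mirrors the sample above; the loader name and validation rules are assumptions, not an existing API, and the `#` annotations must be dropped since JSON does not allow comments):

```python
import json

# Required fields, taken from the sample config in this issue.
REQUIRED_KEYS = {"_name_or_path", "url", "d_model",
                 "num_decoder_layers", "vocab_size", "n_positions"}

def load_model_config(text: str) -> dict:
    """Parse a model config JSON and check that all required keys exist."""
    cfg = json.loads(text)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return cfg

sample = """{
  "_name_or_path": "RWKV-4-Pile-1B5",
  "url": "https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220929-ctx4096.pth",
  "eos_token_id": 0,
  "pad_token_id": 1,
  "d_model": 2048,
  "is_encoder_decoder": false,
  "num_decoder_layers": 24,
  "vocab_size": 50276,
  "n_positions": 4096
}"""

cfg = load_model_config(sample)
print(cfg["n_positions"])  # 4096
```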
We still need to verify the vocab size of the model; I think it is 50277.
We need this JSON configuration for every *.pth file, at every parameter size available on Hugging Face, ignoring the RWKV-3 models.