ArEnSc / Production-RWKV

This project aims to make RWKV accessible to everyone through a Hugging Face-like interface, while staying close to the R&D RWKV branch of the code.


We need JSON configuration files to properly configure the models when downloading them.

If we take a look at this Hugging Face repo, there are a bunch of parameters that need to be copied from the description into the JSON.

https://huggingface.co/BlinkDL/rwkv-4-pile-1b5

This checkpoint has a context length of 4096:
https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220929-ctx4096.pth

While this one has a context length of 1024:
https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/blob/main/RWKV-4-Pile-1B5-20220903-8040.pth
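Note that the context length is only encoded in the checkpoint filename (the `ctx4096` suffix). A small helper could recover it automatically; a minimal sketch, assuming checkpoints without a `ctx` suffix default to 1024 as above:

```python
import re

def context_length_from_filename(filename: str, default: int = 1024) -> int:
    """Parse the context length from an RWKV checkpoint filename.

    Checkpoints trained with a longer context carry a suffix like
    'ctx4096'; older ones (e.g. RWKV-4-Pile-1B5-20220903-8040.pth)
    omit it, so we fall back to a default.
    """
    match = re.search(r"ctx(\d+)", filename)
    return int(match.group(1)) if match else default

assert context_length_from_filename("RWKV-4-Pile-1B5-20220929-ctx4096.pth") == 4096
assert context_length_from_filename("RWKV-4-Pile-1B5-20220903-8040.pth") == 1024
```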

This allows us to configure the model without any programming, given a Hugging Face URL.

Here's a sample using the first link above.

```json
{
  "_name_or_path": "RWKV-4-Pile-1B5",
  "url": "https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220929-ctx4096.pth",
  "eos_token_id": 0,
  "pad_token_id": 1,
  "d_model": 2048,
  "is_encoder_decoder": false,
  "num_decoder_layers": 24,
  "vocab_size": 50276,
  "n_positions": 4096
}
```

Here `d_model` is the model's `n_embd`, `num_decoder_layers` is the number of layers, and `n_positions` is the context length. The `vocab_size` still needs verification (50253, 50276, or 50277?).
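For reference, a minimal sketch of how such a file could drive downloading; the `RWKVConfig` dataclass and its field set are a hypothetical schema mirroring the JSON keys above, not an existing API:

```python
import json
import urllib.request
from dataclasses import dataclass

@dataclass
class RWKVConfig:
    # Hypothetical schema mirroring the JSON keys proposed above.
    _name_or_path: str
    url: str
    eos_token_id: int
    pad_token_id: int
    d_model: int              # n_embd
    is_encoder_decoder: bool
    num_decoder_layers: int   # number of layers
    vocab_size: int
    n_positions: int          # context length

def load_config(path: str) -> RWKVConfig:
    with open(path) as f:
        return RWKVConfig(**json.load(f))

def download_checkpoint(config: RWKVConfig, dest: str) -> None:
    # The weights come straight from the URL in the config,
    # so no per-model code is required.
    urllib.request.urlretrieve(config.url, dest)
```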

We need to verify the vocab size of the model; I think it is 50277.
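One way to settle this without guessing is to read the shapes straight from the checkpoint; a sketch, assuming the RWKV-4 .pth files store the token embedding under `emb.weight` (shape `[vocab_size, n_embd]`) and prefix layer weights with `blocks.<i>.`:

```python
import torch

def inspect_checkpoint(path: str) -> dict:
    """Derive vocab_size, d_model, and layer count from a .pth state dict."""
    state = torch.load(path, map_location="cpu")
    vocab_size, d_model = state["emb.weight"].shape
    num_layers = 1 + max(
        int(key.split(".")[1]) for key in state if key.startswith("blocks.")
    )
    return {
        "vocab_size": vocab_size,
        "d_model": d_model,
        "num_decoder_layers": num_layers,
    }
```

Running this over each downloaded checkpoint would confirm whether the correct value is 50253, 50276, or 50277.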

We need this JSON configuration for every *.pth checkpoint, for each parameter size available on Hugging Face, ignoring the RWKV-3 models.
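A rough sketch of how those configs could be generated in bulk, using `huggingface_hub` to enumerate the .pth files in a repo and reusing `context_length_from_filename` from the sketch above; the repo list and the per-model defaults are placeholders to fill in per parameter size, not verified values:

```python
import json
from huggingface_hub import list_repo_files

# Placeholder list; extend with the repos for the other parameter sizes.
REPOS = ["BlinkDL/rwkv-4-pile-1b5"]

def generate_configs(repo_id: str) -> None:
    for filename in list_repo_files(repo_id):
        # Skip anything that is not an RWKV-4 checkpoint.
        if not filename.endswith(".pth") or "RWKV-3" in filename:
            continue
        config = {
            "_name_or_path": filename.removesuffix(".pth"),
            "url": f"https://huggingface.co/{repo_id}/resolve/main/{filename}",
            "eos_token_id": 0,
            "pad_token_id": 1,
            "d_model": 2048,             # TODO: per-parameter-size value
            "is_encoder_decoder": False,
            "num_decoder_layers": 24,    # TODO: per-parameter-size value
            "vocab_size": 50276,         # TODO: verify (see above)
            "n_positions": context_length_from_filename(filename),
        }
        with open(config["_name_or_path"] + ".json", "w") as f:
            json.dump(config, f, indent=2)

for repo in REPOS:
    generate_configs(repo)
```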