Srijith-rkr / Whispering-LLaMA

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction


How do we get tokenizer_model

yangyyt opened this issue · comments

How do we get this tokenizer_model and prepare data?

I uploaded the tokenizer_model here: https://huggingface.co/Srijith-rkr/Whispering-LLaMA/tree/main

I have also added the Alpaca model weights in the repo. Once you download them, you can merge them together into the final LLM checkpoint.

Something like:
a = torch.load("alpaca_a.pth")
b = torch.load("alpaca_b.pth")
c = torch.load("alpaca_c.pth")
lit_llama = a | b | c  # merge the shard dicts into the final checkpoint
torch.save(lit_llama, "[Mention path to Dir]")
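For reference, the `|` operator on dicts (Python 3.9+) returns a new dict containing the keys of both operands, with the right-hand side winning on any duplicate key. A minimal sketch of the merge step with placeholder state dicts (the layer names and values below are illustrative, not the actual Alpaca checkpoint keys):

```python
# Placeholder shards standing in for torch.load("alpaca_a.pth") etc.
# Real shards map parameter names to tensors; lists are used here
# only to keep the sketch dependency-free.
shard_a = {"transformer.h.0.attn.weight": [0.1, 0.2]}
shard_b = {"transformer.h.1.attn.weight": [0.3, 0.4]}
shard_c = {"lm_head.weight": [0.5, 0.6]}

# Dict union keeps all keys; on a collision, the rightmost dict wins.
merged = shard_a | shard_b | shard_c
print(sorted(merged))  # all three parameter names in one state dict
```

If the shards partition the model's parameters (no overlapping keys), the union is simply their concatenation, which is what the checkpoint merge above relies on.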

You can also check out the notebooks at https://github.com/Srijith-rkr/Whispering-LLaMA/tree/main/data_preparation to figure out how to prepare your custom dataset.