OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

Phi-3 support

Theodotus1243 opened this issue · comments

Powerful model trained on synthetic data, with a high MMLU score.

The 4K-context variant should be the easier one to support, as it doesn't use LongRoPE.

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
https://arxiv.org/pdf/2404.14219.pdf

I second this. The current phi loader is broken, apparently because of changes Microsoft made to the model after it was initially released. At any rate, adapting the phi loader to the new Phi-3 should be easier than starting from scratch.

For anyone else researching this, phi3 support has been added to the convert_hf_to_gguf.py script in llama.cpp. Perhaps something can be gleaned from there to simplify the implementation of the ct2 converter.
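One thing that can be gleaned from the llama.cpp converter is that a Phi-3 checkpoint stores its attention and MLP projections fused (`qkv_proj`, `gate_up_proj`), whereas Llama-style converters expect them split. Below is a hedged sketch of that splitting step; the function names and toy shapes are illustrative, and the exact slicing order should be verified against the actual checkpoint before reuse:

```python
import numpy as np

def split_fused_qkv(qkv, num_heads, num_kv_heads, head_dim):
    """Split a fused [q; k; v] projection into separate Q, K, V weights.

    Assumes the fused matrix has shape (q_dim + 2 * kv_dim, hidden_size)
    with query rows first, then key rows, then value rows.
    """
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim
    q = qkv[:q_dim]
    k = qkv[q_dim:q_dim + kv_dim]
    v = qkv[q_dim + kv_dim:]
    return q, k, v

def split_gate_up(gate_up):
    """Split a fused MLP gate/up projection into its two halves."""
    half = gate_up.shape[0] // 2
    return gate_up[:half], gate_up[half:]

# Toy shapes: 4 query heads, 2 KV heads, head_dim 8, hidden_size 32.
qkv = np.zeros((4 * 8 + 2 * 2 * 8, 32))
q, k, v = split_fused_qkv(qkv, num_heads=4, num_kv_heads=2, head_dim=8)
print(q.shape, k.shape, v.shape)  # (32, 32) (16, 32) (16, 32)
```

The renaming of the split tensors to the names an existing Llama converter expects is the other half of the work, but that is mostly mechanical.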

No worries, it will be done. It's quite easy for the mini-4k since it uses the Llama 2 architecture throughout.
FYI: https://forum.opennmt.net/t/phi-3-3-8b-llama2-7b-ensemble-just-for-fun/5729
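A quick way to sanity-check the "it's basically Llama 2" claim is to read the model's `config.json` and derive the numbers a converter cares about. The field names below are standard Llama-style config keys; the values match what the Phi-3-mini-4k model card reports, but double-check them against the actual file:

```python
import json

# Llama-2-style config.json fields (values as reported for
# Phi-3-mini-4k -- verify against the real file before relying on them).
config = {
    "hidden_size": 3072,
    "intermediate_size": 8192,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "hidden_act": "silu",
    "max_position_embeddings": 4096,
}

def summarize(cfg):
    """Derive the quantities a converter needs from a config dict."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    return {
        "layers": cfg["num_hidden_layers"],
        "heads": cfg["num_attention_heads"],
        "head_dim": head_dim,
        "activation": cfg["hidden_act"],
        "context_length": cfg["max_position_embeddings"],
    }

print(json.dumps(summarize(config), indent=2))
```

If these fields all map cleanly onto what the existing Llama 2 converter expects, the remaining work is mostly tensor naming and layout.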

Is it done yet? I've been waiting patiently for approximately two hours now? ;-)

Hello, I am working on it. Some unexpected problems have come up.

I'm not skilled enough to help directly by implementing the code, but if you want me to do any grunt work or research, let me know, dude. Anything to help speed up the process. Thanks!

I'd like to start learning so I can eventually help. Question: how do I get the actual model architecture to start with? My understanding is that learning a model's structure, which activation functions it uses, and so on, is key to writing additional converters down the road. For example, here's a link:

https://bbycroft.net/llm

Here are some other links I've been gathering, with the goal of eventually contributing a converter, based on first trying to understand the structure of LLMs:

https://github.com/mert-kurttutan/torchview

https://github.com/lutzroeder/netron

Hugging Face model pages sometimes (but not always) include this kind of architecture information.

Basically, is there any good starting point you'd recommend, dude? Thanks!

Remember, you're dealing with an idiot who doesn't do this for a profession and has never taken an LLM 101 class in college, let alone earned a doctoral degree. ;-) I don't even know what "mlp.down" or "layernorm.weight" means, for example, but I'm willing to learn.
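One concrete starting point for seeing names like `mlp.down_proj` or `layernorm.weight`: a `.safetensors` checkpoint begins with an 8-byte little-endian header length followed by a JSON header describing every tensor, so you can list the architecture without loading any weights. A sketch (the tensor names and shapes in the demo are illustrative, not copied from a real checkpoint):

```python
import io
import json
import struct

def list_tensor_shapes(f):
    """List tensor names and shapes from a .safetensors stream.

    Reads only the JSON header at the start of the file; the weight
    data itself is never touched.
    """
    header_len = struct.unpack("<Q", f.read(8))[0]
    header = json.loads(f.read(header_len))
    return {name: info["shape"]
            for name, info in header.items()
            if name != "__metadata__"}

# Build a tiny in-memory file with the same layout to demonstrate
# (names/shapes are made up for illustration).
header = {
    "model.layers.0.self_attn.qkv_proj.weight":
        {"dtype": "F32", "shape": [9216, 3072], "data_offsets": [0, 0]},
    "model.layers.0.mlp.down_proj.weight":
        {"dtype": "F32", "shape": [3072, 8192], "data_offsets": [0, 0]},
}
blob = json.dumps(header).encode()
shapes = list_tensor_shapes(io.BytesIO(struct.pack("<Q", len(blob)) + blob))
for name, shape in shapes.items():
    print(name, shape)
```

Running this against a downloaded `model.safetensors` (open the path with `open(path, "rb")`) prints the full layer-by-layer structure, which is usually enough to start comparing two architectures.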

PR #1680 to add the converter for Phi3