antimatter15 / alpaca.cpp

Locally run an Instruction-Tuned Chat-Style LLM

Painfully slow

jacobhweeks opened this issue · comments

I managed to get alpaca running in a Hyper-V VM on my PowerEdge R710. The VM has 8 cores and 16 GB of RAM, running Ubuntu 22.04. I had to build chat from source with make; otherwise I got an Illegal Instruction error. The problem is that it takes about a minute per token. How can I improve this?

main: seed = 1680482599
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size = 2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
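
A side note on the log above: system_info shows n_threads = 4 / 8, i.e. only half the VM's cores in use. In the llama.cpp-era code this fork derives from, the default thread count is capped at 4 and can be raised with the -t flag; a minimal sketch of that default, assuming the llama.cpp-era behavior rather than anything verified against this exact fork:

#include <algorithm>
#include <thread>

// Sketch of the assumed default: cap at 4 threads no matter how many
// hardware threads the machine reports, which matches "n_threads = 4 / 8".
int default_thread_count() {
    return (int) std::min(4u, std::thread::hardware_concurrency());
}

So on this 8-core VM, running chat with -t 8 should roughly double the compute applied per token, although memory bandwidth is usually the real bottleneck for a 4 GB quantized model.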

I would like to rebuild from source, enabling each of the options such as AVX and AVX2 one at a time, so that I can tune it for my machine, but I don't know where to do this in chat.cpp.
I also don't know whether this is a good solution; it's just an idea. Were these features disabled when I compiled from source, possibly because my machine doesn't support them?
Can anyone please help?
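
To answer the "does my machine support those features?" part of the question above directly: you can check what the CPU (as exposed to the VM) supports without touching the build at all. A small standalone check using the GCC/Clang builtin __builtin_cpu_supports; this is an illustrative sketch, not part of alpaca.cpp:

#include <cstdio>

int main() {
    __builtin_cpu_init(); // initialize the runtime CPU-feature detection
    // Each query returns nonzero if the CPU running this binary has the feature.
    std::printf("sse3: %d\n", __builtin_cpu_supports("sse3"));
    std::printf("avx:  %d\n", __builtin_cpu_supports("avx"));
    std::printf("avx2: %d\n", __builtin_cpu_supports("avx2"));
    std::printf("fma:  %d\n", __builtin_cpu_supports("fma"));
}

An R710 runs Nehalem/Westmere-era Xeons, which predate AVX entirely, so everything past sse3 will likely print 0. That is consistent with the system_info line above, and likely why a prebuilt binary compiled with AVX died with Illegal Instruction.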

From what I could see, the settings are determined automatically. I believe they're set in a C source file named ggml.c. You could make a backup of that file, then edit it manually and see what suits your machine. Just don't go overboard, or your machine might freeze and you'll have to reboot it.
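
For later readers: the SIMD choices in ggml.c are made by the C preprocessor at compile time, not computed at runtime, so the system_info flags reflect the compiler options the binary was built with (e.g. -mavx2, or whatever -march=native implies). The pattern looks roughly like this; a simplified sketch, not the exact source:

// Simplified illustration of ggml.c's compile-time SIMD dispatch. The
// compiler defines __AVX2__ / __AVX__ / __SSE3__ when given the matching
// -mavx2 / -mavx / -msse3 flags, and the preprocessor picks one path.
const char * simd_path(void) {
#if defined(__AVX2__)
    return "avx2";   // widest vector kernels
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE3__)
    return "sse3";   // the only path enabled in the log above
#else
    return "scalar"; // portable fallback
#endif
}

In other words, the place to change these settings is the CFLAGS passed to the compiler, not ggml.c itself, and enabling an instruction set the CPU lacks will just bring back the Illegal Instruction crash.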

@Tempaccnt thank you. I figured out that they're set automatically based on the CPU when I recognized those as CPU feature flags last night. That's a bummer. Will I gain any benefit from installing a GPU, if I can pass it through to the VM, or does the software only use the CPU?

As far as I have read, this approach with alpaca is CPU-only. I'm trying to figure things out myself, since I'm running an old i7-6700K and tokens are crawling while the CPU is pegged right now.

I am seeing similarly slow speeds on an Intel Mac. If anyone has any suggestions, I would be most appreciative. Thanks.