technovangelist / obm

A tool to learn how your GPU compares to others when using Ollama

win-obm runs only llama2:7b and does not detect RAM and VRAM

mann1x opened this issue

Is this a known issue?

This is the output:

Microsoft Windows 10 Pro 10.0.19045 with NaNGB and AMD Ryzen 9 5950X 16-Core Processor with 32 cores
GPU Info:
NVIDIA NVIDIA GeForce RTX 3090 with NaNGB vram

Using Ollama version: 0.1.31
Ensuring models are downloaded.
Loading orca-mini to reset
Loading llama2:7b
First run of llama2:7b took 1.83 seconds to load then 1.85 seconds to evaluate with 112.65 tokens per second
Second run of llama2:7b took NaN seconds to load then 2.30 seconds to evaluate with 112.63 tokens per second
Third run of llama2:7b took 0.00 seconds to load then 2.87 seconds to evaluate with 112.20 tokens per second
Fourth run of llama2:7b took 0.00 seconds to load then 2.90 seconds to evaluate with 112.70 tokens per second
Average Tokens per Second for llama2:7b is 112.55

Do you approve to send the output from this command to obm.tvl.st to share with everyone? No personal info is included [y/N] y
Your OBMScore is 844 and is made of 3 components:
llama2:7b OBMScore: 844
llama2:13b OBMScore: 0
llama2:70b OBMScore: 0
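
For reference, a rough sketch of how the total RAM and NVIDIA VRAM that show up as NaNGB above could be read on Windows. This is purely illustrative, not obm's actual code; the nvidia-smi flags are standard, everything else is an assumption:

```typescript
// Hypothetical sketch, not obm's code: total RAM via Node's os module
// (works on Windows) and NVIDIA VRAM via nvidia-smi.
import * as os from "node:os";
import { execSync } from "node:child_process";

const ramGB = os.totalmem() / 1024 ** 3; // os.totalmem() returns bytes

let vramGB = NaN;
try {
  // With these flags nvidia-smi prints only the total memory value, in MiB
  const out = execSync(
    "nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits",
  ).toString();
  vramGB = parseInt(out.trim(), 10) / 1024; // MiB -> GiB
} catch {
  // nvidia-smi not on PATH or no NVIDIA GPU present
}

console.log(`${ramGB.toFixed(0)}GB RAM, ${vramGB.toFixed(0)}GB VRAM`);
```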

doh, hadn't tried since Windows... in fact I was just thinking about this last night, wondering if it did....

sadly it doesn't :)
I wonder if I can do some tests, but I don't know TypeScript at all...

From what I see you don't set the temperature to 0, is that right?
That would be better for benchmarking; otherwise the scores can fluctuate a lot between runs.
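
Something like this is what I mean — a minimal sketch assuming the standard Ollama /api/generate endpoint, not obm's actual code:

```typescript
// Sketch: pin temperature to 0 so repeated benchmark runs are deterministic.
async function generate(model: string, prompt: string) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      prompt,
      stream: false,
      options: { temperature: 0 }, // greedy sampling, no run-to-run variance
    }),
  });
  return res.json();
}
```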

I'm also curious about load_duration; I see you use it.
Is it also accessible via the HTTP API? It seems to be missing from the metrics.
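
From the Ollama API docs, the final /api/generate response does report the timing fields (in nanoseconds), load_duration among them. A quick sketch of reading them over HTTP — the field names are from the docs, the rest is illustrative:

```typescript
// Sketch: the non-streaming /api/generate response reports timings in ns,
// including load_duration, per the Ollama API docs.
interface GenerateTimings {
  load_duration: number; // ns spent loading the model
  eval_count: number;    // tokens generated
  eval_duration: number; // ns spent generating
}

const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama2:7b", prompt: "hi", stream: false }),
});
const t = (await res.json()) as GenerateTimings;

console.log(`load: ${(t.load_duration / 1e9).toFixed(2)}s`);
console.log(`tok/s: ${(t.eval_count / (t.eval_duration / 1e9)).toFixed(2)}`);
```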