ggerganov / ggml

Tensor library for machine learning

Garbage output on Metal on x86-64 Mac

kanav99 opened this issue

Hi, I get garbage output when I run the gpt-2 example with Metal.

Here are the steps I took:

cmake -DGGML_METAL=ON -DBUILD_SHARED_LIBS=Off ..
make -j gpt-2-batched
./bin/gpt-2-batched -m models/gpt-2-117M/ggml-model.bin -p "This is an example" -ngl 1 -s 1703042754

Output:

main: seed = 1703042754
gpt2_model_load: loading model from 'models/gpt-2-117M/ggml-model.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 768
gpt2_model_load: n_head  = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: ftype   = 1
gpt2_model_load: qntvr   = 0
gpt2_model_load: ggml tensor size    = 384 bytes
gpt2_model_load: backend buffer size = 312.72 MB
gpt2_model_load: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: picking default device: Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/<redacted>/ggml/build/bin/ggml-metal.metal'
ggml_metal_init: GPU name:   Intel(R) Iris(TM) Plus Graphics 655
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  =  1610.61 MB
ggml_metal_init: maxTransferRate               = built-in GPU
gpt2_model_load: memory size =   144.00 MB, n_mem = 24576
gpt2_model_load: model size  =   239.08 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: compute buffer size: 6.46 MB
main: prompt: 'This is an example'
main: number of tokens in prompt = 4, first 8 tokens: 1212 318 281 1672 

 and related)],, assignment 2013][ ] 2011 assignment]. ] ][nyder ][]] ]RANTterRANTDCter]:RANTode Postedode ].hell"]hell ]hellSBwoodeodeaskingmarthell ]]:batwowoodehell"] ][odeaskaskingCmdhellode].],],ode],woodewoTF ],woaskingRMwowo ‎][]TFodeode ‎Terwo ]ray ];ification"]woodeter ]RANT ]; ][].RMRMwowotask]:RM ‎ ]]. ];wowohellode][].odebat ]bat ], RomneywoRANT ];martray‎>>>> ‎hellode ][RM][].":-odeodeodewohellCmdtaskwoode']wotaskwoRMRM ‎wohell ‎RMRMasksRMtaskhell] ‎ ‎ Posted ‎ ];rayRModeaskingaskingraytaskwo ‎]:wowo":-":-taskaskaskitywo]}raytask ‎ ‎] ][ ]] Posted Posted ‎GTateral ‎gd ][


main:     n_decoded =      199
main:     load time =   343.22 ms
main:   sample time =    83.01 ms
main:  predict time =  3197.49 ms
main:    total time =  3732.56 ms
ggml_metal_free: deallocating
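
One thing I notice in the log above is "default.metallib not found, loading from source", i.e. the shaders get compiled at runtime. To separate a shader-compilation problem from a kernel-correctness problem, the Metal source can probably also be precompiled with Apple's toolchain, roughly like this (the paths below are just my build layout and may need adjusting, and extra compiler flags might be needed):

# assumes ggml-metal.metal was copied next to the binary, as the log above suggests
xcrun -sdk macosx metal    -c ./bin/ggml-metal.metal -o ./bin/ggml-metal.air
xcrun -sdk macosx metallib    ./bin/ggml-metal.air   -o ./bin/default.metallib

I believe ggml_metal_init looks for a default.metallib next to the binary first, so this should skip the runtime compile; I have not verified whether it changes the output here.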

CPU inference works fine.
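
For completeness, a CPU-only comparison can be run with the same command and seed but no layers offloaded (assuming -ngl 0 keeps everything on the CPU backend):

# CPU-only comparison run; -ngl 0 is assumed to disable the Metal offload
./bin/gpt-2-batched -m models/gpt-2-117M/ggml-model.bin -p "This is an example" -ngl 0 -s 1703042754

A run like that gives coherent text here, so the problem looks specific to the Metal path on this GPU (Intel Iris Plus Graphics 655).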

I was just playing around with ggml and thought it made sense to open this issue. Nothing urgent. Thank you!