Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.

Home Page: https://llamafile.ai


terminating due to std::out_of_range

Shake-Shifter opened this issue · comments

Trying to run with the Magicoder weights. It appears to fire up fine, but the second I send a message or ask a question, it crashes with the line: "libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found"
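
For context, that exception is the standard behavior of std::unordered_map::at in C++: unlike operator[], at() throws std::out_of_range when the key is missing instead of inserting a default value, and if nothing catches the exception, libc++abi aborts with exactly the banner quoted above. A minimal sketch of the mechanism (plain C++, nothing llamafile-specific; the map here is a made-up example):

#include <iostream>
#include <stdexcept>
#include <unordered_map>

int main() {
    // Tiny stand-in for a tokenizer vocab: byte -> token id.
    std::unordered_map<char, int> vocab{{'a', 1}, {'b', 2}};
    try {
        // at() throws std::out_of_range on a missing key;
        // operator[] would silently insert a zero instead.
        std::cout << vocab.at('\n') << '\n';
    } catch (const std::out_of_range& e) {
        std::cerr << e.what() << '\n';  // libc++: "unordered_map::at: key not found"
    }
}
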
This is what I'm getting:

C:\Users\shake>.\llamafile.exe -m magicoder-s-ds-6.7b.Q5_K_M.gguf -ngl 9999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++.exe not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++.exe does not exist
get_rocm_bin_path: note: /opt/rocm/bin/amdclang++.exe does not exist
get_rocm_bin_path: note: clang++.exe not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/clang++.exe does not exist
get_rocm_bin_path: note: /opt/rocm/bin/clang++.exe does not exist
link_cuda_dso: note: dynamically linking /C/Users/shake/.llamafile/ggml-rocm.dll
link_cuda_dso: warning: library not found: failed to load library
link_cuda_dso: note: dynamically linking /C/Users/shake/.llamafile/ggml-cuda.dll
ggml_cuda_link: welcome to CUDA SDK with tinyBLAS
link_cuda_dso: GPU support loaded
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2839,"msg":"build info","tid":"8545344","timestamp":1713685409}
{"function":"server_cli","level":"INFO","line":2842,"msg":"system info","n_threads":6,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | ","tid":"8545344","timestamp":1713685409,"total_threads":12}
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from magicoder-s-ds-6.7b.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = source
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 16384
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.use_parallel_residual bool = true
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 12: llama.rope.scaling.type str = linear
llama_model_loader: - kv 13: llama.rope.scaling.factor f32 = 4.000000
llama_model_loader: - kv 14: tokenizer.ggml.model str = deepseek_coder
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,32256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,31757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 32013
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 32014
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 32014
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - kv 24: general.file_type u32 = 17
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: unknown tokenizer: 'deepseek_coder'
llm_load_vocab: using default tokenizer: 'llama'
llm_load_vocab: SPM vocabulary, but newline token not found: unordered_map::at: key not found! Using special_pad_id instead.
llm_load_vocab: mismatch in special tokens definition ( 243/32256 vs 256/32256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32256
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 16384
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 100000.0
llm_load_print_meta: freq_scale_train = 0.25
llm_load_print_meta: n_yarn_orig_ctx = 16384
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 6.74 B
llm_load_print_meta: model size = 4.46 GiB (5.68 BPW)
llm_load_print_meta: general.name = source
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: UNK token = 0 '!'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070 Super, compute capability 7.5, VMM: yes
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 86.62 MiB
llm_load_tensors: CUDA0 buffer size = 4475.76 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 100000.0
llama_new_context_with_model: freq_scale = 0.25
llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 71.00 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 71.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 9.00 MiB
llama_new_context_with_model: graph nodes = 1060
llama_new_context_with_model: graph splits = 2
{"function":"initialize","level":"INFO","line":481,"msg":"initializing slots","n_slots":1,"tid":"8545344","timestamp":1713685413}
{"function":"initialize","level":"INFO","line":490,"msg":"new slot","n_ctx_slot":512,"slot_id":0,"tid":"8545344","timestamp":1713685413}
{"function":"server_cli","level":"INFO","line":3060,"msg":"model loaded","tid":"8545344","timestamp":1713685413}

llama server listening at http://127.0.0.1:8080

opening browser tab... (pass --nobrowser to disable)
failed to open http://127.0.0.1:8080/ in a browser tab using /c/windows/explorer.exe: process exited with non-zero status
{"function":"server_cli","hostname":"127.0.0.1","level":"INFO","line":3183,"msg":"HTTP server listening","port":"8080","tid":"8545344","timestamp":1713685413}
{"function":"update_slots","level":"INFO","line":1619,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"8545344","timestamp":1713685413}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/","remote_addr":"","remote_port":-1,"status":200,"tid":"17594334545616","timestamp":1713685413}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/completion.js","remote_addr":"","remote_port":-1,"status":200,"tid":"17594334541424","timestamp":1713685413}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/index.js","remote_addr":"","remote_port":-1,"status":200,"tid":"17594334545616","timestamp":1713685413}
{"function":"log_server_request","level":"INFO","line":2764,"method":"GET","msg":"request","params":{},"path":"/json-schema-to-grammar.mjs","remote_addr":"","remote_port":-1,"status":200,"tid":"17594334546912","timestamp":1713685413}
{"function":"launch_slot_with_data","level":"INFO","line":871,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"8545344","timestamp":1713685432}
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found

error: Uncaught SIGABRT (SI_TKILL) at 0 on MSI pid 1444 tid 21716
./llamafile.exe
Protocol not available
Windows Cosmopolitan 3.3.3 MODE=x86_64 MSI 0.0-0

RAX 0000000000000000 RBX 0000000000000006 RDI 00007000007da9b0
RCX 0000000000704e90 RDX 0000000000000000 RSI 00000000fffffffa
RBP 00007000007dad00 RSP 00007000007da890 RIP 000000000041af82
R8 0000000000000000 R9 0000000000000000 R10 0000000000000000
R11 0000000000000246 R12 00001000801f3b10 R13 00000000006196c0
R14 00000000006c9ed8 R15 00001000801f21c8
TLS 0000000000704e40

XMM0 00000000000000000000000000000000 XMM8 00000000000000000000000000000000
XMM1 00000000000000000000000000000000 XMM9 00000000000000000000000000000000
XMM2 00000000000000000000000000000000 XMM10 00000000000000000000000000000000
XMM3 00000000000000000000000000000000 XMM11 00000000000000000000000000000000
XMM4 00000000000000000000000000000000 XMM12 00000000000000000000000000000000
XMM5 00000000000000000000000000000000 XMM13 00000000000000000000000000000000
XMM6 00000000000000000000000000000000 XMM14 00000000000000000000000000000000
XMM7 618196e273698196e2736968548196e2 XMM15 00000000000000000000000000000000

cosmoaddr2line /C/Users/shake/llamafile.exe 41af82 64c033 4147bf 661a3f 661bc2 62c607 62c029 413a36 5283ad 56a3c1 56a751 530875 530c5a 513d63 513ec3 45aa4a 48c16d 48abc5 43fadc 401b81 410a03 658183

7000007d83c0 41af82 __sig_raise+50
7000007dad00 64c033 raise+83
7000007dad20 4147bf abort+45
7000007dad40 661a3f NULL+0
7000007dae30 661bc2 _ZL28demangling_terminate_handlerv+338
7000007daee0 62c607 _ZSt11__terminatePFvvE+71
7000007daf60 62c029 NULL+0
7000007daf90 413a36 _ZNSt3__120__throw_out_of_rangeEPKc+122
7000007db020 5283ad _ZL19llama_byte_to_tokenRK11llama_vocabh+433
7000007db0e0 56a3c1 _ZN17llm_tokenizer_spm9resegmentER10llm_symbolRNSt3__16vectorIiNS2_9allocatorIiEEEE+579
7000007db1d0 56a751 _ZN17llm_tokenizer_spm8tokenizeERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEERNS0_6vectorIiNS4_IiEEEE+759
7000007db270 530875 _ZL23llama_tokenize_internalRK11llama_vocabNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbb+1479
7000007db400 530c5a llama_tokenize+190
7000007db4d0 513d63 _Z14llama_tokenizePK11llama_modelRKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbb+313
7000007db5a0 513ec3 _Z14llama_tokenizePK13llama_contextRKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEEbb+55
7000007db5e0 45aa4a _ZNK20llama_server_context8tokenizeERKN8nlohmann16json_abi_v3_11_310basic_jsonINSt3__13mapENS3_6vectorENS3_12basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEblmdS9_NS1_14adl_serializerENS5_IhNS9_IhEEEEvEEb+554
7000007db710 48c16d _ZN20llama_server_context12update_slotsEv.isra.0+5089
7000007dbaf0 48abc5 _ZN18llama_server_queue10start_loopEv+1405
7000007dbc40 43fadc _Z10server_cliiPPc+10250
7000007dcf60 401b81 main+345
7000007deec0 410a03 cosmo+73
7000007deed0 658183 __stack_call+18

./llamafile.exe -m magicoder-s-ds-6.7b.Q5_K_M.gguf -ngl 9999
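
For what it's worth, the backtrace seems to make the failure path readable: the throw comes from llama_byte_to_token inside the SPM tokenizer's resegment step, and the load log explains why that path is active at all. The 'deepseek_coder' tokenizer in this GGUF isn't recognized, so it falls back to the default 'llama' SPM tokenizer, which already warned at load time that the newline token was missing ("unordered_map::at: key not found! Using special_pad_id instead."). The first real prompt then asks for a byte token that isn't in the vocab, the unguarded at() throws, and the whole server aborts. Below is a sketch of the guarded-lookup shape that would degrade to the UNK token instead of aborting; the type and field names are illustrative guesses, not llamafile's actual code:

#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

using llama_token = int32_t;

// Hypothetical stand-in for the vocab's string -> token id table.
struct toy_vocab {
    std::unordered_map<std::string, llama_token> token_to_id;
    llama_token unk_id = 0;
};

// SPM vocabs spell raw bytes as "<0xNN>" tokens. Looking that string up
// with at() kills the process when the byte token is absent from the
// GGUF vocab; find() lets the caller fall back to UNK instead.
llama_token byte_to_token_guarded(const toy_vocab& vocab, uint8_t ch) {
    char buf[8];
    std::snprintf(buf, sizeof(buf), "<0x%02X>", static_cast<unsigned>(ch));
    auto it = vocab.token_to_id.find(buf);
    return it != vocab.token_to_id.end() ? it->second : vocab.unk_id;
}

int main() {
    toy_vocab vocab;
    vocab.token_to_id["<0x41>"] = 123;                        // 'A' present
    std::printf("%d\n", byte_to_token_guarded(vocab, 'A'));   // 123
    std::printf("%d\n", byte_to_token_guarded(vocab, '\n'));  // 0 (UNK), no abort
}

The real fix is presumably proper support for the 'deepseek_coder' tokenizer rather than the SPM fallback, but this would explain why it crashes on the first message instead of at load.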

I would appreciate any help; I've been having a heck of a time trying to get one of these up and running on a few different platforms now. Fingers crossed on this one.