fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

Seeing multiple warnings when loading codegen weights

vmurahari3 opened this issue

I am trying to use the codegen 2B model, and I can get the Triton server running. However, I see multiple warnings (posted below) about loading the model weights. Is this normal?

I0101 19:42:05.171324 37293 libfastertransformer.cc:1320] TRITONBACKEND_ModelInstanceInitialize: fastertransformer_0 (device 0)
W0101 19:42:05.171344 37293 libfastertransformer.cc:453] Faster transformer model instance is created at GPU '0'
W0101 19:42:05.171349 37293 libfastertransformer.cc:459] Model name codegen-2B-multi
W0101 19:42:05.171360 37293 libfastertransformer.cc:578] Get input name: input_ids, type: TYPE_UINT32, shape: [-1]
W0101 19:42:05.171366 37293 libfastertransformer.cc:578] Get input name: start_id, type: TYPE_UINT32, shape: [1]
W0101 19:42:05.171370 37293 libfastertransformer.cc:578] Get input name: end_id, type: TYPE_UINT32, shape: [1]
W0101 19:42:05.171375 37293 libfastertransformer.cc:578] Get input name: input_lengths, type: TYPE_UINT32, shape: [1]
W0101 19:42:05.171380 37293 libfastertransformer.cc:578] Get input name: request_output_len, type: TYPE_UINT32, shape: [-1]
W0101 19:42:05.171384 37293 libfastertransformer.cc:578] Get input name: runtime_top_k, type: TYPE_UINT32, shape: [1]
W0101 19:42:05.171389 37293 libfastertransformer.cc:578] Get input name: runtime_top_p, type: TYPE_FP32, shape: [1]
W0101 19:42:05.171393 37293 libfastertransformer.cc:578] Get input name: beam_search_diversity_rate, type: TYPE_FP32, shape: [1]
W0101 19:42:05.171398 37293 libfastertransformer.cc:578] Get input name: temperature, type: TYPE_FP32, shape: [1]
W0101 19:42:05.171402 37293 libfastertransformer.cc:578] Get input name: len_penalty, type: TYPE_FP32, shape: [1]
W0101 19:42:05.171406 37293 libfastertransformer.cc:578] Get input name: repetition_penalty, type: TYPE_FP32, shape: [1]
W0101 19:42:05.171411 37293 libfastertransformer.cc:578] Get input name: random_seed, type: TYPE_INT32, shape: [1]
W0101 19:42:05.171415 37293 libfastertransformer.cc:578] Get input name: is_return_log_probs, type: TYPE_BOOL, shape: [1]
W0101 19:42:05.171419 37293 libfastertransformer.cc:578] Get input name: beam_width, type: TYPE_UINT32, shape: [1]
W0101 19:42:05.171425 37293 libfastertransformer.cc:578] Get input name: bad_words_list, type: TYPE_INT32, shape: [2, -1]
W0101 19:42:05.171429 37293 libfastertransformer.cc:578] Get input name: stop_words_list, type: TYPE_INT32, shape: [2, -1]
W0101 19:42:05.171437 37293 libfastertransformer.cc:620] Get output name: output_ids, type: TYPE_UINT32, shape: [-1, -1]
W0101 19:42:05.171442 37293 libfastertransformer.cc:620] Get output name: sequence_length, type: TYPE_UINT32, shape: [-1]
W0101 19:42:05.171447 37293 libfastertransformer.cc:620] Get output name: cum_log_probs, type: TYPE_FP32, shape: [-1]
W0101 19:42:05.171452 37293 libfastertransformer.cc:620] Get output name: output_log_probs, type: TYPE_FP32, shape: [-1, -1]
[FT][WARNING] Custom All Reduce only supports 8 Ranks currently. Using NCCL as Comm.
I0101 19:42:05.499549 37293 libfastertransformer.cc:307] Before Loading Model:
after allocation, free 15.20 GB total 15.75 GB
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.wte.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.final_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.final_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.lm_head.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.lm_head.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.0.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.1.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.2.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.3.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.4.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.5.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.6.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.7.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.8.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.9.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.10.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.11.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.input_layernorm.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.input_layernorm.weight.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.attention.query_key_value.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.attention.dense.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.mlp.dense_h_to_4h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.mlp.dense_h_to_4h.bias.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.mlp.dense_4h_to_h.weight.0.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.12.mlp.dense_4h_to_h.bias.bin cannot be opened, loading model fails! 

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.layers.13.input_layernorm.bias.bin cannot be o

[FT][WARNING] file /model/fastertransformer/1/1-gpu/model.wte.bin cannot be opened, loading model fails!

Which version of FT (FasterTransformer) are you running? In my experience, similar problems have occurred with older versions.

I had a similar issue. I was using FasterTransformer v1.2.0.

Are you using Docker? I just ran the latest version and do not see these warnings.

OS: Ubuntu 22.04.1 LTS
Docker: 23.0.1
GPU: Nvidia 2080ti

@vmurahari3,
I had a similar issue. In my case 2B worked okay, but 6B was the problem.
I figured out that the 6B model was not completely downloaded due to the lack of disk space.
After redownloading the 6B model with enough disk space, 6B worked fine for me.
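A partial download like this can be caught before starting the server. The following is only a minimal sketch: `has_enough_space` and `empty_weight_files` are hypothetical helper names, the required size is something you must supply yourself, and a zero-byte check catches only one kind of truncated download (a file cut off midway will still have a nonzero size).

```python
import shutil
from pathlib import Path


def has_enough_space(target_dir, required_bytes):
    """Return True if the filesystem holding target_dir has at least
    required_bytes free. required_bytes is supplied by the caller;
    this sketch does not know how large any particular model is."""
    return shutil.disk_usage(target_dir).free >= required_bytes


def empty_weight_files(weights_dir):
    """Return the sorted names of .bin files in weights_dir that are
    zero bytes long, which usually indicates an interrupted write."""
    root = Path(weights_dir)
    return sorted(p.name for p in root.glob("*.bin") if p.stat().st_size == 0)
```

For example, calling `empty_weight_files("/model/fastertransformer/1/1-gpu")` inside the container would list any empty weight files in the directory named in the warnings above.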

Yeah, this is a weird behavior of Triton: it fails to load the weights but still launches the server. What the warnings tell you is that either the path to the model weights is misconfigured or the weight files just aren't there.
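One way to tell those two cases apart is to check the weights directory directly. Below is a minimal sketch that builds the file names seen in the warnings above and reports which ones are missing. The name lists are copied from that log and may not be exhaustive, the `.0` suffix is assumed to stay fixed for a single-GPU layout, and `num_layers` is a value you have to supply for your model; the function names are hypothetical.

```python
from pathlib import Path

# File names copied from the "[FT][WARNING] ... cannot be opened" lines above.
GLOBAL_FILES = [
    "model.wte.bin",
    "model.final_layernorm.bias.bin",
    "model.final_layernorm.weight.bin",
    "model.lm_head.weight.bin",
    "model.lm_head.bias.bin",
]

# Per-layer suffixes, also copied from the log.
LAYER_SUFFIXES = [
    "input_layernorm.bias.bin",
    "input_layernorm.weight.bin",
    "attention.query_key_value.weight.0.bin",
    "attention.dense.weight.0.bin",
    "mlp.dense_h_to_4h.weight.0.bin",
    "mlp.dense_h_to_4h.bias.0.bin",
    "mlp.dense_4h_to_h.weight.0.bin",
    "mlp.dense_4h_to_h.bias.bin",
]


def expected_weight_files(num_layers):
    """Build the list of file names the loader asked for in the log."""
    names = list(GLOBAL_FILES)
    for i in range(num_layers):
        names.extend(f"model.layers.{i}.{suffix}" for suffix in LAYER_SUFFIXES)
    return names


def missing_weight_files(weights_dir, num_layers):
    """Return the expected file names that do not exist in weights_dir."""
    root = Path(weights_dir)
    return [n for n in expected_weight_files(num_layers) if not (root / n).is_file()]
```

If every expected file is reported missing, the path is probably wrong (for example, the directory is not mounted where the server looks for it). If only some are missing, an incomplete download or conversion is the more likely cause.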

@vmurahari3 I ran into the same problem. Have you solved it?