fauxpilot / fauxpilot

FauxPilot - an open-source alternative to GitHub Copilot server

fastertransformer not available

Doonut opened this issue

On my Arch Linux host, with Docker running locally, I get

WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation

from the copilot proxy when connecting from my local instance of VS Code.
I've tried both backends, and my system has a 3090. I used the 6B model with the fastertransformer backend and the 2B model with the Python backend.
I am mildly experienced with docker-compose and also tried passing `model` directly to the proxy as an environment variable, to no avail.

Hello there, thanks for opening your first issue. We welcome you to the FauxPilot community!

Could you paste the full log?

Not OP, but getting the same issue here:
FauxPilotIssue.txt

I see the same warning at line 2 of the log below. Wait until the model is ready; it may take a while.

fauxpilot-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-copilot_proxy-1  | Returned completion in 1.4257431030273438 ms
fauxpilot-copilot_proxy-1  | INFO:     2023-04-15 15:09:53,893 :: 100.105.61.13:59652 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
fauxpilot-triton-1         | I0415 15:10:10.021414 89 libfastertransformer.cc:321] After Loading Model:
fauxpilot-triton-1         | after allocation, free 17.82 GB total 31.74 GB
fauxpilot-triton-1         | I0415 15:10:10.021800 89 libfastertransformer.cc:537] Model instance is created on GPU Tesla V100-SXM2-32GB
fauxpilot-triton-1         | I0415 15:10:10.022000 89 model_repository_manager.cc:1345] successfully loaded 'fastertransformer' version 1
fauxpilot-triton-1         | I0415 15:10:10.022091 89 server.cc:556]
fauxpilot-triton-1         | +------------------+------+
fauxpilot-triton-1         | | Repository Agent | Path |
fauxpilot-triton-1         | +------------------+------+
fauxpilot-triton-1         | +------------------+------+
fauxpilot-triton-1         |
fauxpilot-triton-1         | I0415 15:10:10.022142 89 server.cc:583]
fauxpilot-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-triton-1         | | Backend           | Path                                                                        | Config                                                                                                                                                         |
fauxpilot-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-triton-1         | | fastertransformer | /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
fauxpilot-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-triton-1         |
fauxpilot-triton-1         | I0415 15:10:10.022178 89 server.cc:626]
fauxpilot-triton-1         | +-------------------+---------+--------+
fauxpilot-triton-1         | | Model             | Version | Status |
fauxpilot-triton-1         | +-------------------+---------+--------+
fauxpilot-triton-1         | | fastertransformer | 1       | READY  |
fauxpilot-triton-1         | +-------------------+---------+--------+

After it shows READY, you still need to wait some time for it to become available (the log gives no hint about this). Then it works.

fauxpilot-copilot_proxy-1  | INFO:     2023-04-15 15:14:01,120 :: 100.105.61.13:37942 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
fauxpilot-triton-1         | W0415 15:17:33.322818 89 libfastertransformer.cc:1397] model fastertransformer, instance fastertransformer_0, executing 1 requests
fauxpilot-triton-1         | W0415 15:17:33.322852 89 libfastertransformer.cc:638] TRITONBACKEND_ModelExecute: Running fastertransformer_0 with 1 requests
fauxpilot-triton-1         | W0415 15:17:33.322861 89 libfastertransformer.cc:693] get total batch_size = 1
fauxpilot-triton-1         | W0415 15:17:33.322874 89 libfastertransformer.cc:1051] get input count = 16
fauxpilot-triton-1         | W0415 15:17:33.322892 89 libfastertransformer.cc:1117] collect name: start_id size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322903 89 libfastertransformer.cc:1117] collect name: input_ids size: 8 bytes
fauxpilot-triton-1         | W0415 15:17:33.322914 89 libfastertransformer.cc:1117] collect name: bad_words_list size: 8 bytes
fauxpilot-triton-1         | W0415 15:17:33.322924 89 libfastertransformer.cc:1117] collect name: random_seed size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322935 89 libfastertransformer.cc:1117] collect name: end_id size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322945 89 libfastertransformer.cc:1117] collect name: input_lengths size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322955 89 libfastertransformer.cc:1117] collect name: request_output_len size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322965 89 libfastertransformer.cc:1117] collect name: runtime_top_k size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322974 89 libfastertransformer.cc:1117] collect name: runtime_top_p size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.322984 89 libfastertransformer.cc:1117] collect name: is_return_log_probs size: 1 bytes
fauxpilot-triton-1         | W0415 15:17:33.322992 89 libfastertransformer.cc:1117] collect name: stop_words_list size: 24 bytes
fauxpilot-triton-1         | W0415 15:17:33.323003 89 libfastertransformer.cc:1117] collect name: temperature size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.323012 89 libfastertransformer.cc:1117] collect name: len_penalty size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.323021 89 libfastertransformer.cc:1117] collect name: beam_width size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.323032 89 libfastertransformer.cc:1117] collect name: beam_search_diversity_rate size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.323042 89 libfastertransformer.cc:1117] collect name: repetition_penalty size: 4 bytes
fauxpilot-triton-1         | W0415 15:17:33.323050 89 libfastertransformer.cc:1130] the data is in CPU
fauxpilot-triton-1         | W0415 15:17:33.323058 89 libfastertransformer.cc:1137] the data is in CPU
fauxpilot-triton-1         | W0415 15:17:33.323075 89 libfastertransformer.cc:999] before ThreadForward 0
fauxpilot-triton-1         | W0415 15:17:33.323133 89 libfastertransformer.cc:1006] after ThreadForward 0
fauxpilot-triton-1         | I0415 15:17:33.323163 89 libfastertransformer.cc:834] Start to forward
fauxpilot-triton-1         | I0415 15:17:33.585172 89 libfastertransformer.cc:836] Stop to forward
fauxpilot-triton-1         | W0415 15:17:33.585278 89 libfastertransformer.cc:1161] Get output_tensors 0: output_ids
fauxpilot-triton-1         | W0415 15:17:33.585301 89 libfastertransformer.cc:1171]     output_type: UINT32
fauxpilot-triton-1         | W0415 15:17:33.585313 89 libfastertransformer.cc:1191]     output shape: [1, 1, 102]
fauxpilot-triton-1         | W0415 15:17:33.585391 89 libfastertransformer.cc:1161] Get output_tensors 1: sequence_length
fauxpilot-triton-1         | W0415 15:17:33.585400 89 libfastertransformer.cc:1171]     output_type: INT32
fauxpilot-triton-1         | W0415 15:17:33.585408 89 libfastertransformer.cc:1191]     output shape: [1, 1]
fauxpilot-triton-1         | W0415 15:17:33.585439 89 libfastertransformer.cc:1206] PERFORMED GPU copy: NO
fauxpilot-triton-1         | W0415 15:17:33.585456 89 libfastertransformer.cc:780] get response size = 1
fauxpilot-triton-1         | W0415 15:17:33.585628 89 libfastertransformer.cc:795] response is sent
fauxpilot-copilot_proxy-1  | Returned completion in 264.94908332824707 ms
fauxpilot-copilot_proxy-1  | INFO:     2023-04-15 15:17:33,586 :: 172.19.0.1:47588 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
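
Rather than guessing how long to wait, one option is to poll Triton's readiness endpoint until it answers. The snippet below is only a minimal sketch: it assumes Triton's HTTP service (port 8000 in the logs here) is reachable from wherever you run it, which depends on your docker-compose port mapping. `/v2/models/<name>/ready` is Triton's standard per-model readiness route; the helper names `is_ready` and `wait_until_ready` are made up for illustration.

```python
import time
import urllib.error
import urllib.request


def is_ready(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_until_ready(url, max_wait=300.0, interval=5.0,
                     probe=is_ready, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `max_wait` seconds were slept."""
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

For example, `wait_until_ready("http://localhost:8000/v2/models/fastertransformer/ready")` should return `True` once the model is being served, and `False` if it is still unreachable after five minutes. The host and port in that URL are assumptions about your setup.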

I mean, I'm not sure how long you're expected to leave it, but it doesn't look like it's getting far after 15 minutes.

fauxpilot-windows-main-triton-1         | [FT][WARNING] Custom All Reduce only supports 8 Ranks currently. Using NCCL as Comm.
fauxpilot-windows-main-triton-1         | after allocation, free 10.56 GB total 12.00 GB
fauxpilot-windows-main-triton-1         | [WARNING] gemm_config.in is not found; using default GEMM algo
fauxpilot-windows-main-triton-1         | I0415 15:35:30.032622 88 libfastertransformer.cc:321] After Loading Model:
fauxpilot-windows-main-triton-1         | after allocation, free 4.95 GB total 12.00 GB
fauxpilot-windows-main-triton-1         | I0415 15:35:30.032867 88 libfastertransformer.cc:537] Model instance is created on GPU NVIDIA GeForce RTX 3080 Ti
fauxpilot-windows-main-triton-1         | I0415 15:35:30.033259 88 model_repository_manager.cc:1345] successfully loaded 'fastertransformer' version 1
fauxpilot-windows-main-triton-1         | I0415 15:35:30.033333 88 server.cc:556]
fauxpilot-windows-main-triton-1         | +------------------+------+
fauxpilot-windows-main-triton-1         | | Repository Agent | Path |
fauxpilot-windows-main-triton-1         | +------------------+------+
fauxpilot-windows-main-triton-1         | +------------------+------+
fauxpilot-windows-main-triton-1         |
fauxpilot-windows-main-triton-1         | I0415 15:35:30.033379 88 server.cc:583]
fauxpilot-windows-main-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         | | Backend           | Path                                                                        | Config                                                                                                                                                         |
fauxpilot-windows-main-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         | | fastertransformer | /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
fauxpilot-windows-main-triton-1         | +-------------------+-----------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         |
fauxpilot-windows-main-triton-1         | I0415 15:35:30.033630 88 server.cc:626]
fauxpilot-windows-main-triton-1         | +-------------------+---------+--------+
fauxpilot-windows-main-triton-1         | | Model             | Version | Status |
fauxpilot-windows-main-triton-1         | +-------------------+---------+--------+
fauxpilot-windows-main-triton-1         | | fastertransformer | 1       | READY  |
fauxpilot-windows-main-triton-1         | +-------------------+---------+--------+
fauxpilot-windows-main-triton-1         |
fauxpilot-windows-main-triton-1         | I0415 15:35:30.045193 88 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3080 Ti
fauxpilot-windows-main-triton-1         | I0415 15:35:30.045348 88 tritonserver.cc:2159]
fauxpilot-windows-main-triton-1         | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         | | Option                           | Value                                                                                                                                                                                        |
fauxpilot-windows-main-triton-1         | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         | | server_id                        | triton                                                                                                                                                                                       |
fauxpilot-windows-main-triton-1         | | server_version                   | 2.23.0                                                                                                                                                                                       |
fauxpilot-windows-main-triton-1         | | server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
fauxpilot-windows-main-triton-1         | | model_repository_path[0]         | /model                                                                                                                                                                                       |
fauxpilot-windows-main-triton-1         | | model_control_mode               | MODE_NONE                                                                                                                                                                                    |
fauxpilot-windows-main-triton-1         | | strict_model_config              | 1                                                                                                                                                                                            |
fauxpilot-windows-main-triton-1         | | rate_limit                       | OFF                                                                                                                                                                                          |
fauxpilot-windows-main-triton-1         | | pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                    |
fauxpilot-windows-main-triton-1         | | cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                     |
fauxpilot-windows-main-triton-1         | | response_cache_byte_size         | 0                                                                                                                                                                                            |
fauxpilot-windows-main-triton-1         | | min_supported_compute_capability | 6.0                                                                                                                                                                                          |
fauxpilot-windows-main-triton-1         | | strict_readiness                 | 1                                                                                                                                                                                            |
fauxpilot-windows-main-triton-1         | | exit_timeout                     | 30                                                                                                                                                                                           |
fauxpilot-windows-main-triton-1         | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
fauxpilot-windows-main-triton-1         |
fauxpilot-windows-main-triton-1         | I0415 15:35:30.049895 88 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
fauxpilot-windows-main-triton-1         | I0415 15:35:30.050969 88 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
fauxpilot-windows-main-triton-1         | I0415 15:35:30.137683 88 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
fauxpilot-windows-main-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-windows-main-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-windows-main-copilot_proxy-1  | Returned completion in 4.260063171386719 ms
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 15:49:30,615 :: 172.18.0.1:50712 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-windows-main-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is  set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-windows-main-copilot_proxy-1  | Returned completion in 3.4265518188476562 ms
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 15:51:35,327 :: 172.18.0.1:49786 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK

Even using the API webpage, on the /codegen/completions endpoint, the "Try it out" execution fails too.

@TheJambo My server has 12 cores, 90 GB of RAM, and a V100; it takes about 1m30s in total.

fauxpilot-windows-main-copilot_proxy-1  | INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 16:35:52,778 :: 172.18.0.1:54426 - "GET / HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 16:35:53,149 :: 172.18.0.1:54426 - "GET /openapi.json HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-windows-main-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-windows-main-copilot_proxy-1  | Returned completion in 1.5850067138671875 ms
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 16:35:57,042 :: 172.18.0.1:54426 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-windows-main-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-windows-main-copilot_proxy-1  | Returned completion in 1.0571479797363281 ms
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 16:36:01,395 :: 172.18.0.1:54426 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 16:36:20,462 :: 172.18.0.1:54428 - "GET /copilot_internal/v2/token HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 18:10:03,355 :: 172.18.0.1:37994 - "GET / HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 18:10:03,724 :: 172.18.0.1:37994 - "GET /openapi.json HTTP/1.1" 200 OK
fauxpilot-windows-main-copilot_proxy-1  | [StatusCode.UNAVAILABLE] failed to connect to all addresses
fauxpilot-windows-main-copilot_proxy-1  | WARNING: Model 'fastertransformer' is not available. Please ensure that `model` is set to either 'fastertransformer' or 'py-model' depending on your installation
fauxpilot-windows-main-copilot_proxy-1  | Returned completion in 8.57996940612793 ms
fauxpilot-windows-main-copilot_proxy-1  | INFO:     2023-04-15 18:10:08,561 :: 172.18.0.1:37994 - "POST /v1/engines/codegen/completions HTTP/1.1" 200 OK

Maybe it can be time based in some instances, but not here! 😆
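
Since Triton reports "Started GRPCInferenceService at 0.0.0.0:8001" while the proxy keeps logging `[StatusCode.UNAVAILABLE] failed to connect to all addresses`, it may be worth checking whether the proxy container can open a TCP connection to Triton at all. Here is a rough sketch using only the standard library; `can_connect` is a hypothetical helper, not part of FauxPilot.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from inside the copilot_proxy container, something like `can_connect("triton", 8001)` would show whether the gRPC port is reachable. The hostname `triton` is an assumption based on the container names in the logs; use whatever host your proxy is actually configured to call. If this returns `False` long after the model shows READY, the problem is more likely networking or configuration between the two containers than load time.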