[Bug] User side cancellation

sunggg opened this issue 9 months ago

User-side cancellation does not take effect. We also need to log properly when a request has been cancelled.

Lite Ye · Answer 1 · Thu Feb 01 2024 05:01:51 GMT+0800 (China Standard Time)

How is the user-side cancellation triggered? When I tried it by pressing Ctrl-C on a running curl command, I could see the cancellation being processed.

script:

payload='{
  "model": "llama-2",
  "messages": [
    {
      "role": "user",
      "content": "Hello! what is the answer to life, the universe, and everything? give me a long answer"
    }
  ],
  "max_tokens": 1000,
  "stream": true,
  "temperature": 1.0,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}'

echo "======="
echo "Request"
echo "======="
echo "$payload" | jq

echo "========"
echo "Response"
echo "========"

curl -s -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer abc" \
  -d "$payload"

log:

2024-01-31 20:58:40 [info     ] StagingInferenceEngine.add     [mlc_serve.engine.staging_engine] func_name=add lineno=106 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 requests=[Request(request_id='cmpl-71e9e27ce9f842108e3e820b1b6d63c8', messages=[ChatMessage(role='user', content='Hello! what is the answer to life, the universe, and everything? give me a long answer')], num_sequences=1, best_of=1, sampling_params=SamplingParams(presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, logit_bias=None, appeared_tokens_freq={}, logit_bias_index=None, logit_bias_value=None, logprobs=False, top_logprobs=0), stopping_criteria=StoppingCriteria(max_tokens=1000, stop_sequences=[]), debug_options=DebugOptions(ignore_eos=False, prompt=None, prompt_token_ids=None), validate_tokens=None, contextvars={})]
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate iterator cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=90 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] StagingInferenceEngine.cancel  [mlc_serve.engine.staging_engine] func_name=cancel lineno=133 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate request sucessfully cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=93 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate removing request from result queue. [mlc_serve.engine.async_connector] func_name=generate lineno=98 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
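
Pressing Ctrl-C on curl drops the connection in the middle of the stream. For a repeatable trigger, the same thing can be scripted. The following is only a minimal Python sketch, assuming the endpoint, token, and payload from the script above. `build_request` and `read_then_drop` are hypothetical helper names, and whether the server notices the closed socket immediately is not guaranteed.

```python
import json
import urllib.request

# Endpoint and body copied from the shell script above.
URL = "http://127.0.0.1:8000/v1/chat/completions"
PAYLOAD = {
    "model": "llama-2",
    "messages": [
        {
            "role": "user",
            "content": "Hello! what is the answer to life, the universe, "
                       "and everything? give me a long answer",
        }
    ],
    "max_tokens": 1000,
    "stream": True,
    "temperature": 1.0,
    "top_p": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}


def build_request(url, payload, token="abc"):
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def read_then_drop(stream, max_lines):
    """Read up to max_lines lines, then close the stream (hypothetical helper).

    Closing the response while data is still pending drops the connection,
    which is the scripted stand-in for Ctrl-C.
    """
    lines = []
    try:
        for raw in stream:
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
            if len(lines) >= max_lines:
                break
    finally:
        stream.close()
    return lines
```

To run it against a live server, open the stream with `urllib.request.urlopen(build_request(URL, PAYLOAD))`, pass the response to `read_then_drop(resp, 5)`, and then check the server log for the cancellation messages shown above.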

Sunghyun Park · Answer 2 · Thu Feb 01 2024 05:06:47 GMT+0800 (China Standard Time)

Hmm, interesting. That is pretty much what I did. I was printing all the token_ids and saw that new tokens kept being printed even after cancellation. Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

Lite Ye · Answer 3 · Fri Feb 02 2024 00:53:05 GMT+0800 (China Standard Time)

> Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

No, it's not. If it's cancelled correctly, it shouldn't be able to print new tokens.

Can you show me your steps to trigger the problem? Then I can try to reproduce it on my side.
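
To illustrate the expectation that a correctly cancelled request cannot produce new tokens, here is a generic asyncio sketch. It is not mlc_serve's implementation: `generate`, `consume`, and `run_demo` are hypothetical names, and the sleep stands in for a decode step. It shows that cancelling the task that consumes an async generator raises the cancellation inside the generator, runs its cleanup, and stops any further tokens.

```python
import asyncio


async def generate(request_id, log):
    """Hypothetical stand-in for a streaming token iterator."""
    token_id = 0
    try:
        while True:
            await asyncio.sleep(0.01)  # simulated decode step
            yield token_id
            token_id += 1
    finally:
        # Runs when the consuming task is cancelled while waiting above.
        # A real server would tell the engine to stop the request here.
        log.append(f"cancelled {request_id}")


async def consume(request_id, received, log):
    """Collect tokens until the surrounding task is cancelled."""
    async for token_id in generate(request_id, log):
        received.append(token_id)


async def run_demo():
    received, log = [], []
    task = asyncio.create_task(consume("req-1", received, log))
    await asyncio.sleep(0.05)   # let a few tokens arrive
    task.cancel()               # user-side cancellation
    try:
        await task
    except asyncio.CancelledError:
        pass
    count_at_cancel = len(received)
    await asyncio.sleep(0.05)   # wait to see whether more tokens show up
    return count_at_cancel, len(received), log
```

Running `asyncio.run(run_demo())` returns the token count at cancellation, the count a little later, and the cleanup log. If the two counts differ in a real system, something other than the cancelled iterator is still producing tokens for that request.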