ShipBit / slickgpt

SlickGPT is a lightweight "use-your-own-API-key" web client for the OpenAI API written in Svelte. It offers GPT-4 integration, a userless share feature, and other superpowers.

Home Page: https://slickgpt.vercel.app

Replies occasionally being "swallowed" due to abnormal API stream completion.

xmoiduts opened this issue · comments

Also refer to #91 @Arro

I have been investigating an intermittent issue with SlickGPT where reply messages sometimes vanish unexpectedly.

Details:

  • I send question message A
  • When a reply message (message A') finishes streaming, the entire message box disappears instantly.
  • If another chat message B is sent after this occurs, the previously vanished (blue) message box reappears, containing the content of the previous reply (message A'). The new reply (message B') then starts streaming after message A''s content within the same box. B doesn't even need to be in the same session as A: switching to another session and sending B also summons the stale box containing A'.
  • Once the new reply (message B') finishes streaming, the message box vanishes instantly again upon completion.

Debugging Observations: I used Chrome's DevTools to observe the API response streams for both a normal and a problematic reply. Here's what I found (note: this is a self-deployed instance, not the official OpenAI API endpoint):
Normal stream example:

message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":" with"},"finish_reason":null}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":""},"finish_reason":"stop"}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	[DONE]

Problematic stream example:

message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":" with"},"finish_reason":null}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	{"id":"chatcmpl-K6xxx2J","object":"chat.completion.chunk","created":1712327959,"model":"xxx","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}],"system_fingerprint":"fp_60xxb3"}	
22:39:26.771
message	[DONE]

In the problematic stream, the last chunk before the [DONE] message has finish_reason set to null instead of "stop".
Also, when this happens, Chrome's console reports that it cannot parse [DONE] as JSON.
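That console error is easy to reproduce in isolation: the [DONE] terminator is a bare string, not JSON, so any code path that feeds it to JSON.parse will throw (a minimal demonstration, not SlickGPT code):

```typescript
// The stream terminator sent by OpenAI-compatible endpoints is a plain string.
const sentinel = '[DONE]';

// JSON.parse rejects it: '[' opens an array, but 'D' is not a valid JSON token.
let parseError: unknown = null;
try {
	JSON.parse(sentinel);
} catch (err) {
	parseError = err;
}

console.log(parseError instanceof SyntaxError); // true
```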

Inferred Cause:

SlickGPT likely relies on finish_reason being set to "stop" to determine when a reply has fully streamed and should be displayed as a completed message. When the API response stream does not set finish_reason to "stop", SlickGPT treats the reply as incomplete and fails to retain the message box.

More notes: Since I can't access the official OpenAI API right now, I use a translation API that forwards OpenAI-style API calls to OpenAI and to other models like Claude (off topic: yes, with a translation API and a customized endpoint, a vanilla SlickGPT clone can access Claude). Its behavior may differ from the official GPT API.

However, since I had already observed the issue while using the Vercel-hosted SlickGPT, I suspect this stream behavior is the cause.

My suggestion:

As I'm no expert in web development or API calls, I can only guess at how to solve the issue:

  1. Use [DONE] as the indicator that the AI reply has ended;
  2. or, add a "Retrieve Generation" button as a mitigation for when a problematic reply stream happens, which stops waiting for further reply chunks and keeps the message truncated as-is.
  3. See how other web GPT chat clients deal with API abnormalities. I haven't debugged it, but https://github.com/Bin-Huang/chatbox didn't seem to swallow messages with the same translation API providers while I was debugging SlickGPT.

Hi and thanks for this comprehensive report!
I think it's exactly what you think it is. Apparently, with your setup the end of the completion is sometimes inconsistent and SlickGPT can't handle that. The function in question is in src/lib/ChatInput.svelte and looks like this:

function handleAnswer(event: MessageEvent<any>) {
	try {
		if ($isPro) {
			// irrelevant for your case
		} else {
			const completionResponse: any = JSON.parse(event.data);
			const isFinished = completionResponse.choices[0].finish_reason === 'stop';
			if (event.data !== '[DONE]' && !isFinished) {
				const delta: string = completionResponse.choices[0].delta.content || '';
				showLiveResponse(delta);
			} else {
				addCompletionToChat();
			}
		}
	} catch (err) {
		// this will remove the message from the chat view if hit
		handleError(err);
	}
}
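One defensive rewrite, matching the reporter's suggestion 1, would check for the [DONE] sentinel before calling JSON.parse, so a terminator arriving after a null finish_reason can no longer fall through to the catch block. This is only a sketch; the helpers below are stubs standing in for SlickGPT's real showLiveResponse, addCompletionToChat, and handleError:

```typescript
// Stubs standing in for SlickGPT's real helpers (assumptions for this sketch):
let live = '';
let completed = false;
const showLiveResponse = (delta: string) => { live += delta; };
const addCompletionToChat = () => { completed = true; };
const handleError = (err: unknown) => { throw err; };

function handleAnswer(event: { data: string }) {
	// Check the sentinel first: '[DONE]' is not JSON and would make JSON.parse throw.
	if (event.data === '[DONE]') {
		addCompletionToChat();
		return;
	}
	try {
		const choice = JSON.parse(event.data).choices[0];
		if (choice.finish_reason) {
			// Well-behaved streams that do send "stop" still finish here.
			addCompletionToChat();
		} else {
			showLiveResponse(choice.delta.content || '');
		}
	} catch (err) {
		handleError(err);
	}
}

// Replay the problematic stream: final chunk has finish_reason null, then [DONE].
handleAnswer({ data: '{"choices":[{"delta":{"content":"hi"},"finish_reason":null}]}' });
handleAnswer({ data: '[DONE]' });
console.log(live, completed); // "hi" true
```

With this ordering, a stream that never sets finish_reason to "stop" still completes cleanly on [DONE] instead of routing the sentinel through the error handler.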

So apparently, something throws an error in the problematic cases and I'd like to know where exactly. Do you think you could debug this? This is probably easier than me trying to replicate your setup. It's pretty easy:

  • install Node.js
  • check out the repository
  • in the repo root, run npm i
  • create .env and set the vars (you can take the ones from your self-hosted installation or use .env.example as a base; most of them can be empty for our test)
  • run npm run dev

Because you're now running from source (including source maps etc.), you can now hit CTRL/CMD+P in the Chrome DevTools and set a breakpoint in ChatInput.svelte directly.

If we find a solution for this, I'll be happy to take or create a PR for the fix!

@xmoiduts any news on this?