thrasher-corp / gocryptotrader

A cryptocurrency trading bot and framework supporting multiple exchanges written in Golang.

Binance Websocket Reconnection Deadlock

yangrq1018 opened this issue · comments

New Issue

Context

From what I learned on Slack, exchanges/stream has a Websocket.trafficMonitor method that starts a goroutine which periodically checks whether data is being read from the websocket. When the timeout elapses without any data being read, it calls w.Shutdown(). Shutdown contains code like this:

close(w.ShutdownC)
w.Wg.Wait()

This seems to cause a deadlock problem:

  1. Binance.wsReadData is holding Wg. It doesn't check whether w.ShutdownC is closed at all in its infinite loop.
  2. Websocket.trafficMonitor is holding Wg. It waits on itself, because w.Wg.Done() is only called AFTER w.Shutdown() returns.
  • Operating System:
    Windows 10 x64

  • GoCryptoTrader version (gocryptotrader -version):
    On revision number 3467914
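The self-wait described in point 2 can be reproduced in isolation. The following is a minimal sketch, not the library's code: the socket type, shutdown, and monitorSelfWaits are hypothetical names, and only the two quoted lines of Shutdown are modelled.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// socket is a hypothetical stand-in for the Websocket type; only the two
// fields used by the quoted Shutdown snippet are modelled.
type socket struct {
	ShutdownC chan struct{}
	Wg        sync.WaitGroup
}

// shutdown mirrors the two lines quoted from Shutdown in the issue.
func (s *socket) shutdown() {
	close(s.ShutdownC)
	s.Wg.Wait()
}

// monitorSelfWaits starts a goroutine that is registered in Wg and calls
// shutdown before its own Done. It reports whether shutdown failed to
// return within timeoutMs milliseconds.
func monitorSelfWaits(timeoutMs int) bool {
	s := &socket{ShutdownC: make(chan struct{})}
	returned := make(chan struct{})

	s.Wg.Add(1)
	go func() {
		defer s.Wg.Done() // would run only after shutdown returns
		s.shutdown()      // waits for the counter this goroutine still holds
		close(returned)
	}()

	select {
	case <-returned:
		return false
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return true // the stuck goroutine is intentionally leaked
	}
}

func main() {
	fmt.Println("shutdown blocked:", monitorSelfWaits(200))
}
```

The goroutine cannot reach its deferred Done because Wait is blocked on the very counter that Done would decrement.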

Expected Behavior

I expect w.Shutdown() to return rather than causing a deadlock.

Current Behavior

Besides the deadlock mentioned above, it also blocks Websocket.Connect, which tries to acquire Websocket.m while it is held by Shutdown. So what I am seeing is:

  1. Suppose the websocket is inactive for more than two seconds, exceeding the timeout.
  2. trafficMonitor detects it and calls Shutdown, which blocks. Even if the user calls Websocket.Connect themselves, it blocks forever too.
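The knock-on effect on Connect can be sketched the same way. This is an illustration under assumptions: the socket type, the held channel, and connectBlocks are hypothetical, and the model only assumes what the report states, namely that Shutdown holds m while it waits on Wg and that Connect needs m.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// socket is a hypothetical model of the locking described in the report.
type socket struct {
	m    sync.Mutex
	Wg   sync.WaitGroup
	held chan struct{} // closed once Shutdown owns m (illustration only)
}

// Shutdown takes m and then waits on Wg, as the report describes.
func (s *socket) Shutdown() {
	s.m.Lock()
	defer s.m.Unlock()
	close(s.held)
	s.Wg.Wait() // never returns in this model
}

// Connect needs the same mutex, so it queues behind Shutdown.
func (s *socket) Connect() {
	s.m.Lock()
	defer s.m.Unlock()
}

// connectBlocks reports whether Connect failed to return within
// timeoutMs milliseconds while Shutdown is stuck.
func connectBlocks(timeoutMs int) bool {
	s := &socket{held: make(chan struct{})}
	s.Wg.Add(1) // registration of a goroutine that never calls Done

	go s.Shutdown()
	<-s.held // make sure Shutdown already holds m

	connected := make(chan struct{})
	go func() {
		s.Connect()
		close(connected)
	}()

	select {
	case <-connected:
		return false
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return true // both stuck goroutines are intentionally leaked
	}
}

func main() {
	fmt.Println("Connect blocked:", connectBlocks(200))
}
```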

I could work on a PR, but I don't have much time at the moment, so progress could be slow.

Failure Information (for bugs)

Please help by providing information about the failure. If it is not a bug, please remove the rest of this template.

Steps to Reproduce

config:

{
  "name": "Binance",
   "enabled": true,
   "verbose": false,
   "httpTimeout": 15000000000,
   "websocketResponseCheckTimeout": 30000000,
   "websocketResponseMaxLimit": 7000000000,
   "websocketTrafficTimeout": 2000000000,
   ...
}

Set websocketTrafficTimeout to 2 seconds. Start the Binance websocket. Don't subscribe to anything (actually I subscribed to order updates only, but those are obviously very infrequent).
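For readers checking the numbers: the timeout fields appear to be nanosecond counts, since 2000000000 is described as 2 seconds. Assuming that holds for the other fields too, a small helper (describe is a hypothetical name) makes the values readable:

```go
package main

import (
	"fmt"
	"time"
)

// describe converts a raw config integer, assumed to be a nanosecond count,
// into a human-readable duration string.
func describe(ns int64) string {
	return time.Duration(ns).String()
}

func main() {
	fmt.Println("httpTimeout:", describe(15000000000))                // 15s
	fmt.Println("websocketResponseCheckTimeout:", describe(30000000)) // 30ms
	fmt.Println("websocketResponseMaxLimit:", describe(7000000000))   // 7s
	fmt.Println("websocketTrafficTimeout:", describe(2000000000))     // 2s
}
```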

Add some logging to websocket.go

				if !w.IsConnecting() && w.IsConnected() {
					log.Infof(log.WebsocketMgr, "attempt shutdown")
					err := w.Shutdown()
					log.Infof(log.WebsocketMgr, "shutdown returns")
					if err != nil {
						log.Errorf(log.WebsocketMgr,
							"%v websocket: trafficMonitor shutdown err: %s",
							w.exchangeName, err)
					}
				}

The log shows "attempt shutdown" but never "shutdown returns".

Failure Logs

[INFO]  | LOG | 2021/08/24/ 14:07:10 | Logger initialised.
[WARN]  | LOG | 2021/08/24/ 14:07:10 | -tradeprocessinginterval must be >= to 1 second, using default value of 15s
[WARN]  | LOG | 2021/08/24/ 14:07:10 | ExchangeRateHost forex provider not found, adding to config..
[WARN]  | LOG | 2021/08/24/ 14:07:10 | ExchangeRates forex provider no longer supports unset API key requests. Switching to ExchangeRateHost FX provider..
[WARN]  | CONFIG | 2021/08/24/ 14:07:10 | No valid forex providers configured. Defaulting to ExchangeRateHost.
[INFO]  | WEBSOCKET | 2021/08/24/ 14:07:17 | attempt shutdown
[WARN]  | WEBSOCKET | 2021/08/24/ 14:07:17 | Binance websocket has been disconnected. Reason: read tcp 127.0.0.1:50479->127.0.0.1:7890: use of closed network connection
[INFO]  | WEBSOCKET | 2021/08/24/ 14:07:19 | reconnecting websocket

Note: after some experimenting, the deadlock appears to be caused by trafficMonitor alone holding Wg.
Changing the code to

				if !w.IsConnecting() && w.IsConnected() {
					w.Wg.Done()
					log.Infof(log.WebsocketMgr,
						"websocket traffic inactive past timeout, shutdown connection")
					err := w.Shutdown()
					if err != nil {
						log.Errorf(log.WebsocketMgr,
							"%v websocket: trafficMonitor shutdown err: %s",
							w.exchangeName, err)
					}
				} else {
					w.Wg.Done()
				}

does give the expected behavior, at least. When Shutdown() closes the underlying Connection interface, wsReadData gets an empty response and finally exits its loop.
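The workaround can be sketched in isolation as follows. This is a simplified model under assumptions, not the library's code: the socket type, the conn channel standing in for the connection, reader, and shutdownReturns are all hypothetical, and closing conn models Shutdown closing the underlying connection before it waits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// socket is a hypothetical model; conn stands in for the real connection.
type socket struct {
	ShutdownC chan struct{}
	conn      chan []byte
	Wg        sync.WaitGroup
}

// shutdown closes the shutdown channel and the modelled connection, then
// waits for all registered goroutines.
func (s *socket) shutdown() {
	close(s.ShutdownC)
	close(s.conn)
	s.Wg.Wait()
}

// reader models wsReadData: it leaves its loop once the connection is closed.
func (s *socket) reader() {
	defer s.Wg.Done()
	for range s.conn {
		// a real reader would handle the message here
	}
}

// shutdownReturns reports whether shutdown completes within timeoutMs
// milliseconds when the monitor releases Wg before calling it.
func shutdownReturns(timeoutMs int) bool {
	s := &socket{ShutdownC: make(chan struct{}), conn: make(chan []byte)}
	returned := make(chan struct{})

	s.Wg.Add(2) // one slot for the reader, one for the monitor
	go s.reader()
	go func() {
		s.Wg.Done() // release the monitor's slot BEFORE shutting down
		s.shutdown()
		close(returned)
	}()

	select {
	case <-returned:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("shutdown returned:", shutdownReturns(1000))
}
```

Once the monitor no longer counts itself, Wait only depends on the reader, which exits as soon as the modelled connection is closed.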

commented

Hi @yangrq1018! Have you checked out #754? It looks to address many of the hangups that occur when shutting down the websocket. Some exchange implementations had hanging channels which prevented shutdown from finishing. If you could retest there and it's still occurring, we can look into this further.

edit: Thank you very much for your detailed reporting! I hope the PR fixes things and it should be close to being merged 🤞

@gloriousCode Sure, I will test the PR, so I am going to close this. If there are more problems I will reopen it.