tarantool / tarantool-qa

QA-related issues of Tarantool



Flaky test box/net.box_reconnect_after_gh-3164.test.lua

NickVolynkin opened this issue

I investigated the issue a bit. It reproduces quite easily on my laptop.

The test expects the connection object (the result of net.box.connect) to be garbage-collected once the test drops its strong reference to it. From time to time, it is not.
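One common way for a Lua test to observe whether an object has been collected is a weak-valued table. The plain-Lua sketch below shows that pattern; it is an assumption about the general technique, not the actual test code. The table stands in for the net.box connection, and the helper names `register` and `collected_after_drop` are hypothetical.

```lua
-- Minimal sketch of a "was it garbage-collected?" check in plain Lua.
-- The object is created inside a helper so that no stale copy of it is
-- left in the caller's stack slots.
local function register(weak)
  weak.conn = { name = 'conn' }  -- hypothetical stand-in for a net.box connection
end

local function collected_after_drop()
  -- Values in this table are weak: they do not keep objects alive.
  local weak = setmetatable({}, { __mode = 'v' })
  register(weak)                 -- now only the weak table refers to the object
  collectgarbage('collect')      -- run a full collection cycle
  return weak.conn == nil        -- true if the collector freed the object
end
```

In the flaky runs described here, the equivalent of `weak.conn` is still non-nil after the collection.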

On test failure, I dumped the Lua object graph after garbage collection using this script (http://code.matthewwild.co.uk/luatraverse/file/tip/dump.lua). There are no references from the root set that could prevent the connection from being collected. The dump is incomplete in the sense that it does not show references from local variables. I also had to patch the script to ignore non-standard Lua types. So we need to inspect the code to see whether other references can exist.
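The dump script itself is not reproduced here. As a rough illustration of what a root-set walk does, the hypothetical `find_paths` helper below (an assumption, not luatraverse's code) follows tables, metatables and function upvalues starting from `_G` and the registry, and reports every path that reaches a given object. Like the dump described above, it cannot see values held only in stack slots (local variables), and it skips every type except tables and functions (userdata, threads, LuaJIT cdata).

```lua
-- Hypothetical root-set walker (not the luatraverse dump.lua script).
-- Collects a readable path for every reference to `target` reachable
-- from _G and the registry. Locals living on thread stacks are invisible.
local function describe(key)
  local t = type(key)
  if t == 'string' or t == 'number' or t == 'boolean' then
    return tostring(key)
  end
  return '<' .. t .. '>'
end

local function find_paths(target)
  local paths, seen = {}, {}

  local function visit(value, path)
    if rawequal(value, target) then
      paths[#paths + 1] = path  -- record the reference, do not descend into target
      return
    end
    local t = type(value)
    -- Only tables and functions are traversed; other types are skipped.
    if (t ~= 'table' and t ~= 'function') or seen[value] then
      return
    end
    seen[value] = true
    if t == 'table' then
      local mt = debug.getmetatable(value)
      if mt ~= nil then
        visit(mt, path .. '<metatable>')
      end
      -- Iterate with next directly so that no __pairs metamethod runs.
      for k, v in next, value do
        visit(k, path .. '<key ' .. describe(k) .. '>')
        visit(v, path .. '[' .. describe(k) .. ']')
      end
    else
      -- Functions: follow their upvalues.
      local i = 1
      while true do
        local name, v = debug.getupvalue(value, i)
        if name == nil then break end
        visit(v, path .. '<upvalue ' .. name .. '>')
        i = i + 1
      end
    end
  end

  visit(_G, '_G')
  visit(debug.getregistry(), 'REGISTRY')
  return paths
end
```

An empty result means no reference is reachable from these roots, which is the situation reported above; a reference kept alive only by a stack slot would still be missed.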

net_box.lua has two places where the root set references the connection object for a short period of time. The first is the transport callback: in weak_callback we take a strong reference to the callback (and thus to the connection). The second is the watcher for the 'box.shutdown' event, whose callback is executed in a per-event fiber.
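The weak-callback idea can be illustrated in plain Lua. The sketch below is a hypothetical reconstruction, not the actual net_box.lua code: the caller keeps the callback only through a weak-valued table, and each invocation copies it into a local, which is the short-lived strong reference described above. `new_connection` is a hypothetical stand-in for the connection object.

```lua
-- Hypothetical sketch of a weak callback (not the actual net_box.lua code).
-- The returned wrapper holds `func` only through a weak table, so it does
-- not keep the callback, or the connection the callback closes over, alive.
local function weak_callback(func)
  local weak = setmetatable({ func = func }, { __mode = 'v' })
  return function(...)
    local f = weak.func  -- strong reference, held only while this call runs
    if f == nil then
      return false       -- the owner has already been garbage-collected
    end
    f(...)
    return true
  end
end

-- Hypothetical connection object that owns its callback; the callback
-- closes over the connection, forming a cycle the collector can free.
local function new_connection()
  local conn = { events = 0 }
  conn.callback = function()
    conn.events = conn.events + 1
  end
  return conn
end
```

While the wrapper is running, `f` pins the callback and the connection; once it returns, only the weak table and the connection itself refer to them.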

However, in the test neither of these fibers (transport or watcher) can be running while garbage collection happens. The watcher is executed only on 'box.shutdown'. The transport callback is executed on reconnection, but it does not yield; since fibers are scheduled cooperatively, it runs to completion before the test fiber gets control again, so it cannot be in progress during GC.

With JIT turned off, I cannot reproduce the issue. With JIT on, it reproduces within a few dozen runs; with JIT off, it does not reproduce in about a thousand runs.
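Comparing reproduction rates with the JIT on and off amounts to repeating a collection check many times and counting survivors. The plain-Lua harness below is a hypothetical sketch: under LuaJIT (and therefore Tarantool) it uses the standard `jit.status()`, `jit.on()` and `jit.off()` functions, while stock Lua has no `jit` table and always interprets. Because the stand-in object is a plain table, the harness only illustrates the measurement, not the net.box failure itself.

```lua
-- Hypothetical harness: how often does a dropped object survive a full GC?
local function register(weak)
  weak.obj = { name = 'conn' }  -- stand-in object, created in a helper frame
end

local function survives_gc()
  local weak = setmetatable({}, { __mode = 'v' })
  register(weak)
  collectgarbage('collect')
  return weak.obj ~= nil
end

-- Runs the check `runs` times. Under LuaJIT, `use_jit` selects whether the
-- trace compiler is enabled; in stock Lua the `jit` table does not exist.
local function count_survivors(runs, use_jit)
  local was_on = jit ~= nil and jit.status()
  if jit ~= nil then
    if use_jit then jit.on() else jit.off() end
  end
  local survivors = 0
  for _ = 1, runs do
    if survives_gc() then
      survivors = survivors + 1
    end
  end
  if jit ~= nil then
    -- Restore the previous compiler state.
    if was_on then jit.on() else jit.off() end
  end
  return survivors
end
```

For example, `count_survivors(1000, false)` corresponds to the "JIT off, about a thousand runs" experiment.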

The issue also persists even if GC is run in a loop for 5 seconds (with or without delays between iterations).
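A "keep collecting until it is gone" loop might look like the hypothetical helper below. `pause` stands in for whatever delay is used between rounds (in Tarantool that would typically be a `fiber.sleep()` call; here it is an optional caller-supplied function), and the budget is measured with `os.clock()`, i.e. CPU time rather than wall-clock time. A single full collection already frees an unreachable plain table, so an object that survives many rounds is most likely still referenced from somewhere.

```lua
-- Hypothetical helper: run full collections until weak[key] disappears or
-- `budget` seconds of CPU time (os.clock) have passed. `pause`, if given,
-- is called between rounds; it stands in for a delay such as fiber.sleep().
local function collect_until_gone(weak, key, budget, pause)
  local deadline = os.clock() + budget
  local rounds = 0
  repeat
    collectgarbage('collect')
    rounds = rounds + 1
    if weak[key] == nil then
      return true, rounds   -- the object was freed
    end
    if pause ~= nil then
      pause()
    end
  until os.clock() >= deadline
  return false, rounds      -- still alive after the whole budget
end
```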

This issue is a duplicate of tarantool/tarantool#5081.