openzfsonwindows / ZFSin

OpenZFS on Windows port

Home Page: https://openzfsonwindows.org

cv_signal(&arc_abd_move_thr_cv) being called after cv_destroy(&arc_abd_move_thr_cv);

arun-kv opened this issue

In arc_fini we signal the arc_abd_move_thread to exit first, and then destroy arc_abd_move_thr_cv.
https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/arc.c#L7837
We then signal the arc_reclaim_thread, which in turn tries to signal arc_abd_move_thr_cv, which has already been destroyed.
https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/arc.c#L5183
This leads to an occasional panic during uninstallation.

00 ffffd901`0d02f978 fffff800`ecf75f07 nt!KeBugCheckEx
01 ffffd901`0d02f980 fffff800`ecf54dcf nt!PspSystemThreadStartup$filt$0+0x44
02 ffffd901`0d02f9c0 fffff800`ecf6bb8d nt!_C_specific_handler+0x9f
03 ffffd901`0d02fa30 fffff800`ecebde91 nt!RtlpExecuteHandlerForException+0xd
04 ffffd901`0d02fa60 fffff800`ecebcc07 nt!RtlDispatchException+0x421
05 ffffd901`0d030160 fffff800`ecf70a0e nt!KiDispatchException+0x1d7
06 ffffd901`0d030820 fffff800`ecf6e073 nt!KiExceptionDispatch+0xce
07 ffffd901`0d030a00 fffff807`d842222e nt!KiBreakpointTrap+0xf3
08 ffffd901`0d030b90 fffff807`d847f082 ZFSin!spl_cv_signal+0xe [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\spl\module\spl\spl-condvar.c @ 72] 
09 ffffd901`0d030bc0 fffff800`ececb595 ZFSin!arc_reclaim_thread+0x422 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\arc.c @ 5185] 
0a ffffd901`0d030c10 fffff800`ecf6ac56 nt!PspSystemThreadStartup+0x41
0b ffffd901`0d030c60 00000000`00000000 nt!KiStartSystemThread+0x16

@lundman We may have to consider splitting arc_abd_move_thr_fini()
https://github.com/openzfsonwindows/ZFSin/blob/master/ZFSin/zfs/module/zfs/arc.c#L9412

into two, where we do

	mutex_enter(&arc_abd_move_thr_lock);
	cv_signal(&arc_abd_move_thr_cv);
	arc_abd_move_thr_exit = 1;
	while (arc_abd_move_thr_exit != 0)
		cv_wait(&arc_abd_move_thr_cv, &arc_abd_move_thr_lock);
	mutex_exit(&arc_abd_move_thr_lock);

in the first and

	mutex_destroy(&arc_abd_move_thr_lock);
	cv_destroy(&arc_abd_move_thr_cv);

in the second.

The second part should be delayed until the "end" of driver unload. That way other threads that are still shutting down can inspect arc_abd_move_thr_exit and lock/unlock the synchronization primitive protecting it.
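To illustrate the split, here is a minimal user-space sketch using POSIX threads rather than the SPL primitives. Everything in it is an assumption for illustration: the names move_thr_init, move_thr_stop, move_thr_late_signal and move_thr_destroy are hypothetical, and the worker does nothing but wait for the exit request. It is not the ZFSin code, only the same two-phase shape.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for arc_abd_move_thr_lock,
 * arc_abd_move_thr_cv and arc_abd_move_thr_exit.
 */
static pthread_mutex_t move_lock;
static pthread_cond_t move_cv;
static int move_exit;	/* 1 = exit requested; worker resets it to 0 */
static pthread_t move_thread;

/* Worker: sleeps until asked to exit, then acknowledges by clearing the flag. */
static void *
move_worker(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&move_lock);
	while (move_exit == 0)
		pthread_cond_wait(&move_cv, &move_lock);
	move_exit = 0;
	pthread_cond_broadcast(&move_cv);
	pthread_mutex_unlock(&move_lock);
	return (NULL);
}

int
move_thr_init(void)
{
	move_exit = 0;
	if (pthread_mutex_init(&move_lock, NULL) != 0)
		return (-1);
	if (pthread_cond_init(&move_cv, NULL) != 0)
		return (-1);
	return (pthread_create(&move_thread, NULL, move_worker, NULL));
}

/* Phase 1: stop the worker, but leave the lock and cv alive. */
void
move_thr_stop(void)
{
	int rc;

	pthread_mutex_lock(&move_lock);
	move_exit = 1;
	pthread_cond_broadcast(&move_cv);
	while (move_exit != 0)
		pthread_cond_wait(&move_cv, &move_lock);
	pthread_mutex_unlock(&move_lock);
	rc = pthread_join(move_thread, NULL);
	assert(rc == 0);
	(void) rc;
}

/* What a late caller (the arc_reclaim_thread in this issue) does on exit. */
int
move_thr_late_signal(void)
{
	return (pthread_cond_signal(&move_cv));
}

/* Phase 2: destroy the primitives once no thread can touch them anymore. */
int
move_thr_destroy(void)
{
	int rc = pthread_cond_destroy(&move_cv);

	if (rc == 0)
		rc = pthread_mutex_destroy(&move_lock);
	return (rc);
}
```

Calling move_thr_init(), then move_thr_stop(), then move_thr_late_signal(), and only then move_thr_destroy() mirrors the proposed order: the late signal lands on a condition variable that still exists, and destruction happens last.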

To be honest, the abd_move work came from osx, where I have already removed it when merging with the new port. It was decided that if the need comes up again, we'll re-implement it, since the way abd is set up there is a little different. At this point, I'm inclined to remove it from ZFSin as well.

Nice catch though!

Thanks @lundman. We will take this change in our environment and see how it goes.