SpartanJ / efsw

efsw is a C++ cross-platform file system watcher and notifier.

Crash on tear down - no clear way to gracefully join on thread

JasonDictos opened this issue · comments

I am using efsw in a product, and during heavy testing in which we dynamically add and remove watchers we hit heap corruption and crashes. Looking over the code, removing a watch is not really synchronized with the watcher thread. We have since patched efsw, against the revision we are working with, so that the FileWatcherWin32 destructor sets mInitOK to false, notifies the thread over the IOCP queue, and then waits on the thread before removing the watches:

diff --git a/src/efsw/FileWatcherWin32.cpp b/src/efsw/FileWatcherWin32.cpp
index 8746a69..92966ef 100644
--- a/src/efsw/FileWatcherWin32.cpp
+++ b/src/efsw/FileWatcherWin32.cpp
@@ -23,13 +23,16 @@ FileWatcherWin32::~FileWatcherWin32()
{
       mInitOK = false;

-       removeAllWatches();
-
       if (mIOCP && mIOCP != INVALID_HANDLE_VALUE)
       {
               PostQueuedCompletionStatus(mIOCP, 0, reinterpret_cast<ULONG_PTR>(this), NULL);
       }

+       if (mThread)
+               mThread->wait();
+
+       removeAllWatches();
+
       CloseHandle(mIOCP);

       efSAFE_DELETE( mThread );

I believe similar logic in the other watchers is warranted as well:

diff --git a/src/efsw/FileWatcherFSEvents.cpp b/src/efsw/FileWatcherFSEvents.cpp
index 5aac142..fa47882 100644
--- a/src/efsw/FileWatcherFSEvents.cpp
+++ b/src/efsw/FileWatcherFSEvents.cpp
@@ -86,6 +86,9 @@ FileWatcherFSEvents::~FileWatcherFSEvents()
 {
 	mInitOK = false;
 
+	if (mThread)
+		mThread->wait();
+
 	efSAFE_DELETE( mThread );
 
 	WatchMap::iterator iter = mWatches.begin();
diff --git a/src/efsw/FileWatcherGeneric.cpp b/src/efsw/FileWatcherGeneric.cpp
index fd423b1..276924d 100644
--- a/src/efsw/FileWatcherGeneric.cpp
+++ b/src/efsw/FileWatcherGeneric.cpp
@@ -19,6 +19,9 @@ FileWatcherGeneric::~FileWatcherGeneric()
 {
 	mInitOK = false;
 
+	if (mThread)
+		mThread->wait();
+
 	efSAFE_DELETE( mThread );
 
 	/// Delete the watches
diff --git a/src/efsw/FileWatcherInotify.cpp b/src/efsw/FileWatcherInotify.cpp
index d1652f8..da1a772 100644
--- a/src/efsw/FileWatcherInotify.cpp
+++ b/src/efsw/FileWatcherInotify.cpp
@@ -47,6 +47,9 @@ FileWatcherInotify::~FileWatcherInotify()
 {
 	mInitOK = false;
 
+	if (mThread)
+		mThread->wait();
+
 	efSAFE_DELETE( mThread );
 	
 	WatchMap::iterator iter = mWatches.begin();
diff --git a/src/efsw/FileWatcherWin32.cpp b/src/efsw/FileWatcherWin32.cpp
index 8746a69..92966ef 100644
--- a/src/efsw/FileWatcherWin32.cpp
+++ b/src/efsw/FileWatcherWin32.cpp
@@ -23,13 +23,16 @@ FileWatcherWin32::~FileWatcherWin32()
 {
 	mInitOK = false;
 
-	removeAllWatches();
-
 	if (mIOCP && mIOCP != INVALID_HANDLE_VALUE)
 	{
 		PostQueuedCompletionStatus(mIOCP, 0, reinterpret_cast<ULONG_PTR>(this), NULL);
 	}
 
+	if (mThread)
+		mThread->wait();
+
+	removeAllWatches();
+
 	CloseHandle(mIOCP);
 
 	efSAFE_DELETE( mThread );

Hi! Thanks for your contribution!
I'll need to check each case individually, because the destructor of the Thread class already calls wait(), so I don't think this is an issue on all platforms. I think it is more related to the order of execution in the specific case of the Win32 watcher.
I would like to know a way to reproduce this issue so I can test it here. You may already have such a test, so could you simply try changing the execution order to something like:

FileWatcherWin32::~FileWatcherWin32()
{
	mInitOK = false;

	if (mIOCP && mIOCP != INVALID_HANDLE_VALUE)
	{
		PostQueuedCompletionStatus(mIOCP, 0, reinterpret_cast<ULONG_PTR>(this), NULL);
	}

	efSAFE_DELETE( mThread );

	removeAllWatches();

	CloseHandle(mIOCP);
}

If that doesn't work I'll apply your patch for Win32.
Thanks again!
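
For readers following along, the ordering discussed above (signal the worker, join it, and only then free what it was using) can be sketched in isolation. This is a minimal illustration with standard C++ threads, not efsw code: `SketchWatcher`, its members, and `workerStopsWithDestructor` are hypothetical names, and the polling loop stands in for the IOCP wake-up that the real Win32 watcher relies on.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <thread>

// Hypothetical stand-in for a watcher that owns a worker thread (not efsw's class).
class SketchWatcher {
public:
    explicit SketchWatcher(std::atomic<int>& ticks) : mTicks(ticks), mRunning(true) {
        mWatches[1] = std::make_unique<int>(0);      // state the worker will touch
        mThread = std::thread([this] { run(); });    // start only after the state exists
    }

    ~SketchWatcher() {
        mRunning = false;                            // 1. tell the worker to stop
        if (mThread.joinable()) mThread.join();      // 2. wait until it has really exited
        assert(!mThread.joinable());
        mWatches.clear();                            // 3. only now free what it was using
    }

private:
    void run() {
        while (mRunning.load()) {
            ++*mWatches.at(1);                       // worker reads and writes watch data
            ++mTicks;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<int>& mTicks;
    std::atomic<bool> mRunning;
    std::map<int, std::unique_ptr<int>> mWatches;
    std::thread mThread;
};

// Returns true if the worker ran while the watcher was alive
// and made no further progress once the destructor had returned.
bool workerStopsWithDestructor() {
    std::atomic<int> ticks{0};
    {
        SketchWatcher watcher(ticks);
        while (ticks.load() == 0) std::this_thread::yield();
    }                                                // destructor runs here
    const int after = ticks.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return after > 0 && ticks.load() == after;
}
```

Swapping steps 2 and 3 in the destructor would let the worker dereference a watch that has already been freed, which is the kind of heap corruption described in the report.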

I see, so removeAllWatches is what caused the memory problem, and the wait is built into the thread destructor. OK, I get it.

It is certainly a problem on Windows, mostly because we simply could not use the 'removeWatch' by-ID APIs and be sure the callback thread wouldn't throw right after the call. So we reverted to destroying the watcher, so that we could implicitly rely on the internal thread wait.

I'd think breaking the destructor behavior out into a shutdown API would be quite useful, although it's not too bad wrapping this in an optional as it is.
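
To illustrate the "wrap it in an optional" workaround, here is a minimal C++17 sketch. It assumes a watcher type whose destructor performs the full teardown; `SketchFileWatcher` and `WatchService` are hypothetical stand-ins, not part of efsw, and the log vector exists only to make the lifetime visible.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a watcher whose destructor does the real teardown.
class SketchFileWatcher {
public:
    explicit SketchFileWatcher(std::vector<std::string>& log) : mLog(log) {
        mLog.push_back("started");
    }
    ~SketchFileWatcher() { mLog.push_back("stopped"); }

    SketchFileWatcher(const SketchFileWatcher&) = delete;
    SketchFileWatcher& operator=(const SketchFileWatcher&) = delete;

private:
    std::vector<std::string>& mLog;
};

// Owner that exposes an explicit shutdown() by holding the watcher in std::optional.
class WatchService {
public:
    explicit WatchService(std::vector<std::string>& log) : mLog(log) {}

    void start() {
        if (!mWatcher) mWatcher.emplace(mLog);   // construct in place, no copy or move
    }

    void shutdown() {
        mWatcher.reset();                        // runs the watcher's destructor now
        assert(!mWatcher.has_value());
    }

    bool running() const { return mWatcher.has_value(); }

private:
    std::vector<std::string>& mLog;
    std::optional<SketchFileWatcher> mWatcher;
};
```

Calling `shutdown()` destroys the watcher at a point the caller chooses, and calling it again is a harmless no-op because the optional is already empty.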

Thanks for the feedback. If I have time, I'll contribute to the unit tests in this project to lock down the behavior.

This should be fixed. Closing.