[BUG] ASAN issues -- unsynchronized reads

BenFrantzDale opened this issue 3 months ago

Describe the bug

g++ -fsanitize=thread -Iinclude ./tests/BS_thread_pool_test.cpp -std=c++17 -O3 -Wall -Wextra -Wconversion -Wsign-conversion -Wpedantic -Weffc++ -Wshadow -pthread -o BS_thread_pool_test && TSAN_OPTIONS="halt_on_error=1" ./BS_thread_pool_test

Output:

...
WARNING: ThreadSanitizer: data race (pid=34544)
  Read of size 8 at 0x7ffe64bfb3c8 by thread T73:
    #0 BS::thread_pool::worker(unsigned int, std::function<void ()> const&) <null> (BS_thread_pool_test+0x4ae81) (BuildId: e41652891611ed6c54197009524a2c7d2dd95798)
    #1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (BS::thread_pool::*)(unsigned int, std::function<void ()> const&), BS::thread_pool*, unsigned int, std::function<void ()> > > >::_M_run() <null> (BS_thread_pool_test+0x44397) (BuildId: e41652891611ed6c54197009524a2c7d2dd95798)
    #2 <null> <null> (libstdc++.so.6+0xecdb3) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)

  Previous write of size 8 at 0x7ffe64bfb3c8 by thread T68 (mutexes: write M0):
    #0 BS::thread_pool::worker(unsigned int, std::function<void ()> const&) <null> (BS_thread_pool_test+0x4a3d6) (BuildId: e41652891611ed6c54197009524a2c7d2dd95798)
    #1 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (BS::thread_pool::*)(unsigned int, std::function<void ()> const&), BS::thread_pool*, unsigned int, std::function<void ()> > > >::_M_run() <null> (BS_thread_pool_test+0x44397) (BuildId: e41652891611ed6c54197009524a2c7d2dd95798)
    #2 <null> <null> (libstdc++.so.6+0xecdb3) (BuildId: ca77dae775ec87540acd7218fa990c40d1c94ab1)

  Location is stack of main thread.

  Location is global '<null>' at 0x000000000000 ([stack]+0x1e3c8)

  Mutex M0 (0x7ffe64bfb3d0) created at:
    #0 pthread_mutex_lock ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1341 (libtsan.so.2+0x59a13) (BuildId: 38097064631f7912bd33117a9c83d08b42e15571)
    #1 BS::thread_pool::thread_pool(unsigned int, std::function<void ()> const&) <null> (BS_thread_pool_test+0x49650) (B
    ...

In void worker(const concurrency_t idx, const std::function<void()>& init_task), there's a data race:

        std::unique_lock tasks_lock(tasks_mutex);
        while (true)
        {
            --tasks_running;
            tasks_lock.unlock();
            // Right here, we read waiting, tasks_running, and worst of all,
            // BS_THREAD_POOL_PAUSED_OR_EMPTY expands to code that includes tasks.empty().
            if (waiting && (tasks_running == 0) && BS_THREAD_POOL_PAUSED_OR_EMPTY)
                tasks_done_cv.notify_all();
            tasks_lock.lock();
            task_available_cv.wait(tasks_lock,
                [this]
                {
                    return !BS_THREAD_POOL_PAUSED_OR_EMPTY || !workers_running;
                });
            if (!workers_running)
                break;
            {
#ifdef BS_THREAD_POOL_ENABLE_PRIORITY
                const std::function<void()> task = std::move(std::remove_const_t<pr_task&>(tasks.top()).task);
                tasks.pop();
#else
                const std::function<void()> task = std::move(tasks.front());
                tasks.pop();
#endif
                ++tasks_running;
                tasks_lock.unlock();
                task();
            }
            tasks_lock.lock();
        }

Minimal working example

Build the test program with ThreadSanitizer (-fsanitize=thread), as in the command above, and it trips.

Behavior

ThreadSanitizer should report no data races.

System information

x86
Linux

Additional information

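In C++, a read of non-atomic data that can run concurrently with a write must be synchronized, for example by holding the mutex that protects the write; otherwise it is a data race and the behavior is undefined. That undefined behavior may happen to "work", but relying on it is dicey. Worse is tasks.empty(), which potentially reads two pointers and compares them. Even if each read were atomic, the pair of reads is not: a writer could modify the container between them, so the comparison could see an inconsistent state.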
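To make the point about tasks.empty() concrete, here is a minimal sketch, not the library's code. PoolState, is_idle, push, and stress are hypothetical names introduced only for illustration. The idea is that every field the predicate touches is read under the same mutex that guards its writes, so the three values form one consistent snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pool's shared state; not BS::thread_pool's real layout.
struct PoolState
{
    std::mutex mutex;
    std::queue<std::function<void()>> tasks;
    std::size_t tasks_running = 0;
    bool waiting = false;

    // Reads all three fields under one lock, so the result describes a single
    // consistent state instead of three reads that race with writers.
    bool is_idle()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return waiting && tasks_running == 0 && tasks.empty();
    }

    // Writers take the same mutex before touching the queue.
    void push(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(mutex);
        tasks.push(std::move(task));
    }
};

// Pushes from several threads while also polling is_idle(); returns the queue size.
std::size_t stress(unsigned writers, unsigned pushes_per_writer)
{
    PoolState state;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < writers; ++i)
    {
        threads.emplace_back([&state, pushes_per_writer] {
            for (unsigned j = 0; j < pushes_per_writer; ++j)
            {
                state.push([] {});
                (void)state.is_idle(); // concurrent with other pushes, but synchronized
            }
        });
    }
    for (std::thread& t : threads)
        t.join();
    assert(!state.is_idle()); // waiting was never set, so the state is not "idle"
    std::lock_guard<std::mutex> lock(state.mutex);
    return state.tasks.size();
}
```

Built with -fsanitize=thread -pthread, a call such as stress(4, 100) should not produce a report from this sketch, since no shared field is read or written without the mutex held.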

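Solution: just don't unlock the mutex before the check, and drop the matching tasks_lock.lock() that followed it:

std::unique_lock tasks_lock(tasks_mutex);
while (true)
{
    --tasks_running;
    if (waiting && (tasks_running == 0) && BS_THREAD_POOL_PAUSED_OR_EMPTY)
    {
        tasks_done_cv.notify_all();
    }
    // ... the rest of the loop is unchanged, starting at task_available_cv.wait(...)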

Notifying while the mutex is held will wake the waiting threads only to have them block on the mutex right away, but the original code most likely does the same, since it notifies and then immediately re-locks.
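One possible way around the wake-then-block concern is to evaluate the predicate while the mutex is held and only then release it for the notification. The following is a rough sketch under stated assumptions, not the fix that shipped in the library. MiniPool, submit, wait, and run_tasks are hypothetical names, and the sketch assumes a single thread calls wait() at a time.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical miniature pool for illustration only; not BS::thread_pool.
class MiniPool
{
public:
    explicit MiniPool(unsigned thread_count)
    {
        for (unsigned i = 0; i < thread_count; ++i)
            threads.emplace_back([this] { worker(); });
    }

    ~MiniPool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            workers_running = false;
        }
        task_available_cv.notify_all();
        for (std::thread& t : threads)
            t.join();
    }

    void submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(std::move(task));
        }
        task_available_cv.notify_one();
    }

    // Blocks until the queue is empty and no task is executing (single waiter assumed).
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex);
        waiting = true;
        tasks_done_cv.wait(lock, [this] { return tasks_running == 0 && tasks.empty(); });
        waiting = false;
    }

private:
    void worker()
    {
        std::unique_lock<std::mutex> lock(mutex);
        while (true)
        {
            task_available_cv.wait(lock, [this] { return !tasks.empty() || !workers_running; });
            if (!workers_running)
                break;
            std::function<void()> task = std::move(tasks.front());
            tasks.pop();
            ++tasks_running;
            lock.unlock();
            task();
            lock.lock();
            assert(tasks_running > 0);
            --tasks_running;
            // Evaluate the predicate while the mutex is still held...
            const bool notify = waiting && tasks_running == 0 && tasks.empty();
            if (notify)
            {
                // ...then release it before notifying, so a woken waiter can take the mutex.
                lock.unlock();
                tasks_done_cv.notify_all();
                lock.lock();
            }
        }
    }

    std::mutex mutex;
    std::condition_variable task_available_cv;
    std::condition_variable tasks_done_cv;
    std::queue<std::function<void()>> tasks;
    unsigned tasks_running = 0;
    bool waiting = false;
    bool workers_running = true;
    std::vector<std::thread> threads;
};

// Runs `count` increments through the pool and returns the final counter value.
int run_tasks(unsigned thread_count, int count)
{
    int counter = 0;
    std::mutex counter_mutex;
    MiniPool pool(thread_count);
    for (int i = 0; i < count; ++i)
        pool.submit([&counter, &counter_mutex] {
            std::lock_guard<std::mutex> lock(counter_mutex);
            ++counter;
        });
    pool.wait();
    std::lock_guard<std::mutex> lock(counter_mutex);
    return counter;
}
```

Whether unlocking before notify_all() actually saves a context switch depends on the standard library and OS, so this is a design option to measure rather than a guaranteed improvement. The part that matters for correctness is that the predicate is never read without the mutex.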

Barak Shoshany · Answer 1 · Thu Dec 12 2024 00:40:05 GMT+0800 (China Standard Time)

Hi @BenFrantzDale and thanks for opening this issue! This issue is in fact already fixed in v5.0.0, which will be released in the next few days. In this version I made many changes, added new features, and fixed some bugs, including this one. The code is already finished; I'm just updating the documentation and will release the new version as soon as I'm done.

Ben FrantzDale · Answer 2 · Thu Dec 12 2024 01:48:25 GMT+0800 (China Standard Time)

@bshoshany that's great. Is that branch pushed? The issue seems to be in the master branch right now...

Barak Shoshany · Answer 3 · Thu Dec 12 2024 03:59:48 GMT+0800 (China Standard Time)

I usually develop locally and only upload the finished product to GitHub. It will be up in a few days :)