tox-dev / filelock

A platform-independent file lock for Python.

Home Page: https://py-filelock.readthedocs.io

filelock 3.11 breaks our multi-threaded parallel processing use case

slanzmich opened this issue

Hi,

In #219, @csm10495 wondered:

I'm curious though: Is there ever a case where this would break someone? I guess if someone expected the object to be the same across threads.. but then why would they be using the lock to begin with?

I guess we have a use case that breaks:

We have a multi-threaded processing pipeline that is set up roughly like this:

  1. Initialize and create a SoftFileLock for the output file.
  2. Prepare some data for step 3.
  3. Generate the output file with the data from step 2.
  4. Clean up and release the lock acquired in 1.

These steps are executed for multiple inputs, each generating a different output file. Step 2 for one input can run in parallel to step 3 for another input. However, step 3 for one input must not run in parallel to step 3 for another input, as it uses resources that shall not be shared.

The SoftFileLock is required because there may be another process that is executed on an overlapping set of input files (thus generating an overlapping set of output files). With filelock 3.10, we could pass the lock from the thread executing step 1 to the thread executing step 4, but this broke with #219.

For this use case, I don't see anything wrong with sharing the lock between threads, because the lock is intended to prevent concurrent access between different processes. What do you think?
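For illustration, here is a minimal sketch of that handoff (the names and the queue-based wiring are hypothetical simplifications, not our actual pipeline): the lock is acquired in the thread running step 1 and released in the thread running step 4.

import queue
import threading

from filelock import SoftFileLock

# Hypothetical simplification: step 1 acquires the lock in one thread,
# step 4 releases it in another. This works with filelock <= 3.10; once the
# lock state is thread-local, the releasing thread no longer sees it as held.
handoff = queue.Queue()

def step_1(output_file: str) -> None:
    lock = SoftFileLock(output_file + ".lock")
    lock.acquire()   # acquired in this thread...
    handoff.put(lock)

def step_4() -> None:
    lock = handoff.get()
    lock.release()   # ...released in a different thread

t1 = threading.Thread(target=step_1, args=("output.dat",))
t4 = threading.Thread(target=step_4)
t1.start(); t1.join()
t4.start(); t4.join()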

I'm not sure what the workaround would be here; for 99% of cases it clearly makes sense to have it thread-local, but how could we allow usage for this 1%? 🤔 cc @csm10495

Any chance we can have a small code sample to play with?

Is it one lock per thread or are the same locks ever used across threads?

Or does the initial process just generate locks for threads to eventually use?

I tried via this sample (and the latest released filelock):

import filelock
import tempfile
import pathlib
import threading

LOCK = filelock.SoftFileLock(tempfile.NamedTemporaryFile().name)
TEST_FILE = pathlib.Path(tempfile.NamedTemporaryFile().name)

def in_thread():
    with LOCK:
        # the lol attribute is getting set inside the thread (so with the thread-local copy)
        # ... so since that happens after this check, we never print HAS LOL
        if hasattr(LOCK, 'lol'):
            print("HAS LOL")
        LOCK.lol = 1
        TEST_FILE.write_text(threading.current_thread().name)

if __name__ == '__main__':
    threads = []
    for i in range(10):
        threads.append(threading.Thread(target=in_thread))
        threads[-1].start()

    for t in threads:
        t.join()

    print(TEST_FILE.read_text())

It seems to work properly.

@slanzmich can you confirm, please?

Same issue here: we use the lock to prevent concurrent access between batch processes (not threads). Our code is asynchronous (asyncio), and we lock and unlock through asynchronous calls to asyncio.to_thread (which allocates different threads depending on the state of its internal thread pool); those threads then call filelock synchronously. This used to work very well until now. Following #219, the latest version of filelock is unusable in our architecture.

More generally, using asyncio.to_thread to wrap a synchronous function (like filelock's methods) into asynchronous code is a pattern that is becoming more and more common as asynchronous code grows in popularity in Python. This usage prevents synchronous code from assuming that it is running in a particular thread (it has to be thread-agnostic).
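As a side illustration (not filelock-specific, and not our production code), here is a tiny sketch of that thread-agnosticism: the default executor behind asyncio.to_thread decides which worker thread runs each call, so concurrent calls can land on different threads.

import asyncio
import threading
import time

def report() -> int:
    time.sleep(0.1)  # keep each worker busy so the pool fans out
    return threading.get_ident()

async def main() -> None:
    idents = await asyncio.gather(*(asyncio.to_thread(report) for _ in range(8)))
    # Typically prints several distinct thread idents (pool-dependent), so the
    # wrapped synchronous code cannot rely on running in any particular thread.
    print(set(idents))

asyncio.run(main())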

@dmoklaf please can you provide a minimal code sample to reproduce? I tried via:

import filelock
import tempfile
import pathlib
import asyncio
import uuid


def lock_write_verify(lock: filelock.FileLock, file_path: pathlib.Path):
    data = str(uuid.uuid4())
    with lock:
        file_path.write_text(data)
        assert file_path.read_text() == data

async def main():
    file_path = pathlib.Path(tempfile.mktemp())
    lock_path = pathlib.Path(tempfile.mktemp())

    lock = filelock.FileLock(lock_path)

    await asyncio.gather(
        asyncio.to_thread(lock_write_verify, lock, file_path),
        asyncio.to_thread(lock_write_verify, lock, file_path),
        asyncio.to_thread(lock_write_verify, lock, file_path),
        asyncio.to_thread(lock_write_verify, lock, file_path),
        asyncio.to_thread(lock_write_verify, lock, file_path),
    )

if __name__ == '__main__':
    asyncio.run(main())

But it works fine for me.

Maybe it's something like this:

import filelock
import tempfile
import pathlib
import asyncio
import uuid


def write_verify(file_path: pathlib.Path):
    data = str(uuid.uuid4())
    file_path.write_text(data)
    assert file_path.read_text() == data

async def main():
    file_path = pathlib.Path(tempfile.mktemp())
    lock_path = pathlib.Path(tempfile.mktemp())

    lock = filelock.FileLock(lock_path)

    await asyncio.gather(
        asyncio.to_thread(lock.acquire),
        asyncio.to_thread(write_verify, file_path),
        asyncio.to_thread(lock.release),
        asyncio.to_thread(lock.acquire),
        asyncio.to_thread(write_verify, file_path),
        asyncio.to_thread(lock.release),
    )

if __name__ == '__main__':
    for i in range(100000):
        try:
            asyncio.run(main())
        except AssertionError:
            continue
        else:
            print("PASS")
            break

The assertion fails very often on both 3.10.7 and 3.11.0, though on 3.11.0 it often hangs instead.

The possible hang in 3.11.0 is because asyncio doesn't guarantee the order of the gathered tasks, so it can wind up doing an acquire while the lock is already held, leading to deadlock. The same hang doesn't happen in 3.10.7 because the lock object is re-entrant and allows you to acquire more than once without calling release.

The assertion failure shows that this type of usage isn't really safe in either case, due to the ordering of the threads.
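(As an aside, the re-entrancy distinction itself can be seen with plain threading primitives; this is just an illustration, not filelock code:)

import threading

plain = threading.Lock()
reentrant = threading.RLock()

plain.acquire()
print(plain.acquire(blocking=False))      # False: a Lock can't be re-acquired without a release
plain.release()

reentrant.acquire()
print(reentrant.acquire(blocking=False))  # True: an RLock just bumps its counter
reentrant.release()
reentrant.release()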

I feel like this is a bit of a stretch, though, so if either of you can provide a similarly small failing case, I'd be happy to evaluate further.

Here is some sample code that reproduces the issue. The symptom is that the program freezes at some point. This requires filelock 3.11.0 (it works perfectly with 3.10.7) AND some multi-threading going on (hence the parallel tasks and pauses below, which force the asyncio runtime to allocate multiple threads in its internal pool).

import asyncio
import random

import filelock

async def acquire(l):
	await asyncio.to_thread(l.acquire)

async def pause():
	delay = random.uniform(0.5, 1.5)
	await asyncio.sleep(delay)

async def release(l):
	await asyncio.to_thread(l.release)

async def process_segment(segment):
	print(f"Processing segment {segment}...")
	l = filelock.FileLock(f"segment_{segment}.lock")
	await acquire(l)
	try:
		await pause()
	finally:
		await release(l)
	print(f"Processed segment {segment}")

async def process_segments():
	tasks = []
	for segment in range(40):
		task = asyncio.create_task(process_segment(segment))
		tasks.append(task)
	await asyncio.gather(*tasks)

async def process_all():
	for _ in range(20):
		await process_segments()

asyncio.run(process_all())

Thanks for the example. Why is the lock needed if each process_segment task is running sequentially?

For example I pulled out the calls to acquire/release (removing the need for the lock) and tried:

import asyncio
import uuid
import random
import pathlib

import filelock

async def acquire(l):
    return
    await asyncio.to_thread(l.acquire)

async def mess_with(segment):
    r = str(uuid.uuid4())
    f = pathlib.Path(f"segment_{segment}.txt")
    f.write_text(r)
    assert f.read_text() == r
    f.unlink()


async def release(l):
    return
    await asyncio.to_thread(l.release)

async def process_segment(segment):
    print(f"Processing segment {segment}...")
    l = filelock.FileLock(f"segment_{segment}.lock")
    await acquire(l)
    try:
        await mess_with(segment)
    finally:
        await release(l)
    print(f"Processed segment {segment}")

async def process_segments():
    tasks = []
    for segment in range(40):
        task = asyncio.create_task(process_segment(segment))
        tasks.append(task)
    await asyncio.gather(*tasks)

async def process_all():
    for _ in range(20):
        await process_segments()

asyncio.run(process_all())

and it seemed to work, because for any given segment, mess_with only ever runs at one point in time.

Here is a similar version that has a chance for overlapping a segment:

import asyncio
import uuid
import random
import pathlib

import filelock

async def acquire(l):
    await asyncio.to_thread(l.acquire)

def mess_with_thread(segment):
    r = str(uuid.uuid4())
    f = pathlib.Path(f"segment_{segment}.txt")
    f.write_text(r)
    assert f.read_text() == r
    f.unlink()

async def release(l):
    await asyncio.to_thread(l.release)

async def process_segment(segment, l):
    print(f"Processing segment {segment}...")
    await acquire(l)
    try:
        await asyncio.to_thread(mess_with_thread, segment)
    finally:
        await release(l)
    print(f"Processed segment {segment}")

async def process_segments():
    tasks = []
    segment_to_lock = {}
    for _ in range(2):
        for segment in range(40):
            if segment not in segment_to_lock:
                segment_to_lock[segment] = filelock.FileLock(f"segment_{segment}.lock")
            task = asyncio.create_task(process_segment(segment, segment_to_lock[segment]))
            tasks.append(task)
    await asyncio.gather(*tasks)

async def process_all():
    for _ in range(20):
        await process_segments()

asyncio.run(process_all())

This version on 3.10.7 is likely to hit an assertion error because of the re-entrant nature of the lock, which allows multiple threads to work with the segment at the same time. (On 3.11.0 it'll probably hang)

If you don't have overlapping segments (by removing the for _ in range(2): line) it works... but then you don't need the lock to begin with, since there is no overlap, which brings us back to the given example and the question of why the lock is needed.

Thanks for the example. Why is the lock needed if each process_segment task is running sequentially?

This is because we use filelock to prevent separate processes from stepping on each other's toes while processing data segments; this is our only interest in filelock.

This example is an extreme simplification of our Python processes, which I extracted only to reproduce the "freeze" bug as requested; it does not show the multi-processing part, of course. But running this process several times in parallel from the bash shell would simulate the multi-processing part easily.

It works perfectly with filelock < 3.11. And it fails every time with 3.11.

I tried this with 3.11.0:

import asyncio
import random
import os
from concurrent.futures import ProcessPoolExecutor
import filelock

def print_with_pid(*args):
    print(f"[{os.getpid()}]", *args)

async def acquire(l):
	await asyncio.to_thread(l.acquire)

async def pause():
	delay = random.uniform(0.5, 1.5)
	await asyncio.sleep(delay)

async def release(l):
	await asyncio.to_thread(l.release)

async def process_segment(segment):
	print_with_pid(f"Processing segment {segment}...")
	l = filelock.FileLock(f"segment_{segment}.lock")
	await acquire(l)
	try:
		await pause()
	finally:
		await release(l)
	print_with_pid(f"Processed segment {segment}")

async def process_segments():
	tasks = []
	for segment in range(40):
		task = asyncio.create_task(process_segment(segment))
		tasks.append(task)
	await asyncio.gather(*tasks)

async def process_all():
	for _ in range(20):
		await process_segments()

def run_via_asyncio():
    asyncio.run(process_all())

if __name__ == '__main__':

    with ProcessPoolExecutor(32) as executor:
        for i in range(32):
            executor.submit(run_via_asyncio)

        executor.shutdown(wait=True)

This does both multi process and asyncio, and didn't get a hang. What OS are you on? I'm on Windows.

On Linux in Docker I can see the hang, but I think it's expected, and that the previous behavior may have been misleading.

Consider this sample:

from filelock import FileLock
import threading

f = FileLock('test.lock')
threading.Thread(target=f.acquire).start()
threading.Thread(target=f.acquire).start()
threading.Thread(target=f.acquire).start()
threading.Thread(target=f.acquire).start()
threading.Thread(target=f.acquire).start()
print(f._lock_counter)
print(f.is_locked)
f.release()
print(f._lock_counter)
print(f.is_locked)

On Linux with 3.10.7 this prints:

root@16464b8b0f6e:/# python test.py
5
True
4
True

On 3.11 this hangs, which makes sense to me, since the lock was not released (so the other threads can't acquire it).

The lock is re-entrant, which means that each time acquire is called in the same process it just adds to the lock count and continues. If you're using the same lock object within one process to guard modifications to the underlying resource, it isn't really locked across threads with 3.10.7.

The new behavior actually does lock cross thread which I believe was the more natural expectation here.

@gaborbernat what are your thoughts here? In theory we could use a metaclass or something to allow for the old behavior alongside the new behavior, but I'm somewhat concerned that issues hit in the swap from 3.10 to 3.11 may be pointing to other bugs or potential issues elsewhere in folks' stacks.

My impression is that this thread-local behavior should perhaps be on by default; however, one should be able to disable it.

Fair enough. I'll look into a way to make that possible.

This does both multi process and asyncio, and didn't get a hang. What OS are you on? I'm on Windows.

I am on MacOS 13.3.1 with Python 3.10.10

I tested your code and, indeed, it works flawlessly. However, my code still freezes. The only difference between our two samples is that yours is multi-process while mine isn't (it runs as a single process).

I have just checked our logs and, indeed, the freezes occur at times when a single process is running (which is completely random and not something we can actively control; it depends on the duration of the jobs, the length of the data batches, ...), not when multiple processes are running concurrently!

So it seems there is something within filelock's 3.11 version that freezes single processes, and this bug is hidden/does not happen when multiple processes are operating concurrently.

My impression is that this thread-local behavior should perhaps be on by default; however, one should be able to disable it.

Using multiple threads (and not always the same one) when wrapping a synchronous API with asyncio is (unfortunately) common practice. So, if you make that choice, more and more developers will need to activate this setting as asyncio's popularity grows; it would be productive to announce the setting front and center on the project's homepage. Otherwise they will flood this issue tracker endlessly.

Second, why is the thread-local code absolutely necessary? The Unix code seems simple; is it related to the Windows code, or something else?

The original code is re-entrant across threads, effectively making it not work as expected across threads. I reported this for Windows; others reported it on Mac.

See the conversation in #82.

Do you expect that multiple threads can acquire and hold the lock at the same time?

Generally I don't think that's intuitive, since even if using asyncio, you can have cases where multiple writers are holding the lock (and in theory could be writing). If they could never be writing at the same time, the lock wouldn't be needed.

Generally I don't think that's intuitive, since even if using asyncio, you can have cases where multiple writers are holding the lock (and in theory could be writing). If they could never be writing at the same time, the lock wouldn't be needed.

Calling a blocking function from separate, unrelated threads is classic with asyncio (that's how most synchronous libraries are embedded in asynchronous code). So forbidding this case, or making it non-trivial through configuration arguments, may lead more and more developers to open GitHub issues here as asyncio gains in popularity.

The solution I propose is simple, and I believe the library as it was before 3.11 was fine for both cases as long as the developer was doing the Right Thing (TM). Here it is; I see two cases for re-entrance/concurrency against filelock:

- With multi-task code (asyncio lib):
Our current code is a good illustration of what the developer has to do. We never call filelock directly. We always guard calls to a filelock object with an asyncio.Lock object to guarantee that at most one asynchronous task accesses the filelock. Effectively, our code is equivalent to this (not visible in my reproduction code above, which focused only on the bug and not on this discussion):

# This is an asyncio.Lock that guarantees that at most one asynchronous task manipulates the filelock
async with my_asyncio_lock:
    await asyncio.to_thread(my_filelock_lock.acquire)
    try:
        ...  # Do the processing
    finally:
        await asyncio.to_thread(my_filelock_lock.release)

- With multi-threaded code (threading lib):
Same principle as for the multi-task code above, just replacing the wrapping asyncio.Lock objects with threading.Lock objects. Effectively, this is equivalent to:

# This is a threading.Lock that guarantees that at most one thread manipulates the filelock
with my_threading_lock:
    with my_filelock_lock:
        ...  # Do the processing

So, as long as the documentation states more clearly that this library only ensures inter-process locking, and that behaviour is undefined if several tasks or several threads access the lock at the same time, #82 does not seem like a bug but rather undefined behaviour of the library when called in a forbidden way. The developer has to ensure inter-task or inter-thread locking himself, and everything as it was before 3.11 will work perfectly.

Alternatively, I don't see how the filelock library could handle this complexity itself and hide it from the developer, as it would have to handle both the multi-task case and the multi-threaded case. These are very different cases with different technological stacks, because Python now offers two very different ways to execute code in parallel.

Interesting. @gaborbernat (as author) what was the library's initial intent here? Was it just inter-process locking or also inter-thread locking?

I don't think there was an explicit intent here 🤔 so I'd go for a solution that supports both, though perhaps configurable 🤔

Opened a PR that should give us both behaviors. Hopefully it gives us the best of both worlds.
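Assuming it ends up as a constructor flag (the parameter name below is a guess and may differ once released), opting back into the old cross-thread behavior would look roughly like:

import filelock

# Hypothetical usage once the PR lands: disable thread-local state so the same
# lock instance can be acquired in one thread and released in another.
lock = filelock.FileLock("resource.lock", thread_local=False)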

Sorry for not following up earlier, I was out for a couple of days.

@csm10495 / @gaborbernat Does it still make sense to provide a code example? I only saw the issue on Windows, but I haven't checked whether it also fails on other platforms.

Thank you for taking quick action to provide an alternative to others.

We were able to use it effectively to get workable code with a multi-threaded workload.

I'm personally not too sure that the new behavior, thread-local by default, is appropriate.

I feel like the multi-threaded workload described above is quite a common path for those utilizing locks.

I'm not asking for any immediate action; I just want to raise my voice as one user who would support making the thread-local behavior opt-in as opposed to opt-out.