SHTOOLS - Spherical Harmonic Tools

Home Page: https://shtools.github.io/SHTOOLS/


Thread safety

oanegros opened this issue

Currently the SHTOOLS SHExpandGLQ function, which I use regularly, fails in multithreaded applications. I think this comes from the manipulation of FFTW 3.3.3 plans, since only the FFTW execute calls are thread-safe: https://www.fftw.org/fftw3_doc/Thread-safety.html
This probably affects (almost) all FFTW calls in SHTOOLS.

Would it be an option to add the brute-force fftw_make_planner_thread_safe call and force the FFT expansions to be thread-safe?

I have used the Fortran routines with OpenMP in the past without any problems. Could you give me more info about when this fails and when it doesn't? Or does it fail for even the simplest tasks?

If the problem is indeed FFTW-related, it's most likely not connected with ducc, since ducc doesn't use FFTW internally.

Oh interesting! I had it fail when multiple threads were executing SHExpandGLQ calls on different datasets (with the same dimensions) at the same time. The failure was a segmentation fault without clear error codes. I looked at the FFTW thread-safety information and saw that the variables it talks about are modified in the SHExpandGLQ code.
I have now tried changing the backend to ducc as @mreineck suggested, but my segmentation faults remain.

The threading implementation I'm using is a Python rewrite of Java threading that is implemented in the bigger piece of software I'm working in, so this may also have its own specific vulnerabilities?

I managed to fix it in my code by encapsulating the SHExpandGLQ call in an RLock, so it's not very pressing for me, but it's weird that it happens, especially if it's stable with OpenMP.
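
(For reference, a minimal sketch of that workaround — the wrapper name and lock are illustrative, not the actual code from my project:)

import threading
import pyshtools as pysh

# One process-wide lock so that only a single thread at a time can enter the
# (apparently not thread-safe) SHExpandGLQ routine.
_glq_lock = threading.RLock()

def shexpand_glq_locked(grid, w, zero):
    with _glq_lock:
        return pysh.expand.SHExpandGLQ(grid, w, zero)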

I don't use OpenMP very often, so I'll need to investigate...

OK, I have an idea why the Fortran code cannot be called concurrently: in the function you linked there are arrays with the "save" attribute, which means that they are equivalent to static variables in C/C++, i.e. global state. Calling into such a function concurrently will almost certainly lead to undesired effects.
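
(For illustration only — this is not SHTOOLS code — a rough Python analogue of the hazard: a routine that keeps its work buffer in module-level, i.e. global, state behaves like a Fortran "save" array and is not safe to call from several threads at once:)

import numpy as np

_work = None  # module-level buffer, playing the role of a Fortran "save" array

def expand(grid):
    global _work
    if _work is None or _work.shape != grid.shape:
        _work = np.empty_like(grid)   # allocated on first call, then reused
    np.copyto(_work, grid)            # a second thread may overwrite _work here
    return _work.sum()                # ... so this can read another thread's data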

Not sure why you encounter this with the ducc back-end as well, though ... it shouldn't have any global state.

Semi-related question: how do you manage to execute concurrent calls from Python? Doesn't the GIL prevent this?

Sorry, I take that back: all variables marked save are also threadprivate, so this could in principle work.

On the other hand, threadprivate is an OpenMP-specific keyword, so I have no idea what happens if this is called from multiple non-OpenMP threads...

I had to look it up to figure out how the software I'm using does this: I never release the GIL myself. I call SHExpandGLQ from multiple concurrent threads (which exist for other, non-GIL-locked reasons outside of my code), and it seems like entering SHExpandGLQ releases the GIL somewhere, allowing another of my threads to progress until it collides inside pyshtools.

The GIL release was proposed in #304 and seems to be done here.

Sure! I was just wondering how you can call the SHTOOLS Python interface from two concurrently running Python threads. I most likely have an oversimplified picture in my mind of what you are actually doing.

To get a clearer picture: which function from the SHTOOLS Python interface are you calling exactly?

The only function I have issues with is SHExpandGLQ; if I lock this function so that threads wait for each other, my code works fine.

And I have to say I don't fully understand your question about having multiple concurrent Python threads; this is something that does not require releasing the GIL (see, for example, the multiprocessing.pool.ThreadPool implementation).

I'll try to make a minimal working example soon 😄

Thanks, that will certainly be very helpful!

from concurrent import futures
import pyshtools as pysh
import numpy as np

# 100 random test grids, all with identical dimensions
your_patches = np.random.randint(1, 1_000_000, size=(100, 251, 503))

# Gauss-Legendre quadrature nodes and weights for lmax = 251
zero, w = pysh.expand.SHGLQ(251)

# expand each grid concurrently in 10 worker threads
with futures.ThreadPoolExecutor(max_workers=10) as executor:
    jobs = [executor.submit(pysh.expand.SHExpandGLQ, patch, w, zero) for patch in your_patches]
    # collect the results (this re-raises any exception from a worker)
    [fut.result() for fut in futures.as_completed(jobs)]

This breaks with a segmentation fault for me, but works with max_workers=1. The lmax of 251 is just because I use it as a default grid size (~80π), but it also breaks with the other lmax values that I tested.

OK, I can reproduce the segmentation fault when the standard shtools backend is used, but not with ducc. Do you have the ducc0 Python package installed? If not, I think the backend will be silently reverted to shtools.
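
(A quick way to check, as a sketch — I'm assuming here that preferred_backend() only reports the preference, not necessarily the backend that ends up being used:)

import pyshtools as pysh

# Is the ducc0 package that backs the "ducc" backend importable at all?
try:
    import ducc0  # noqa: F401
    print("ducc0 is available")
except ImportError:
    print("ducc0 is missing; transforms will silently fall back to shtools")

# Reports the preference only
print("preferred backend:", pysh.backends.preferred_backend())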

Ah okay, I did not notice because print(pysh.backends.preferred_backend()) did seem to return ducc even when it was not available. With the actual ducc backend it does work 😄. I might switch to this backend for my project.
But I would still say that either the calls should be thread-safe, or this behavior needs to be documented for the shtools backend.

OK, it might be good to add a function like actual_backend(), so that the backend actually in use can be identified.
If you switch to the ducc backend, please make sure to select an appropriate value for nthreads; otherwise you'll be overcommitting your hardware (since you are already running in parallel on the caller side).
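
(Something along these lines — a sketch only, assuming select_preferred_backend() accepts a backend name and an nthreads keyword:)

import pyshtools as pysh

# Prefer the ducc backend but keep each transform single-threaded,
# since the parallelism already comes from the caller-side thread pool.
pysh.backends.select_preferred_backend(backend='ducc', nthreads=1)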

I have now tried to implement this in my main project, but there I seem to (currently) be limited to pyshtools v4.9.1 and ducc0 v0.26.0 due to other dependencies. Here the segmentation faults still happen with the preferred backend set to ducc. Is this not correctly setting the backend, or is the thread safety of ducc a newer thing?

As far as I know, there shouldn't have been any relevant ducc changes since 0.26, but we did a lot of tweaking to the backend selection code inside pyshtools after v4.9.1. I'm not sure that this is what causes the difference, but I would suspect it.