Thread safety
oanegros opened this issue
Currently, the SHTOOLS SHExpandGLQ function I use regularly fails in multithreaded applications. I think this comes from the modification of FFTW 3.3.3 plans, since only the fftw_execute calls are thread-safe: https://www.fftw.org/fftw3_doc/Thread-safety.html
This probably affects (almost) all FFTW calls in SHTOOLS.
Would it be an idea to implement the brute-force fftw_make_planner_thread_safe call and force the FFT expansions to be thread-safe?
I have used the Fortran routines with OpenMP in the past without any problems. Could you give me more information about when this fails and when it doesn't? Or does it fail for even the simplest tasks?
If the problem is indeed FFTW-related, it's most likely not connected with ducc, since ducc doesn't use FFTW internally.
Oh interesting! I had it fail when multiple threads were executing SHExpandGLQ calls on different datasets (with the same dimensions) at the same time. The failure was a segmentation fault without any clear error message. I then found the FFTW thread-safety information and saw that the variables discussed there are modified in the SHExpandGLQ code.
I have now tried changing the backend to ducc as @mreineck suggested, but my segmentation faults persist.
The threading implementation I'm using is a Python rewrite of Java threading that is implemented in the larger piece of software I'm writing in, so this may also have its own specific vulnerabilities?
I managed to fix it in my code by wrapping the SHExpandGLQ call in an RLock, so it's not very pressing for me, but it's odd that it happens at all, especially if it's stable with OpenMP.
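Roughly like this, as a minimal sketch of the workaround (the lock and wrapper names here are just illustrative, not my actual code):

import threading
import pyshtools as pysh

# one process-wide lock, so only a single thread is inside SHExpandGLQ at a time
shexpand_lock = threading.RLock()

def shexpand_glq_locked(grid, w, zero):
    # serialize calls to SHExpandGLQ, which does not seem to be thread-safe with the shtools backend
    with shexpand_lock:
        return pysh.expand.SHExpandGLQ(grid, w, zero)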
I don't use openmp very often, so I'll need to investigate...
OK, I have an idea why the Fortran code cannot be called concurrently: in the function you linked there are arrays with the "save" attribute, which means that they are equivalent to static variables in C/C++, i.e. global state. Calling into such a function concurrently will almost certainly lead to undesired effects.
Not sure why you encounter this with the ducc back-end as well, though ... it shouldn't have any global state.
Semi-related question: how do you manage to execute concurrent calls from Python? Doesn't the GIL prevent this?
Sorry, I take that back: all variables marked save are also threadprivate, so this could in principle work.
On the other hand, threadprivate is an OpenMP-specific keyword, so I have no idea what happens if this is called from multiple non-OpenMP threads...
I had to look it up to figure out how the software I'm using does this: I never release the GIL myself. I call SHExpandGLQ from multiple concurrent threads (which exist for other, non-GIL-related reasons outside of my code), and it seems that entering SHExpandGLQ releases the GIL somewhere, allowing another of my threads to progress until they collide inside pyshtools.
Sure! I was just wondering how you can call the SHTOOLS Python interface from two concurrently running Python threads. I most likely have an oversimplified picture in my mind of what you are actually doing.
To get a clearer picture: which function from the SHTOOLS Python interface are you calling exactly?
The only function I have issues with is SHExpandGLQ; if I lock this function to wait for thread execution, my code works fine.
And I have to say I don't fully understand your question about having multiple concurrent Python threads; this is something that does not require releasing the GIL (as in the multiprocessing.ThreadPool implementation, for example).
I'll try to make a minimal working example soon 😄
Thanks, that will certainly be very helpful!
from concurrent import futures
import pyshtools as pysh
import numpy as np

# 100 random test grids with identical dimensions
your_patches = np.random.randint(1, 1e6, size=(100, 251, 503))
# Gauss-Legendre quadrature nodes and weights for this grid size
zero, w = pysh.expand.SHGLQ(251)

# expand all patches concurrently on 10 worker threads
with futures.ThreadPoolExecutor(max_workers=10) as executor:
    jobs = [executor.submit(pysh.expand.SHExpandGLQ, patch, w, zero) for patch in your_patches]
    [fut.result() for fut in futures.as_completed(jobs)]
This breaks with a segmentation fault for me, but works with max_workers=1. The lmax of 251 is just because I use it as a default grid size (~80π), but it also breaks with the other lmax values I tested.
OK, I can reproduce the segmentation fault when the standard shtools backend is used, but not with ducc. Do you have the ducc0 Python package installed? If not, I think the backend will be silently reverted to shtools.
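Something along these lines should make the silent fallback visible; this is only a sketch, and assumes the select_preferred_backend / preferred_backend helpers in pyshtools.backends:

import importlib.util
import pyshtools as pysh

# fail loudly if ducc0 is missing, instead of silently falling back to the shtools backend
if importlib.util.find_spec("ducc0") is None:
    raise RuntimeError("ducc0 is not installed; pyshtools would fall back to the shtools backend")

pysh.backends.select_preferred_backend("ducc")
print(pysh.backends.preferred_backend())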
Ah okay, I did not notice because print(pysh.backends.preferred_backend()) did seem to return ducc even when it was not available. With the actual ducc backend it does work 😄. I might switch to this backend for my project.
But I would still say that either the calls should be thread-safe, or this behavior needs to be documented for the shtools backend.
OK, might be good to add a function like actual_backend(), so that the current backend can be identified.
If you switch to the ducc backend, please make sure to select an appropriate value for nthreads, otherwise you'll be overcommitting your hardware (since you are already running in parallel on the caller side).
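Assuming the backend selection function in your pyshtools version accepts an nthreads argument (that is an assumption on my side about the installed version), this would look roughly like:

import pyshtools as pysh

# the caller already runs one worker thread per patch, so keep each ducc transform single-threaded
pysh.backends.select_preferred_backend("ducc", nthreads=1)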
I have now tried to implement this in my main project, but there I seem to be limited (currently) to pyshtools v4.9.1 and ducc0 v0.26.0 due to other dependencies. With those versions the segmentation faults still happen with preferred_backend set to ducc. Is this not correctly setting the backend, or is the thread safety of ducc a newer thing?
As far as I know, there shouldn't have been any relevant ducc changes since 0.26, but we did a lot of tweaking of the backend selection code inside pyshtools after v4.9.1. I'm not sure that this causes the difference, but I'd suspect it.