bendudson / py4cl

Call python from Common Lisp

Are callbacks thread-safe in Python land?

vkz opened this issue · comments

Hi. What happens if a CL-exported callback gets called from a separate worker thread in Python? Suppose I were to use threading and start a thread that, given some external event, would call a CL function that we'd exported earlier.

The way I'm reading LispCallbackObject, it would merrily write to sys.stdout to trigger execution on the CL side. The Python documentation says TextIOWrapper is not thread-safe, and that's what I believe sys.stdout is. Could we get unlucky and possibly clobber stdout with two different writers - one from the main thread's message loop handling a command and our callback? This may be mitigated by the GIL, but I don't know enough Python internals to make that call, tbh.
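To make the clobbering concern concrete, here's a minimal sketch of what an unlucky interleaving of two unsynchronized writers looks like to the reader on the other end of the stream. The frame tags and payloads below are invented for illustration and are not py4cl's actual wire format:

```python
# Simulate two writers whose multi-part frames get interleaved on one stream.
import io

stream = io.StringIO()  # stands in for the shared sys.stdout pipe

# Writer A (the main message loop) starts sending a reply frame...
stream.write("r")
# ...but the scheduler switches to writer B (the callback thread) mid-frame:
stream.write("c")
stream.write("(identity 42)\n")
# Writer A resumes and finishes its frame:
stream.write("1729\n")

# The lisp side now reads an unparseable mix of the two frames:
print(repr(stream.getvalue()))  # 'rc(identity 42)\n1729\n'
```

With the GIL, each individual `write` is atomic enough, but a multi-part frame is not: nothing stops another thread from writing between the parts.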

I also found the bit about python-call-async in the readme confusing:

If the function call requires callbacks to lisp, then these will only be serviced when a py4cl function is called. In that case the python function may not be able to finish until the thunk is called. This should not result in deadlocks, because all py4cl functions can service callbacks while waiting for a result.
Could you clarify what it's saying, because I'm left guessing whether you mean CL must call the thunk for any callback to be triggered at all. Is that it? Even when I don't care about the returned value and only want async execution on the Python side?

Separate question that I can't answer having looked at the py4cl code: why do we call message_dispatch_loop() in multiple places? Once when we start the process, but then also every time a LispCallbackObject instance is called. Aren't we starting multiple infinite loops without ever breaking the earlier ones?

How would a possible mitigating strategy look? CL is the main driver, so perhaps the answer is to start multiple Python processes: one deals with the external events, another is used for synchronous interaction between CL and Python? What would be the way to do that? It doesn't look like the code accommodates it atm. Is there some clever CL way to make *python* thread-local and avoid modifying the rest of the py4cl code?

Thank you very much

PS: I looked at the @digikar99 fork and it looks similar re the above

Okay, I wish conversations could be converted to graphs. This might go in a few different directions!

This may be mitigated by the GIL, but I don't know enough Python internals to make that call, tbh.

I don't think we need to worry about the GIL for py4cl/2. It might only be relevant for py4cl2-cffi.


Could we get unlucky and possibly clobber stdout with two different writers - one from the main thread's message loop handling a command and our callback?

This was my first guess. If I understand correctly, a similar situation occurred when calling py4cl/2 from multiple lisp threads. This should be fixed in py4cl2 thanks to a recursive lock held by raw-py. I think this is a backward-compatible fix and can be ported to py4cl.
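For intuition about why the lock has to be recursive (the actual fix lives on the lisp side in raw-py; the function names below are hypothetical stand-ins): a py4cl call holds the lock while it may trigger a callback that issues another call from the same thread. A Python sketch:

```python
import threading

lock = threading.RLock()  # a plain Lock() would deadlock below

def py4cl_call():
    # Hypothetical stand-in for a locked raw-py style entry point.
    with lock:
        return lisp_callback()

def lisp_callback():
    # Servicing the callback re-enters the same locked path on the same thread.
    with lock:
        return "serviced"

print(py4cl_call())  # the same thread re-acquires the RLock without blocking
```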

On py4cl2, using multiple python threads does actually clobber the streams. The following wreaks havoc:

(in-package :py4cl2)

(export-function #'identity "identity")

(raw-pyexec "
import threading
import time
import sys

def send_sleep_repeat(n, obj):
  for _ in range(n):
    print(identity(obj))
    sys.stdout.flush()
")

(raw-pyexec "
threads = [threading.Thread(
  target=send_sleep_repeat,
  args=(5, \"hello from thread {}\".format(i))
) for i in range(2)] 
")

(raw-pyexec "for th in threads: th.start()")

On both py4cl2 and py4cl:

(in-package :py4cl)

(export-function #'identity "identity")

(python-exec "
import threading
import time
import sys

def send_sleep_repeat(n, obj):
  for _ in range(n):
    print(identity(obj))
    sys.stdout.flush()
")

(python-exec "
threads = [threading.Thread(
  target=send_sleep_repeat,
  args=(20, \"hello from thread {}\".format(i))
) for i in range(4)] 
")

(python-exec "for th in threads: th.start()")

Could you clarify what it's saying, because I'm left guessing whether you mean CL must call the thunk for any callback to be triggered at all. Is that it? Even when I don't care about the returned value and only want async execution on the Python side?

I'm afraid I won't be able to clarify much, but here's something I found interesting:

PY4CL> (export-function #'identity "identity")
NIL
PY4CL> (python-call "lambda x : identity(x)" 42) ; works as expected
42
PY4CL> (let ((async1 (python-call-async "lambda x : identity(x)" 42))
             (async2 (python-call-async "lambda x : identity(x)" 23)))
         (print (list (funcall async1)
                      (funcall async2))))
; I wasn't expecting this to error, but okay.
; Evaluation aborted on #<PY4CL:PYTHON-ERROR {1007DF56A3}>.
PY4CL> (let ((async1 (python-call-async "lambda x : identity(x)" 42))
             (async2 (python-call-async "lambda x : identity(x)" 23)))
         (python-call "str" "wow")
         (print (list (funcall async2)
                      (funcall async1))))
; Hmm, this worked.
(42 23)
(42 23)

So, apparently, what that part means is: suppose the async python call requires a call to lisp. In that case, before calling the thunks returned by (python-call-async ...), one should make a non-async py4cl call so that the callback gets serviced.


Separate question that I can't answer having looked at the py4cl code: why do we call message_dispatch_loop() in multiple places? Once when we start the process, but then also every time a LispCallbackObject instance is called.

The message_dispatch_loop is essentially waiting for lisp to send a message. Here is what happens during a lisp callback:

  1. LispCallbackObject.__call__ writes 'c' to lisp, along with some more things. It then calls message_dispatch_loop and, as you correctly pointed out, enters an infinite loop. Well, a can-be-infinite loop, because there are various ways to break out of it.
  2. py4cl::dispatch-messages receives 'c' and eventually writes 'r' to python through dispatch-reply.
  3. Python eventually receives 'r' in the message_dispatch_loop, reads the value from lisp using recv_value, and returns. Thus, the can-be-infinite loop has been exited.
  4. The return value of the just-exited message_dispatch_loop also becomes the return value of LispCallbackObject.__call__.
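The four steps above can be sketched in miniature. This is not py4cl's real protocol or API; the message tags and helper names are invented, and lisp's side of the conversation is scripted into a queue, just to show how the nested loop unwinds when 'r' arrives:

```python
from collections import deque

# Scripted "messages from lisp". In real py4cl these arrive over a pipe.
messages = deque()

def send_to_lisp(tag, payload):
    # Stand-in for writing to stdout; here we script lisp's reply instead.
    if tag == "c":                           # callback request ->
        messages.append(("r", payload * 2))  # lisp answers with a reply frame

def message_dispatch_loop():
    # A can-be-infinite loop: it only returns when a reply frame arrives.
    while True:
        tag, payload = messages.popleft()
        if tag == "r":                       # step 3: reply received, loop exits
            return payload

class LispCallbackObject:
    def __call__(self, arg):
        send_to_lisp("c", arg)               # step 1: announce the callback
        # steps 2-4: enter a nested dispatch loop; its return value
        # becomes the return value of the callback itself.
        return message_dispatch_loop()

callback = LispCallbackObject()
print(callback(21))  # 42: the nested loop exited as soon as 'r' arrived
```

Each callback thus pushes one more dispatch loop onto the call stack, and each reply pops exactly one back off.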

Aren't we starting multiple infinite loops without ever breaking earlier ones?

So, yes, we are breaking the earlier ones. There are three non-toplevel calls to message_dispatch_loop, and by analyzing the pathway for each of them, one can see that each results in a lisp call to py4cl::dispatch-reply, at least when there are no errors. Thus, all three of them exit.

How would a possible mitigating strategy look?

I suppose you are asking for a solution to the "multithreading in python can clobber the output streams" problem. My first guess was to introduce a lock before every call to lisp. But I might be running into some nitty-gritty issues. Hoping to figure them out in a few days!
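A minimal version of that idea, sketched in Python (the real change would wrap py4cl's send path; send_frame and the frame format here are hypothetical): serialize every write to the shared stream behind one lock, so frames from different threads can no longer interleave:

```python
import io
import threading

stream = io.StringIO()          # stands in for the shared pipe to lisp
stream_lock = threading.Lock()

def send_frame(payload):
    # Hypothetical send path: the lock makes each frame atomic w.r.t. others.
    with stream_lock:
        stream.write(payload + "\n")

threads = [threading.Thread(target=send_frame, args=(f"frame-{i}",))
           for i in range(8)]
for th in threads:
    th.start()
for th in threads:
    th.join()

# Every frame arrives intact: order may vary, but no frame is torn apart.
lines = stream.getvalue().splitlines()
print(sorted(lines))
```

The subtlety hinted at above is that a callback must be able to re-enter the send path from the same thread, which is why the lisp-side lock ends up needing to be recursive.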

I might be missing something, but I don't see the need for multiple python processes, unless you want to set up a python process pool (with an equivalent increase in memory requirements).

Hoping that clarifies! Feel free to ping if there are more issues.

I have added tests for this in py4cl2-tests and py4cl2 itself passes these tests now.