Java type in karate-config with parallel runner causing error: Multi threaded access

ericdriggs opened this issue 3 years ago · comments

Given karate-config uses config = karate.callSingle to define a Java.type variable
When the parallel runner executes
Then a "Multi threaded access requested by thread" error occurs reliably

MVC repo (tests pass when run individually, fail when run in parallel):
https://github.com/ericdriggs/karate-parallel-runner-concurrent-access/

Seems related to oracle/graal#631

Eric Driggs · Answer 1 · Thu Apr 15 2021 04:39:27 GMT+0800 (China Standard Time)

I think I can probably work around this by defining a JavaScript object and then defining individual functions on that object to simulate static method interop, calling Java.type only inside each function.
Still, an interesting race condition, and a crying shame that Graal has such a ridiculous default without a flag to turn it off.
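
The workaround described above could look roughly like the following karate-config.js sketch. The class name com.example.MyUtils and its static methods greet and add are hypothetical, chosen only for illustration, and the thread does not confirm whether this approach actually avoids the error.

```javascript
// karate-config.js (sketch only).
// 'com.example.MyUtils' and its static methods are hypothetical names.
function fn() {
  // Plain JS object that stands in for the Java class.
  var utils = {};

  // Each wrapper resolves the Java type inside the call, so no Java.type
  // value is created up front and stored in the config.
  utils.greet = function (name) {
    var MyUtils = Java.type('com.example.MyUtils');
    return MyUtils.greet(name);
  };

  utils.add = function (a, b) {
    var MyUtils = Java.type('com.example.MyUtils');
    return MyUtils.add(a, b);
  };

  return { utils: utils };
}
```

A feature would then call utils.greet('world') instead of touching the Java class directly. The cost is one wrapper function per static method.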

Peter Thomas · Answer 2 · Thu Apr 15 2021 10:02:44 GMT+0800 (China Standard Time)

thanks, this helps. cc @joelpramos

Peter Thomas · Answer 3 · Fri Apr 16 2021 23:27:46 GMT+0800 (China Standard Time)

@ericdriggs so - this is a won't fix. I've added this below to the documentation. given the way graal js behaves today there is no way we can get a java instance created on one thread to work across threads. anybody is welcome to find a way.

I've tweaked our existing parallel troublemaker test to replicate this if you comment out one line in the offending karate-config.js.

Joel Ramos · Answer 4 · Sun Apr 18 2021 21:47:51 GMT+0800 (China Standard Time)

@ptrthomas far be it from me to say this solves everything (cause you know, Graal) but either of the lines added in the catch in the screenshot below fixes the specific reproducing unit test you added (either this.JS.context.asValue(value) or the commented line Value.asValue(value.asHostObject()), both work).

I think it's worth adding Value.asValue(value.asHostObject()) (avoids access to JS.context) in the catch and/or slightly modifying the logic / catch method (and warn message).

Peter Thomas · Answer 5 · Sun Apr 18 2021 22:23:35 GMT+0800 (China Standard Time)

@joelpramos that's a neat trick and I'm still not sure how it works, and glad I had the tests lined up. do review !

Eric Driggs · Answer 6 · Tue May 18 2021 06:35:01 GMT+0800 (China Standard Time)

@ptrthomas
Still seeing this issue when using karate 1.1.0-RC1

ericdriggs/karate-parallel-runner-concurrent-access@81003b0

mvn clean test

...
[ERROR] Failures: 
[ERROR]   ParallelRunnerTest.testParallel:18 classpath:examples/parallel-runner-always-fails.feature:9
karate-config.js
Multi threaded access requested by thread Thread[pool-1-thread-1,5,main] but is not allowed for language(s) js.
.....

Peter Thomas · Answer 7 · Tue May 18 2021 11:21:29 GMT+0800 (China Standard Time)

not planning to look at this immediately. any help welcome

Peter Thomas · Answer 8 · Sun May 23 2021 19:42:25 GMT+0800 (China Standard Time)

@ericdriggs thanks a lot for your simple example. it really helped. I boiled it down further and it is here: https://github.com/intuit/karate/tree/e4f527a63e078af5d7394c0bafbc672ac2469c5d/karate-core/src/test/java/com/intuit/karate/core/parajava

cc @joelpramos @aleruz

so some explanation here for future reference.

Java.type('my.Clazz') evaluates to something which is a "host object" as well as a "meta object" in graal. I found that the only way to pass this from one context to another is to keep it "as is", whereas everything else we convert, e.g. JSON to Map / List and other things to Java primitive types. if we don't keep it "as is", trying to call methods or new on this meta-thingy would fail

we were trying to convert this Graal Value thinking it was just a host object. so I have now added an extra check and it seems to be good. we now "attach" it for callSingle() / recurseAndAttachAndDeepClone(). my guess is it does not need to be re-hydrated at all, whereas JS functions do (we re-evaluate the source and create a brand-new JS function). but for now we pass both into the ScenarioEngine.attach() function, which tries an org.graalvm.polyglot.Context.asValue() call, which I think is sufficient for the "meta object".

Joel Ramos · Answer 9 · Sun May 23 2021 21:14:04 GMT+0800 (China Standard Time)

Tbh at this point I just trust we have a ton of unit tests around this. I’ll give it a shot later with a suite of tests on my end but can’t imagine the results will be different. I’ll try looking into the examples and see if we can break with other Java things.

There was one test that was commented out early on in one of the first v1 RCs, can we try that one too? Might be redundant now but just remembered.

Eric Driggs · Answer 10 · Mon May 24 2021 01:04:30 GMT+0800 (China Standard Time)

@ptrthomas
Thanks for looking at this again! Nothing more frustrating than concurrency issues, both to debug and for usage. Will verify on Monday.

Eric Driggs · Answer 11 · Tue May 25 2021 00:37:48 GMT+0800 (China Standard Time)

Fix verified using latest version of develop branch.
Sample project passes along with my own library/unit tests.
Will upgrade my projects to karate 1.1.0 when the next release candidate is released.
Thanks!

Peter Thomas · Answer 12 · Sat Jun 19 2021 13:51:35 GMT+0800 (China Standard Time)

@joelpramos I improved the fix - note that we don't global-lock for every call-feature like I attempted earlier

I have good reason to believe that the "detach" routine was the source of all problems.
also in the case of a "cache hit" for callonce and callsingle, we have to synchronize for all the JS re-hydration to work fine

and I think that should fix everything.

note that someone reported a problem with csv + scenario outline: https://stackoverflow.com/q/68041569/143475

so I added a test for that. this test also consistently replicates the "js thread" issue without the "fix"
but not the failed to parse csv error which I hope is just related to the js thread problem

so @aleruz do try again with this fix !

Joel Ramos · Answer 13 · Sat Jun 19 2021 18:57:50 GMT+0800 (China Standard Time)

Going through the changes I think there are two locks that "worry me" (from a performance standpoint), plus a naming comment. Just spitballing here, have not fetched the latest to try out yet:

ScenarioEngine.recurseAndAttach() will be called whenever there are calls, so for a Scenario Outline with a high number of examples calling a reusable feature, this will lock on every call
ScenarioEngine.init() same as above but without the call, i.e. it will always happen for a Scenario Outline
JsExecutable maybe we can rename it to something else (e.g. JavaFunction or HostFunction to align with Graal terminology) cause I think it's an Executable that is NOT JS. In this case Java, but who knows, maybe some day you'll add more fancy stuff from Graal like R or Python

Overall I think it'll negatively impact the effective use of parallelism. Obviously when something doesn't work and that's the solution too bad lol

Since it seems that we nailed down the culprit (parallelism attaching/detaching these functions when they came from a callonce) I wonder whether we can add a flag in the Suite class to identify that a lock is required (maybe if one of these JsExecutable instances is created during a callonce) and be explicit, in that piece of documentation where you say to avoid functions, that they can also have a negative performance impact. Not sure whether the flag would be holistic or whether there are ways to think about a flag for the context of the execution of a (top level) Feature.

Similar note to the overall flag there: the warn logs will get lost in the middle of the execution of tests. I think an overall message should be printed with the results of the suite (that print() to stdout that produces numbers) informing of potential issues. Just playing safe here I guess.

Joel Ramos · Answer 14 · Sat Jun 19 2021 19:02:51 GMT+0800 (China Standard Time)

Going through this, the parallel message might be for the attaching/detaching of the JS stuff only - https://www.graalvm.org/reference-manual/js/Multithreading/

I think I just keep getting side tracked cause there are two problems - the Java functions usage and the multi threading. So the comment on the flag would still apply, just not sure how easy / clean it is to do (maybe for the existence of any function, JS or not)

Peter Thomas · Answer 15 · Sat Jun 19 2021 19:04:41 GMT+0800 (China Standard Time)

@joelpramos the second point (init()) is no longer going to lock with the latest changes as far as I can tell - but you can confirm

the rest - you are welcome to prove there is a significant impact (I think not, all the processing is in-memory, and based on the build run-time) and make changes. this has not been easy to work on

Joel Ramos · Answer 16 · Sat Jun 19 2021 19:09:34 GMT+0800 (China Standard Time)

Like you I'm not too fussed about it. The one on init() raised my eyebrows more. I think the other logging / naming comments are still valid.

Hopefully the battle is over.

Peter Thomas · Answer 17 · Sat Jun 19 2021 20:48:31 GMT+0800 (China Standard Time)

@joelpramos tried to remove the sync on recurseAndAttach() but couldn't. so over to you now :) I'm done for today

tried a few more things as you can see in the prev commit

a tip: running this can replicate the problem easily on local

mvn test -f karate-core/pom.xml -Dtest=Parallel*Test

Eric Driggs · Answer 18 · Tue Jun 22 2021 04:10:43 GMT+0800 (China Standard Time)

Thanks for your continued work on this.

Joel Ramos · Answer 19 · Tue Jun 22 2021 06:18:36 GMT+0800 (China Standard Time)

Thanks for your continued work on this.

indeed!

@ptrthomas I added a few comments in the commits but it's mostly to wrap my head around the changes you are doing so that when I have some time to fetch the code and iterate on this I'm not totally lost

Peter Thomas · Answer 20 · Sun Jun 27 2021 23:19:11 GMT+0800 (China Standard Time)

just released 1.1.0.RC4 - I request everyone to re-test this once. cc @aleruz

Eric Driggs · Answer 21 · Wed Jun 30 2021 07:06:11 GMT+0800 (China Standard Time)

Pass - Validated on RC4 for MVC project and my own projects.

Peter Thomas · Answer 22 · Tue Jul 06 2021 23:23:34 GMT+0800 (China Standard Time)

all please see this comment on the other related issue. it is possible to use java functions (not just classes) safely and work around some of the graal limitations: #1633 (comment)

Eric Driggs · Answer 23 · Thu Jul 08 2021 02:54:17 GMT+0800 (China Standard Time)

@ptrthomas
My sample project is still passing unchanged on both RC4 and a local build of the latest develop.
Can you please clarify whether the function approach is the recommended means of Java interop in callOnce? The additional boilerplate isn't pretty since it requires a duplicate interface at both the class and JS level, but if that's what's required I understand.

Peter Thomas · Answer 24 · Thu Jul 08 2021 10:13:45 GMT+0800 (China Standard Time)

@ericdriggs reco is only if you

a) use java functions ("use" means you actually call the function in parallel scenarios)
b) and use callonce or callsingle
c) and really want to "save some typing" by referring to functions instead of passing a Java.type() around and using that at point-of-need

I also feel the extra code is only in the Java side. the JS side is the same one or two lines
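
To illustrate the alternative mentioned in point c), a minimal karate-config.js sketch might keep the Java.type() value itself in the config and call its static methods only where they are needed. The class com.example.MyUtils is hypothetical, and this is one reading of the advice above rather than code taken from the Karate documentation.

```javascript
// karate-config.js (sketch only).
// 'com.example.MyUtils' is a hypothetical class name.
function fn() {
  var config = {};
  // Keep the Java type itself in the config, not references to its functions.
  config.MyUtils = Java.type('com.example.MyUtils');
  return config;
}
```

A scenario would then call a static method at the point of need, for example MyUtils.greet('world') in a def step, where greet is again a hypothetical method.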