karatelabs / karate

Test Automation Made Simple

Home Page: https://karatelabs.github.io/karate

Java type in karate-config with parallel runner causing error: Multi threaded access

ericdriggs opened this issue · comments

Given config = karate.callSingle is used to define a Java.type variable
When the parallel runner executes
Then it reliably fails with a "Multi threaded access requested by thread" error

MVC repo (tests pass when run individually, failure when run in parallel):
https://github.com/ericdriggs/karate-parallel-runner-concurrent-access/

Seems related to oracle/graal#631

I think I can probably work around this by defining a JavaScript object and then defining individual functions on that object to simulate static method interop, calling Java.type only inside each function.
Still, it's an interesting race condition, and a crying shame that Graal has such a ridiculous default without a flag to turn it off.

thanks, this helps cc @joelpramos

@ericdriggs so - this is a won't fix. I've added this below to the documentation. given the way graal js behaves today there is no way we can get a java instance created on one thread to work across threads. anybody is welcome to find a way.

I've tweaked our existing parallel troublemaker test to replicate this if you comment out one line in the offending karate-config.js.

image

@ptrthomas far be it from me to say this solves everything (because, you know, Graal), but either of the lines added in the catch in the screenshot below fixes the specific reproducing unit test you added (either this.JS.context.asValue(value) or the commented line Value.asValue(value.asHostObject()); both work).

I think it's worth adding Value.asValue(value.asHostObject()) (which avoids access to JS.context) in the catch, and/or slightly modifying the logic / catch method (and the warn message).

image

@joelpramos that's a neat trick and I'm still not sure how it works, and glad I had the tests lined up. do review !

@ptrthomas
Still seeing this issue when using karate 1.1.0-RC1

ericdriggs/karate-parallel-runner-concurrent-access@81003b0

mvn clean test

...
[ERROR] Failures: 
[ERROR]   ParallelRunnerTest.testParallel:18 classpath:examples/parallel-runner-always-fails.feature:9
karate-config.js
Multi threaded access requested by thread Thread[pool-1-thread-1,5,main] but is not allowed for language(s) js.
.....

not planning to look at this immediately. any help welcome

@ericdriggs thanks a lot for your simple example. it really helped. I boiled it down further and it is here: https://github.com/intuit/karate/tree/e4f527a63e078af5d7394c0bafbc672ac2469c5d/karate-core/src/test/java/com/intuit/karate/core/parajava

cc @joelpramos @aleruz

so some explanation here for future reference.

Java.type('my.Clazz') evaluates to something which is a "host object" as well as a "meta object" in graal. I found that the only way to pass this from one context to another is to keep it "as is", whereas we convert everything else, e.g. JSON to Map / List and other things to Java primitive types. if we don't keep it "as is", trying to call methods or new on this meta-thingy would fail

we were trying to convert this Graal Value thinking it was a host object. so we have now added an extra check and it seems to be good. we now "attach" it for callSingle() / recurseAndAttachAndDeepClone(). my guess is it does not need to be re-hydrated at all, whereas JS functions do (we re-evaluate the source and create a brand-new JS function), but for now we pass both into the ScenarioEngine.attach() function, which tries to do an org.graalvm.polyglot.Context.asValue(), which I think is sufficient for the "meta object".

Tbh at this point I just trust we have a ton of unit tests around this. I’ll give it a shot later with a suite of tests on my end but can’t imagine the results will be different. I’ll try looking into the examples and see if we can break with other Java things.

There was one test that was commented out early on in one of the first v1 RCs; can we try that one too? It might be redundant now, but I just remembered it.

@ptrthomas
Thanks for looking at this again! Nothing is more frustrating than concurrency issues, both to debug and in usage. Will verify on Monday.

Fix verified using latest version of develop branch.
Sample project passes along with my own library/unit tests.
Will upgrade my projects to karate 1.1.0 when the next release candidate is released.
Thanks!

@joelpramos I improved the fix - note that we don't global-lock for every call-feature like I attempted earlier

I have good reason to believe that the "detach" routine was the source of all problems.
also in the case of a "cache hit" for callonce and callsingle, we have to synchronize for all the JS re-hydration to work fine

and I think that should fix everything.
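
For readers following along, the cache-hit synchronization mentioned above can be pictured with a minimal Java sketch. It is not Karate's implementation: `SingleCallCache`, `getOrCompute` and the `rehydrate` callback are hypothetical names, and the callback only stands in for whatever per-thread re-attachment the engine performs. The one point it tries to show is that the first computation and every later cache hit run under the same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a callonce / callSingle result cache; not Karate's actual code.
final class SingleCallCache {

    private final Map<String, Object> cache = new HashMap<>();
    private final Object lock = new Object();

    // 'compute' runs at most once per key (assuming it returns non-null).
    // 'rehydrate' is an assumed placeholder for re-attaching the cached value
    // to the calling thread; it runs on every call, including cache hits,
    // and always while holding the lock.
    <T> T getOrCompute(String key, Supplier<T> compute, UnaryOperator<T> rehydrate) {
        synchronized (lock) {
            @SuppressWarnings("unchecked")
            T cached = (T) cache.get(key);
            if (cached == null) {
                cached = compute.get();
                cache.put(key, cached);
            }
            return rehydrate.apply(cached);
        }
    }
}
```

A single coarse lock like this is the simplest option and serializes all callers, which is the trade-off discussed further down the thread.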

note that someone reported a problem with csv + scenario outline: https://stackoverflow.com/q/68041569/143475

so I added a test for that. this test also consistently replicates the "js thread" issue without the "fix"
but not the "failed to parse csv" error, which I hope is just related to the js thread problem

so @aleruz do try again with this fix !

Going through the changes I think there are two locks that "worry me" (from a performance standpoint), plus one naming note. Just spitballing here, I have not fetched the latest to try it out yet:

  • ScenarioEngine.recurseAndAttach() will be called whenever there are calls, so a Scenario Outline with a high number of examples calling a reusable feature will always lock on each call
  • ScenarioEngine.init(): same as above but without the call, i.e. it will always happen for a Scenario Outline
  • (naming, not a lock) JsExecutable: maybe we can rename it to something else (e.g. JavaFunction or HostFunction, to align with Graal terminology) because I think it's an Executable that is NOT JS. In this case it's Java, but who knows, maybe some day you'll add more fancy stuff from Graal like R or Python

Overall I think it'll negatively impact the effective use of parallelism. Obviously when something doesn't work and that's the solution too bad lol

Since it seems that we nailed down the culprit (parallelism attaching/detaching these functions when they came from a callonce), I wonder whether we can add a flag in the Suite class to identify that a lock is required (maybe if one of these JsExecutable instances is created during a callonce), and be explicit in that piece of documentation where you say to avoid functions that they can also have a negative performance impact. Not sure whether the flag would be suite-wide or whether there are ways to scope a flag to the execution of a (top-level) Feature.
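
To make the flag idea concrete, here is a rough Java sketch of a lock that is only taken once a flag has been raised. Everything in it is hypothetical: `SuiteLockSketch`, `markLockRequired` and `attach` are illustrative names and do not come from Karate's Suite class, and the sketch ignores the race where the flag is raised while another thread is already on the unlocked path.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of a suite-level "lock required" flag; not Karate's Suite class.
final class SuiteLockSketch {

    private final AtomicBoolean lockRequired = new AtomicBoolean(false);
    private final ReentrantLock lock = new ReentrantLock();

    // Assumed trigger: e.g. a Java-backed function was created during a callonce.
    void markLockRequired() {
        lockRequired.set(true);
    }

    boolean isLockRequired() {
        return lockRequired.get();
    }

    // Exposed only so the behaviour can be observed from inside 'work'.
    boolean isLockedByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }

    // Runs the attach work under the lock only after the flag has been raised.
    <T> T attach(Supplier<T> work) {
        if (!lockRequired.get()) {
            return work.get(); // fast path: parallel scenarios do not contend
        }
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Suites that never raise the flag would pay nothing for the lock, at the cost of one more piece of shared state to reason about.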

Similar note to the overall flag there - the warn logs will get lost in the middle of the execution of tests. I think an overall message should be printed with the results of the suite (that print() to stdout that produces the numbers) informing of potential issues. Just playing safe here I guess.

Going through this, the parallel message might be for the attaching/detaching of the JS stuff only - https://www.graalvm.org/reference-manual/js/Multithreading/

I think I just keep getting sidetracked because there are two problems - the Java functions usage and the multi threading. So the comment on the flag would still apply, I'm just not sure how easy / clean it is to do (maybe for the existence of any function, JS or not)

@joelpramos the second point (init()) is no longer going to lock with the latest changes as far as I can tell - but you can confirm

the rest - you are welcome to prove there is a significant impact (I think not, all the processing is in-memory, and based on the build run-time) and make changes. this has not been easy to work on

Like you I'm not too fussed about it. The one on init() was the one that raised more of an eyebrow. I think the other logging / naming comments are still valid.

Hopefully the battle is over.

@joelpramos tried to remove the sync on recurseAndAttach() but couldn't. so over to you now :) I'm done for today

tried a few more things as you can see in the prev commit

a tip: running this can replicate the problem easily on local

mvn test -f karate-core/pom.xml -Dtest=Parallel*Test

Thanks for your continued work on this.

indeed!

@ptrthomas I added a few comments in the commits but it's mostly to wrap my head around the changes you are doing so that when I have some time to fetch the code and iterate on this I'm not totally lost

just released 1.1.0.RC4 - I request everyone to re-test this once cc @aleruz

Pass - Validated on RC4 for MVC project and my own projects.

all please see this comment on the other related issue. it is possible to use java functions (not just classes) safely and work around some of the graal limitations: #1633 (comment)

@ptrthomas
My sample project is still passing unchanged on both RC4 and a local build of the latest develop.
Can you please clarify whether the function approach is the recommended means of Java interop in callOnce? The additional boilerplate isn't pretty, since it requires a duplicate interface at both the class and JS level, but if that's what's required, I understand.

@ericdriggs reco is only if you

a) use java functions ("use" means you actually call the function in parallel scenarios)
b) and callonce or callsingle
c) you really want to "save some typing" by referring to functions instead of passing a Java.type() around and using that at point-of-need

I also feel the extra code is only in the Java side. the JS side is the same one or two lines
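
As a rough illustration of the "extra code on the Java side", here is a minimal sketch, assuming the recommendation refers to exposing `java.util.function.Function` instances from a Java class. The class `Hello` and its methods are hypothetical names chosen for the example, not taken from the linked comment.

```java
import java.util.function.Function;

// Hypothetical helper class. For real Java.type() interop it would likely need
// to be a public class with public methods, in its own source file.
final class Hello {

    // The plain static method that scenarios ultimately want to call.
    static String sayHello(String name) {
        return "hello " + name;
    }

    // The extra boilerplate: a factory that hands out the method as a
    // java.util.function.Function, i.e. a Java function rather than a JS one.
    static Function<String, String> sayHelloFactory() {
        return Hello::sayHello;
    }
}
```

On the JS side, the assumed usage would be a single line that assigns the result of calling `sayHelloFactory()` on the `Java.type(...)` reference to a config key, which matches the "one or two lines" mentioned above.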