scala / scala-parallel-collections

Parallel collections standard library module for Scala 2.13+

Starvation situations with fixed thread pools

nrinaudo opened this issue · comments

Parallel collections that use a fixed thread pool for task support and contain "too many" elements will deadlock, where "too many" is a threshold I haven't been able to quantify.

The simplest possible reproduction is an empty list with a thread pool of size 1:

import scala.concurrent.ExecutionContext
import java.util.concurrent.Executors
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.ExecutionContextTaskSupport

val col = List.empty[Int].par

col.tasksupport = new ExecutionContextTaskSupport(
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1))
)

// This will deadlock
col.map(_ + 1)

I have observed the same starvation issue with larger thread pools and lists, but not reliably enough to provide a reproduction case.

Maybe we should add a recommendation to use only ForkJoinPools to that documentation.

Isn't that hiding the symptoms, though? Sure, it's best to use a ForkJoinPool, but we still have perfectly legal code that deadlocks.
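For reference, the ForkJoinPool workaround applied to the repro above would look roughly like this (a sketch of the workaround, not a fix for the underlying issue; ForkJoinTaskSupport is the module's ForkJoinPool-backed task support):

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.CollectionConverters._
import scala.collection.parallel.ForkJoinTaskSupport

val col = List(1, 2, 3).par

// Fork/join workers use work-stealing, so a worker blocked joining a
// subtask can help execute it instead of starving the pool.
col.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(1))

val result = col.map(_ + 1) // completes even with parallelism 1
```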

This seems to stem from the implementation of

scala.collection.parallel.FutureTasks vs the implementation of scala.collection.parallel.ParIterableLike.ResultMapping

The issue is that

  1. The FutureTasks logic first determines the parallelism level from the number of cores on the machine (in my case this was 6).

  2. It then creates a computation tree based on the max depth calculated by this:
    private val maxdepth = (math.log(parallelismLevel) / math.log(2) + 1).toInt

  3. This will create up to 2^maxdepth Futures (depending on the number of elements in the collection).
    Each of these futures calls scala.collection.parallel.Task#tryLeaf.

  4. The tryLeaf method eventually calls through to scala.collection.parallel.Task#leaf. The implementation of leaf for ResultMapping (scala.collection.parallel.ParIterableLike.ResultMapping#leaf) makes a blocking call to tasksupport.executeAndWaitResult(inner).

  5. In the case of a single thread, this means that the thread waiting for the task result is blocked, and no thread is available to process the scala.collection.parallel.ParIterableLike.StrictSplitterCheckTask.
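The failure mode in step 5 can be reproduced without parallel collections at all: submit a task to a single-thread pool that blocks waiting on a second task submitted to the same pool. A minimal sketch (a bounded timeout is added so the demonstration fails fast instead of hanging forever):

```scala
import java.util.concurrent.{Executors, TimeoutException}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.Try

// A single-threaded pool, standing in for the fixed pool in the original repro.
val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1))

val outer = Future {
  // The pool's only thread is executing this body, so `inner` is queued
  // behind it and can never start...
  val inner = Future(42)(ec)
  // ...which means this wait can never succeed.
  Await.result(inner, 2.seconds)
}(ec)

val outcome = Try(Await.result(outer, 10.seconds))
// outcome is a Failure(TimeoutException): the inner task was starved.
ec.shutdownNow()
```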

In theory, if the number of threads you give your executor is at least
(2 ^ (math.log(parallelismLevel) / math.log(2) + 1).toInt) + 1, then it shouldn't get deadlocked... I think.
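That estimate can be sketched as a helper. This is hypothetical and only as trustworthy as the formula above; `minPoolSize` is an illustrative name, and `maxdepth` mirrors the FutureTasks computation:

```scala
// Hypothetical helper: estimates the smallest fixed pool that should avoid
// the starvation above, per the (unverified) formula 2^maxdepth + 1.
def minPoolSize(parallelismLevel: Int): Int = {
  // Mirrors FutureTasks' maxdepth calculation.
  val maxdepth = (math.log(parallelismLevel) / math.log(2) + 1).toInt
  (1 << maxdepth) + 1
}

// On the 6-core machine from the walkthrough: maxdepth = 3, so 9 threads.
println(minPoolSize(6)) // 9
```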

Stack trace from an experiment.

java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:242)
scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:258)
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
scala.concurrent.Await$.$anonfun$result$1(package.scala:219)
scala.concurrent.Await$$$Lambda$1274/950805155.apply(Unknown Source)
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:57)
scala.concurrent.Await$.result(package.scala:146)
scala.collection.parallel.FutureTasks.$anonfun$execute$3(Tasks.scala:513)
scala.collection.parallel.FutureTasks$$Lambda$1256/656809460.apply(Unknown Source)
scala.collection.parallel.FutureTasks.executeAndWaitResult(Tasks.scala:519)
scala.collection.parallel.ExecutionContextTasks.executeAndWaitResult(Tasks.scala:555)
scala.collection.parallel.ExecutionContextTasks.executeAndWaitResult$(Tasks.scala:555)
scala.collection.parallel.ExecutionContextTaskSupport.executeAndWaitResult(TaskSupport.scala:84)
scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:960)
scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
scala.collection.parallel.Task$$Lambda$1257/1697135480.apply$mcV$sp(Unknown Source)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
scala.collection.parallel.ParIterableLike$ResultMapping.tryLeaf(ParIterableLike.scala:955)
scala.collection.parallel.FutureTasks.$anonfun$exec$5(Tasks.scala:499)
scala.collection.parallel.FutureTasks$$Lambda$1253/1941666034.apply(Unknown Source)
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
scala.collection.parallel.FutureTasks$$Lambda$1254/1629179184.apply(Unknown Source)
scala.util.Success.$anonfun$map$1(Try.scala:255)
scala.util.Success.map(Try.scala:213)
scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
scala.concurrent.Future$$Lambda$1205/1115425820.apply(Unknown Source)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
scala.concurrent.impl.Promise$$Lambda$1206/183285557.apply(Unknown Source)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

I've also encountered this problem. Perhaps the following observation will provide a hint to someone: in Scala 2.12, applying a map to a parallel vector worked in the REPL but consistently failed when the same command was inside an object defined in a script. The same map seems to fail consistently in 2.13, regardless of whether I run it in the REPL or the script.

@javax-swing, @axel22: for me this renders the whole parallel collections module useless, so I'm quite surprised this thread has been inactive for over a year. Am I missing something? Is there some secret workaround? I'm really looking forward to using parallel collections in my code.

@amitainz - You could try to transition to using Java 8 Streams. Although they don't support every possible collection, they generally perform quite well, and there are adapters for the major Scala collections. See https://www.scala-lang.org/api/current/scala/jdk/StreamConverters$.html

Thanks @Ichoran , I'll take a look.

At least in Scala 2.12.10, I managed to make things work by making my object extend the App trait, so that when I run Snippets.main(Array[String]()) the code runs as expected. Note that if I try to access the ParVector instance inside Snippets and apply map to it in the sbt REPL, it hangs. Still, I thought this might be useful for other users or provide a clue for any developer who wants to fix this bug.

I too, to my chagrin, have discovered that code that worked famously in 2.10 and 2.11 for almost a decade hangs in Scala 2.12.

Should we completely abandon our code that uses parallel collections in our effort to get onto 2.12, and rewrite it to use plain Futures? Seems like yes.

(You might ask "why still on 2.12 in 2020?" Yes, well - it's issues like this - causing mysterious deadlocks that take down our application - that put our 2.12 upgrade effort on hold for months after we burn through our timeboxed effort to upgrade)

@noahlz I'm a newbie to Scala, so probably should keep out of such strategic discussions. But for what it's worth, the parallel collections were the number one reason I decided to give scala a try. I mean, the language is elegant and expressive, sure - but I was looking for something that will actually save me worrying about the parallelization. The second reason was that it is reasonably fast (I got tired of looking for inventive numpy hacks to do what I want in python), but there are plenty of other languages in that category.

Do we actually know whether @nrinaudo's original report is 2.12+-only?

I don't know for sure, but I suspect this ticket has become a grab bag of "parallel collections didn't work for me, in my code" reports that may or may not have anything to do with each other.

In particular, I suspect overlap with scala/bug#8119

@SethTisue - it certainly isn't just a problem in 2.12. I don't remember all the scala versions I've tried, but 2.13 definitely has this / a very similar problem, as well as some older versions. You're probably right that a thorough check of all the versions and mapping of this bug is called for, but it does seem that all these issues stem from the same type of deadlocking behavior.

I wouldn't categorize them as "parallel collections didn't work for me in my code". It's a very specific deadlocking issue that's very easy to reproduce, occurs in many versions and seems to be "well known" in the scala community. I don't think it makes sense to close the only (right?) active thread addressing this issue (I see the bug report on 2.13 that you referenced above was closed).

it certainly isn't just a problem in 2.12. I don't remember all the scala versions I've tried, but 2.13 definitely has this / a very similar problem,

I know. That's why I said "2.12+", not just "2.12". There is a crucial dividing line between 2.11 and 2.12, because the lambda encoding changed. (There is also an important dividing line between 2.12 and 2.13, namely the collections overhaul, and the relocation of the parallel collections to an external module.)

It's a very specific deadlocking issue that's very easy to reproduce, occurs in many versions and seems to be "well known" in the scala community. I don't think it makes sense to close the only (right?) active thread addressing this issue (I see the bug report on 2.13 that you referenced above was closed).

The issue you seem to be referring to is scala/bug#8119 and there doesn't need to be more than one ticket on it.

That ticket is closed because it's already fixed for 2.13.2, which may ship as soon as this week. In the meantime, you can try out the fix in a nightly: https://stackoverflow.com/questions/40622878/how-do-i-tell-sbt-to-use-a-nightly-build-of-scala-2-12-or-2-13

This ticket — @nrinaudo's original report, above — is about a different problem.

It is important for all of us to be maximally precise about exactly which problems exist in exactly which Scala versions and how exactly those problems can be reproduced.

👍 This is not a case of "parallel collections didn't work for me in my code". See also scala/bug#8955; this has come up before and seems not fixed.

Example code

 $ console
[info] Starting scala interpreter...
[info]
Welcome to Scala 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191).
Type in expressions for evaluation. Or try :help.

scala> (1 to 1000).toList.par.foreach(i=> println(Thread.currentThread.getName + s" = $i"))

hangs

In 2.11

$ +console
Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191).
Type in expressions for evaluation. Or try :help.

scala> (1 to 1000).toList.par.foreach(i=> println(Thread.currentThread.getName + s" = $i"))
ForkJoinPool-1-worker-7 = 251
ForkJoinPool-1-worker-13 = 1
ForkJoinPool-1-worker-13 = 2
ForkJoinPool-1-worker-13 = 3
ForkJoinPool-1-worker-13 = 4
... etc

Like I said, I'm usually met with disbelief when I say our team is still on 2.11, but it's things like this that hold us up; a real shame. Fortunately, we just have to rip out parallel collections from our code and replace them with a Thread Pool Executor execution context + Futures.

@noahlz scala/bug#8955 was fixed a long, long time ago. If you're going to bring that up, you need to provide specific evidence that it isn't fixed.

The transcript you've included demonstrates a different bug, namely scala/bug#8119. If you try a 2.13 nightly, as I suggested above, you'll find that it is fixed there, for the upcoming 2.13.2 release.

None of this has anything to do with this ticket (Nicolas's bug report about thread pools); at least, I don't think it does, and nobody has presented evidence that it does.

@SethTisue thank you for clarifying! 👍

Not sure where to put this @SethTisue, but another "hint" for bug hunters is that the same behavior occurs when working with java.util.concurrent.ConcurrentHashMap, but only when the parallelism level is actually > 1.
I'm using
scalaVersion := "2.12.10"

in the REPL (IntelliJ console) the last call, where there's actually supposed to be some parallelism, hangs:

scala> val x = new ConcurrentHashMap[Int,Int](10000000)
x: java.util.concurrent.ConcurrentHashMap[Int,Int] = {}

scala> myTimer(for (i <- 1 to 5000000) x.put(i,i*i), "filling") 
starting evaluation of filling...
done in 0.84584608s

scala> 
myTimer(x.forEach(100000000, (k,v) => (k*math.sin(v)*math.cos(k)).toInt), "test")
starting evaluation of test...
done in 4.043585024s

scala> 
myTimer(x.forEach(100000, (k,v) => (k*math.sin(v)*math.cos(k)).toInt), "test")
starting evaluation of test...

Very similar code wrapped inside a class extending App works (with a noticeable 3x speedup on my 4-core Mac for the parallel execution).
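The REPL-vs-App contrast above can be sketched as a standalone program. Note that myTimer here is a minimal stand-in for the commenter's helper, whose definition wasn't shown, and the collection sizes are scaled down:

```scala
import java.util.concurrent.ConcurrentHashMap

object Snippets extends App {
  // Minimal stand-in for the commenter's unshown myTimer helper.
  def myTimer[A](body: => A, label: String): A = {
    println(s"starting evaluation of $label...")
    val start = System.nanoTime()
    val result = body
    println(s"done in ${(System.nanoTime() - start) / 1e9}s")
    result
  }

  val x = new ConcurrentHashMap[Int, Int](1000000)
  myTimer(for (i <- 1 to 500000) x.put(i, i * i), "filling")

  // parallelismThreshold above the map size: sequential traversal, works everywhere.
  myTimer(x.forEach(100000000L, (k, v) => { (k * math.sin(v) * math.cos(k)).toInt; () }), "sequential")

  // parallelismThreshold below the map size: parallel traversal; this is the
  // call reported to hang in the 2.12 REPL but complete from an App object.
  myTimer(x.forEach(100000L, (k, v) => { (k * math.sin(v) * math.cos(k)).toInt; () }), "parallel")
}

Snippets.main(Array.empty)
```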

yeah, scala/bug#8119 isn't actually specific to the parallel collections, the parallel collections just happen to be the most common way that people run into it. your ConcurrentHashMap example is another, and it doesn't hang the 2.13.2 REPL.

I have hidden many comments which turned out to be unrelated to the original bug report.

Before commenting on this ticket, please be very sure that what you are encountering is exactly the specific issue that Nicolas has identified, rather than just any starvation or deadlock issue.