http4s / blaze

Blazing fast NIO microframework and Http Parser

Blaze Test Is Unreliable

isomarcte opened this issue · comments

https://github.com/http4s/http4s/pull/5046/checks?check_run_id=3275323831

 ==> X org.http4s.client.blaze.BlazeClient213Suite.behave and not deadlock on failures with parTraverse  50.012s java.util.concurrent.TimeoutException: Future timed out after [50 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:212)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:225)
    at scala.concurrent.Await$.$anonfun$result$1(package.scala:201)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:62)
    at scala.concurrent.Await$.result(package.scala:124)
    at munit.internal.PlatformCompat$.$anonfun$waitAtMost$1(PlatformCompat.scala:21)
    at scala.util.Try$.apply(Try.scala:210)
    at munit.internal.PlatformCompat$.waitAtMost(PlatformCompat.scala:21)
    at munit.FunSuite.waitForCompletion(FunSuite.scala:51)
    at munit.FunSuite.$anonfun$test$1(FunSuite.scala:37)
    at munit.GenericTest.$anonfun$withBodyMap$1(GenericTest.scala:33)
    at munit.MUnitRunner.$anonfun$runTestBody$1(MUnitRunner.scala:296)

If we can't fix this, we might consider disabling it. I'm unsure how much utility a test which is expected to fail often provides.

This sounds to me like a legitimate bug in the blaze-client.

It might be, but sometimes we find that the servers we spin up in the CI environment lock up or buckle under load. We made a lot of progress on test reliability by eliminating unsafeRunSync() across the test suites, but problems like this remain. Are these clients really locking up, or are they just not performing well on an overwhelmed machine?

I'm convinced it's either a poorly written test OR a bug and that it needs attention. I'm just not yet sure which.

I can reproduce the issue locally, which I think is convincing evidence that this isn't a CI environment issue.

To get a (reasonably) reliable reproduction, I commented out the other tests in the suite and replaced .replicateA(5) with .replicateA(200). It fails in ~50% of attempts. But when it passes, it passes in less than 10s.

[success] Total time: 4 s, completed Aug 30, 2021 4:39:35 PM
[IJ]blaze-client/testOnly org.http4s.client.blaze.BlazeClient213Suite
org.http4s.client.blaze.BlazeClient213Suite:
  + behave and not deadlock on failures with parTraverse 5.8s
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 7 s, completed Aug 30, 2021 4:39:50 PM
[IJ]blaze-client/testOnly org.http4s.client.blaze.BlazeClient213Suite
org.http4s.client.blaze.BlazeClient213Suite:
==> X org.http4s.client.blaze.BlazeClient213Suite.behave and not deadlock on failures with parTraverse  50.498s java.util.concurrent.TimeoutException: Future timed out after [50 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:212)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:225)
    at scala.concurrent.Await$.$anonfun$result$1(package.scala:201)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:62)
    at scala.concurrent.Await$.result(package.scala:124)
    at munit.internal.PlatformCompat$.$anonfun$waitAtMost$1(PlatformCompat.scala:21)
    at scala.util.Try$.apply(Try.scala:210)
    at munit.internal.PlatformCompat$.waitAtMost(PlatformCompat.scala:21)
    at munit.FunSuite.waitForCompletion(FunSuite.scala:51)
    at munit.FunSuite.$anonfun$test$1(FunSuite.scala:37)
    at munit.GenericTest.$anonfun$withBodyMap$1(GenericTest.scala:33)
    at munit.MUnitRunner.$anonfun$runTestBody$1(MUnitRunner.scala:296)
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] 	org.http4s.client.blaze.BlazeClient213Suite
[error] (blaze-client / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 52 s, completed Aug 30, 2021 4:40:49 PM
[IJ]blaze-client/testOnly org.http4s.client.blaze.BlazeClient213Suite
org.http4s.client.blaze.BlazeClient213Suite:
  + behave and not deadlock on failures with parTraverse 7.989s
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 9 s, completed Aug 30, 2021 4:41:12 PM
[IJ]blaze-client/testOnly org.http4s.client.blaze.BlazeClient213Suite
org.http4s.client.blaze.BlazeClient213Suite:
==> X org.http4s.client.blaze.BlazeClient213Suite.behave and not deadlock on failures with parTraverse  50.471s java.util.concurrent.TimeoutException: Future timed out after [50 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:212)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:225)
    at scala.concurrent.Await$.$anonfun$result$1(package.scala:201)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:62)
    at scala.concurrent.Await$.result(package.scala:124)
    at munit.internal.PlatformCompat$.$anonfun$waitAtMost$1(PlatformCompat.scala:21)
    at scala.util.Try$.apply(Try.scala:210)
    at munit.internal.PlatformCompat$.waitAtMost(PlatformCompat.scala:21)
    at munit.FunSuite.waitForCompletion(FunSuite.scala:51)
    at munit.FunSuite.$anonfun$test$1(FunSuite.scala:37)
    at munit.GenericTest.$anonfun$withBodyMap$1(GenericTest.scala:33)
    at munit.MUnitRunner.$anonfun$runTestBody$1(MUnitRunner.scala:296)
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error] 	org.http4s.client.blaze.BlazeClient213Suite
[error] (blaze-client / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 52 s, completed Aug 30, 2021 4:42:08 PM
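
As a rough back-of-the-envelope check, the ~50% failure rate at 200 repetitions can be converted into an implied per-repetition failure rate, and from there into an expected failure rate at the original 5 repetitions. This is only a sketch: it assumes each repetition fails independently with the same probability, which may not hold if the underlying problem accumulates over repetitions. The `FlakeRate` object and its method names are made up for illustration.

```scala
object FlakeRate {
  // Per-repetition failure probability implied by an overall failure
  // rate observed across n repetitions (assumes independence).
  def perIteration(overall: Double, n: Int): Double =
    1.0 - math.pow(1.0 - overall, 1.0 / n)

  // Probability that at least one of n independent repetitions fails.
  def overall(perIter: Double, n: Int): Double =
    1.0 - math.pow(1.0 - perIter, n.toDouble)

  def main(args: Array[String]): Unit = {
    val p = perIteration(0.5, 200) // ~50% observed with .replicateA(200)
    println(f"implied per-repetition failure rate: ${p * 100}%.2f%%")
    println(f"expected failure rate with 5 repetitions: ${overall(p, 5) * 100}%.2f%%")
  }
}
```

Under that independence assumption, the per-repetition rate comes out well below 1% and the rate at 5 repetitions below 2%, which would be consistent with a test that fails only occasionally in CI yet fails about half the time once the repetition count is raised.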

Oh, interesting. I haven't tried in quite some time, but I've never been able to get it to lock up by adding load. Getting a repro on this is a huge step forward.

It appears BlazeClient leaks connections (they aren't returned to the connection pool) on unlucky cancellation. I will try fixing this later.
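
To illustrate the general failure mode, here is a minimal standard-library sketch of a bounded pool that loses a permit when a borrower is interrupted between acquiring and releasing. It is not BlazeClient's actual code: `TinyPool`, `Cancelled`, `leakyUse`, `safeUse`, and `attempt` are hypothetical names, and a thrown exception stands in for cancellation.

```scala
import java.util.concurrent.{Semaphore, TimeUnit}

// Hypothetical marker standing in for a cancelled request.
final class Cancelled extends RuntimeException("cancelled")

// Hypothetical stand-in for a bounded connection pool.
final class TinyPool(size: Int) {
  private val permits = new Semaphore(size)

  // Try to borrow a connection, giving up after timeoutMs milliseconds.
  def borrow(timeoutMs: Long): Boolean =
    permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)

  def giveBack(): Unit = permits.release()

  def available: Int = permits.availablePermits()
}

object LeakSketch {
  // Leaky: if body throws, giveBack() is skipped and the permit is lost.
  def leakyUse(pool: TinyPool)(body: => Unit): Boolean =
    if (pool.borrow(100)) {
      body
      pool.giveBack()
      true
    } else false

  // Safe: the permit is returned on every exit path.
  def safeUse(pool: TinyPool)(body: => Unit): Boolean =
    if (pool.borrow(100)) {
      try body
      finally pool.giveBack()
      true
    } else false

  // Run an action, treating the simulated cancellation as "did not complete".
  def attempt(f: => Boolean): Boolean =
    try f
    catch { case _: Cancelled => false }

  def main(args: Array[String]): Unit = {
    val leaky = new TinyPool(2)
    (1 to 2).foreach(_ => attempt(leakyUse(leaky)(throw new Cancelled)))
    println(s"leaky pool permits left: ${leaky.available}")
    println(s"next borrow on leaky pool succeeds: ${leakyUse(leaky)(())}")

    val safe = new TinyPool(2)
    (1 to 2).foreach(_ => attempt(safeUse(safe)(throw new Cancelled)))
    println(s"safe pool permits left: ${safe.available}")
    println(s"next borrow on safe pool succeeds: ${safeUse(safe)(())}")
  }
}
```

Once every permit has leaked, later borrowers can only wait until their timeout, which from the outside looks like a test that hangs rather than one that fails quickly. In an effect system, the same guarantee as the `try`/`finally` in `safeUse` is usually expressed with a bracket-style or resource-style construct that also runs the release step on cancellation.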

I'm closing this one as I think it's been fixed by http4s/http4s#5385