reactor / reactor-core

Non-Blocking Reactive Foundation for the JVM

Home Page: http://projectreactor.io

Empty hot source hangs with 2nd late subscriber

RonBarkan opened this issue

A Flux pipeline hangs when a 2nd late subscriber uses a connected hot source.

Reproduction:

  @Test
  void hotSource() throws Exception {
    // flux should be an empty ConnectableFlux
    var flux = Flux.just(1, 2, 3, 4)
        // .publishOn(Schedulers.newBoundedElastic(5, 5, "blah"))
        .flatMap(i -> Mono.<Integer>empty())  // in real code, this depends on the actual data
        .map(i -> i + 10)
        .doOnComplete(() -> System.out.println("first part complete"))
        .publish();

    // map should be a Mono of an empty map
    var map = flux
        .collectMap(Function.identity(), Object::toString)
        .doOnSuccess(m -> System.out.println("flux completed " + m));

    CountDownLatch latch = new CountDownLatch(1);
    var count = map.flatMapMany(
        m -> flux
            // .publishOn(Schedulers.boundedElastic())
            .doOnSubscribe(s -> System.out.println("flatMapMany subscribe"))
            .doOnRequest(r -> System.out.println("Requesting " + r))
            // Not printed from here
            .doOnComplete(() -> System.out.println("flatMapMany completed"))
            .doOnNext(i -> System.out.println("i: " + i))
            .map(Object::toString))
        .doOnNext(i -> System.out.println("after i: " + i))
        .buffer(100)
        .doOnNext(l -> System.out.println("b: " + l))
        .count()
        .subscribe(l -> System.out.println("result: " + l), e -> System.out.println("exception: " + e), () -> {
          System.out.println("Completed!");  // (1)
          latch.countDown();
        });
    flux.connect();
    System.out.println("Awaiting"); // printed
    latch.await();
    System.out.println("Test done");
  }

Expected Behavior

The pipeline should complete; in particular, the test should terminate and (1) should print.

Actual Behavior

The test hangs and (1) is never executed.
However, the map Mono does complete, with an empty map.

Here's the output

Awaiting
first part complete
flux completed {}
flatMapMany subscribe
Requesting 9223372036854775807

The test hangs here.
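
While the hang is being investigated, the unbounded latch.await() in the reproduction could be swapped for a bounded wait so the test fails instead of blocking a build. The following is only a minimal sketch using the JDK's CountDownLatch; the class name BoundedAwaitSketch, the helper awaitCompletion and the timeout values are hypothetical and not part of the original test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BoundedAwaitSketch {

    // Hypothetical helper: waits for the latch for at most timeoutMillis.
    // Returns true if the latch reached zero, false if the wait timed out.
    static boolean awaitCompletion(CountDownLatch latch, long timeoutMillis)
            throws InterruptedException {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stands in for the latch of a pipeline that never signals completion.
        CountDownLatch neverCompleted = new CountDownLatch(1);

        if (!awaitCompletion(neverCompleted, 200)) {
            System.out.println("Pipeline did not complete within the timeout");
        }
    }
}
```

In the reproduction this would amount to replacing latch.await() with a timed await and failing the test when it returns false.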

Steps to Reproduce

Run the test.

Your Environment

Java 21/WSL2:Ubuntu Linux

  • Reactor version(s) used: 3.5.11
  • Other relevant libraries versions (eg. netty, ...): N/A
  • JVM version (java -version): 21
  • OS and version (eg uname -a): 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Hi, @RonBarkan 👋

Please consider the following:

@Test
void lateSubscribe() throws Exception {
	CountDownLatch latch = new CountDownLatch(1);

	ConnectableFlux<Integer> publish = Flux.just(1)
	                                       .publish();

	publish.subscribe(
			r -> System.out.println("1.Next: " + r), 
			e -> System.out.println("1.Error " + e),
			() -> System.out.println("1.Done"));

	publish.connect();

	publish.doFinally(s -> latch.countDown())
	       .subscribe(r -> System.out.println("2.Next: " + r),
			       e -> System.out.println("2.Error " + e),
			       () -> System.out.println("2.Done"));

	latch.await();
}

And the output:

1.Next: 1
1.Done

It also does not finish. That is because, in my view, a late-arriving Subscriber has no defined contract in the case of publish().
Please consider using replay() to achieve the desired result. Please consult our reference documentation for some examples.
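
As a rough illustration of that suggestion, the example above could be rewritten with replay() in place of publish(). This is a sketch, not a verified fix for the original pipeline: the class name ReplayLateSubscriberSketch and the helper lateSubscriberSignals are hypothetical, and the bounded wait is added only so the sketch cannot hang. With an unbounded replay() the late subscriber should receive the cached element as well as the completion signal, which differs from receiving completion only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import reactor.core.publisher.ConnectableFlux;
import reactor.core.publisher.Flux;

class ReplayLateSubscriberSketch {

    // Hypothetical helper: returns the signals seen by a subscriber
    // that subscribes only after connect() has been called.
    static List<String> lateSubscriberSignals() throws InterruptedException {
        List<String> signals = new CopyOnWriteArrayList<>();
        CountDownLatch latch = new CountDownLatch(1);

        // replay() caches emitted items and the terminal signal for later subscribers.
        ConnectableFlux<Integer> replayed = Flux.just(1)
                                                .replay();

        replayed.subscribe(
                r -> System.out.println("1.Next: " + r),
                e -> System.out.println("1.Error " + e),
                () -> System.out.println("1.Done"));

        replayed.connect();

        // Late subscriber: arrives after the source has already completed.
        replayed.doFinally(s -> latch.countDown())
                .subscribe(r -> signals.add("next:" + r),
                        e -> signals.add("error:" + e),
                        () -> signals.add("done"));

        // Bounded wait so the sketch itself never blocks indefinitely.
        latch.await(5, TimeUnit.SECONDS);
        return signals;
    }
}
```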

With the above, I'm closing this issue. For further questions around such behaviour, StackOverflow might be a good place. For instance, I found a similar question with a reasonable answer: https://stackoverflow.com/questions/59237587/reactor-publish-behavior

I'd expect the 2nd subscriber to get the complete signal. If the published flux had elements, the late subscriber should not see them and only get the complete signal.

This is what I understand from the marble diagram in the JavaDoc of Flux.publish(). The link you provided does not say anything about late subscribers, AFAICT. Consider updating the docs.
Lastly, the Stack Overflow answer does exactly what I described above and does not discuss hanging.

In any case, it seems very odd for the desired behavior to be that pipelines hang over this possibly subtle issue. It seems to me that even throwing an exception would be preferable. In particular, the published flux accepted the subscription and then ignored it. It is certainly a surprise to see your pipeline hang with the right set of inputs, and it should be a design goal to reduce API usage surprises.

(Regarding replay(), thank you for your suggestion. Even though the original pipeline had a significant flaw, not related to this issue, both publish() and replay() / cache() are not appropriate. The real code is being changed to do something else.)

Thanks for providing more feedback, @RonBarkan. I did a bit of digging in the history and found #2897. It feels like the same subject, so I marked this as a duplicate. It also contains the suggestion of using replay(history=0) to overcome the limitations of publish. After giving it a second thought, a hanging process does feel unexpected, and having to coordinate everything that follows a completion just to avoid a late subscription feels undesirable. I can't say much about the priorities and timeline around this, as there are other ongoing efforts, but a contribution is always welcome. At the same time, having a workaround takes away some of the pressure, I suppose.