amphp / pipeline

Concurrent iterators and pipeline operations.

Performance Issues

withinboredom opened this issue

I'm attempting to work on an async client for nats-php and noticed that the performance is pretty bad (basis-company/nats.php#62). I've narrowed it down to the implementation of Queue and Pipeline. In fact, the more steps there are in a pipeline, the worse the performance appears to get:

Before adding ->concurrent()

[benchmark screenshot]

After:

[benchmark screenshot]

Without pipeline:

[benchmark screenshot]

(async-background is the pipeline-driven flow)

Should this be expected or is there some way to improve the performance?

Queue implements back-pressure, pausing the current fiber when calling Queue::push() until the current value is consumed. This can introduce some latency and might be leading to the poor performance numbers.

You can avoid this in two ways:

  • Allow Queue to buffer some number of values, so that Queue::push() only waits for a value to be consumed once the buffer is full. The buffer size is passed to the Queue constructor.
  • Use Queue::pushAsync(), which returns a Future that is resolved when the value is consumed, but does not suspend the current fiber.

If you don't care when the value is consumed, consider the first option and pass some very large integer. This avoids even creating a Future, because there's nothing to await.
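
A minimal sketch of both approaches, assuming amphp/pipeline v1 (the $messages values are placeholders, not from the original report):

```php
<?php

use Amp\Pipeline\Queue;

// Placeholder payloads standing in for whatever the producer pushes.
$messages = ['msg-1', 'msg-2', 'msg-3'];

// Option 1: construct the Queue with a buffer, so push() only suspends
// once the buffer is full. A very large buffer effectively removes
// back-pressure and avoids creating Futures entirely.
$buffered = new Queue(1_000_000);
foreach ($messages as $message) {
    $buffered->push($message); // returns immediately while the buffer has room
}
$buffered->complete();

// Option 2: keep the default (unbuffered) Queue but use pushAsync(),
// which returns a Future resolved once the value is consumed instead of
// suspending the current fiber.
$unbuffered = new Queue();
foreach ($messages as $message) {
    $unbuffered->pushAsync($message); // ignore the Future if you don't care when it's consumed
}
$unbuffered->complete();
```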

Give those a try and let me know if you have any luck.

I tried (1), but now that I think about it, that's probably the issue: (IIRC, about to hop on a plane) each operation on the pipeline uses its own queue, which creates back-pressure. Perhaps that's what I'm seeing?

@trowski that is indeed the issue, but it's out of my hands: each step in the pipeline creates its own queue with a default buffer size of 0. This has a fairly significant performance impact when there is a lot of data flowing through the pipeline.
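
To make the shape of the problem concrete, a rough sketch of that kind of multi-step pipeline (the callbacks are arbitrary placeholders); per the above, each chained operation is backed by its own internal queue with a zero-sized buffer:

```php
<?php

use Amp\Pipeline\Queue;

$queue = new Queue(); // source queue, default buffer size of 0

$pipeline = $queue->pipe()
    ->map(fn (string $frame) => strtoupper($frame)) // step 1: its own internal queue
    ->filter(fn (string $frame) => $frame !== '')   // step 2: its own internal queue
    ->map(fn (string $frame) => strlen($frame));    // step 3: its own internal queue

// With unbuffered queues at every step, each push() waits for the value
// to travel through every stage before the producer can continue.
```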

I pushed a branch, buffer, which adds Pipeline::buffer(int $bufferSize). Please use that branch as your dependency and call that method immediately after creating the pipeline, trying various buffer sizes. Let me know what effect that has on performance.
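
Something along these lines, with the buffer size and the map callback being arbitrary placeholders:

```php
<?php

use Amp\Pipeline\Queue;

$queue = new Queue();

// buffer() is called immediately after creating the pipeline, before any
// other operations, so downstream steps read from a buffered source.
// Try various sizes and compare throughput.
$pipeline = $queue->pipe()
    ->buffer(1024)
    ->map(fn (string $frame) => strtoupper($frame));
```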

This has now been released in 1.2.0.

But @withinboredom never confirmed whether the new method was actually useful?

Ah, it's useful, @Bilge! I just never came back with performance numbers because I'm in the middle of a huge refactor. FWIW, it is orders of magnitude faster; I just don't have a way to get hard numbers.