max0x7ba / atomic_queue

C++ lockless queue.

Unbounded queue possible with atomic_queue?

benstadin opened this issue · comments

I'm looking to replace folly::UnboundedQueue. Is it possible to orchestrate multiple atomic_queues to basically grow without a fixed size limit? Where e.g. the last item in the queue points to a new fixed-size queue, which is created dynamically when pushing at the last index and thrown away once its items have been popped?

Something is only impossible until it's done.

What is the high-level problem you'd like to solve?

I have an MPMC problem where a producer thread is also a consumer thread. These threads must be guaranteed to succeed in pushing an item to the queue. Otherwise there can be situations where many threads push items to the queue, wait forever, and the other producer/consumer threads never pop any item.
Folly has an UnboundedQueue which I'm happy with performance-wise. Though maintaining this (whole) library is abysmal. Memory usage of folly::UnboundedQueue is not great either, but that is less of a concern.

I have an MPMC problem where a producer thread is also a consumer thread.

Do your producer threads do both push and pop?

These threads must be guaranteed to succeed pushing an item to the queue.

If your producer threads are also consumers, then when failing to push an item into a queue, can they not pretend they just popped it and act like a consumer?

Yes, some (not all) do indeed push and pop. I think your design considerations are valid, but they do not apply to all use cases.

I'm using multiple queues for a kind of message broker to decouple components. These components intentionally do not know about each other and wait for messages most of the time.

In this broker-like scenario, using a fixed-size queue can lead to situations where a cycle forms. For example, component A popped some data, processed it, and wants to push the processed data onto the queue, but waits because the queue is full. At the same time component B - which is registered at the message broker for the kind of data component A produces - may also be trying to push data onto the queue and waits. When nobody else comes to the rescue and pops some data from the queue, the two components will wait forever and never reach the state where they pop data from the queue.

A component can't ignore the failing push, because the data it tries to push in most cases targets another component, and it is unable to process the data on its own. To make things worse, the items must stay in strict order.

One thing I've thought about is to use an atomic counter for the case when the queue is full and pushing fails. The first thread that tried to push onto the full queue is responsible for creating another overflow queue; all others spin-loop and wait. Though combining these queues is still not trivial, I guess.
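The overflow-queue idea above can be sketched roughly as follows. This is a hypothetical illustration, not part of atomic_queue: it chains fixed-capacity segments behind a plain mutex for simplicity, whereas a real replacement would need a lock-free hand-off from a full segment to its successor. `SegmentedQueue` and its members are made-up names.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Sketch: an "unbounded" FIFO built from a chain of fixed-size segments.
// push() creates a fresh segment when the last one is full (the "overflow
// queue"); pop() releases a segment back to the allocator once drained,
// so memory usage tracks the current queue size rather than a fixed cap.
template <typename T, std::size_t SegmentCapacity = 1024>
class SegmentedQueue {
    struct Segment {
        std::vector<T> items;
        std::size_t head = 0; // index of the next item to pop
        Segment() { items.reserve(SegmentCapacity); }
    };
    std::deque<Segment> segments_; // front = oldest, back = newest
    std::mutex mutex_;

public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (segments_.empty() || segments_.back().items.size() == SegmentCapacity)
            segments_.emplace_back(); // first pusher onto a full queue creates the overflow segment
        segments_.back().items.push_back(std::move(value));
    }

    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (segments_.empty())
            return std::nullopt;
        Segment& front = segments_.front();
        T value = std::move(front.items[front.head++]);
        if (front.head == front.items.size())
            segments_.pop_front(); // segment drained: release its storage
        return value;
    }
};
```

Strict FIFO order holds because pushes only ever append to the newest segment and pops only ever drain the oldest one.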

With an unbounded queue you might eventually run out of memory. With a bounded queue you must set the maximum unconsumed queue size upfront and fail when that limit is reached.

Not clear why you cannot set the maximum queue size upfront.

Running out of memory is not a concern, but overall resource consumption is. My initial attempt was indeed to use large fixed-size queues. But for either internal or external reasons, at some point in time whatever assumption I make about the maximum size gets exceeded. Having too-large queues all of the time is not an option either.

In concrete numbers: With an unbounded queue the service runs at idle with ~50 MB memory, but may grow to several gigabytes for a brief period, like a few minutes per month, then goes back to around 200 MB - 300 MB (the folly unbounded queue can indeed shrink, but not predictably and not back to the initial state, and there is no way to trigger a cleanup). Having the same service running with several GB of memory at all times is not acceptable to users.

I was hoping to combine an array of fixed-size queues and deallocate them when unused, so that the application would go back to its initial 50 MB. I take measures at other points in this app to release memory back to the OS, though the queue is a major headache in this regard.

Sounds like an unbounded queue solves your problem best.

In the benchmarks, there are excellent unbounded queues from the xenium, Intel, and Boost libraries.

Why wouldn't you use an existing unbounded queue, other than folly?

I've tried all of the xenium queues and several others. As far as I understand, current unbounded queue implementations cannot guarantee that they deallocate memory predictably (relative to the size of the queue).

It is possible to release the queue storage to the OS using a madvise(..., MADV_DONTNEED) call on it when the following conditions are satisfied:

  1. NIL=0 or non-atomic default-initialized elements have 0 bit pattern.
  2. MINIMIZE_CONTENTION=false
  3. The queue counters are reset to 0 when the queue becomes empty.

That takes advantage of OS demand paging, which allocates page frames incrementally only for actually accessed parts of virtual address space.

Current constructors initialize the storage unconditionally, which precludes incremental OS demand paging.

I've tried all of the xenium queues and several others. As far as I understand, current unbounded queue implementations cannot guarantee that they deallocate memory predictably (relative to the size of the queue).

This is the question you need to get full understanding of and clarity on.

Exactly how and why existing unbounded queues do not solve your problem.

Running out of memory is not a concern, but overall resource consumption is. My initial attempt was indeed to use large fixed-size queues. But for either internal or external reasons, at some point in time whatever assumption I make about the maximum size gets exceeded. Having too-large queues all of the time is not an option either.

You should, at the very least, collect exact empirical numbers for your maximum queue sizes, in bytes and in object counts.

Ideally, you should collect queue sizes in bytes and object counts every time they change, along with a timestamp. These metric timeseries allow you to plot timelines and histograms of your queue sizes and memory usage. Timeline and histogram charts scream new insights and ideas at you at first glance, every time you make the effort to collect and visualize the metrics, in my experience.
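A minimal sketch of such instrumentation (hypothetical names, not part of any library): record a timestamped sample on every size change, then dump the series for plotting timelines and histograms.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// One sample of the queue's size at a point in time.
struct QueueSizeSample {
    std::chrono::steady_clock::time_point when;
    std::size_t objects; // queue size in object count units
    std::size_t bytes;   // queue size in byte units
};

// Collects a timeseries of queue sizes. Call record() from the queue's
// push/pop paths; read the samples after the run for plotting.
class QueueSizeRecorder {
    std::vector<QueueSizeSample> samples_;
    std::mutex mutex_;

public:
    void record(std::size_t objects, std::size_t object_size) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back({std::chrono::steady_clock::now(),
                            objects, objects * object_size});
    }

    // Peak queue size in objects - the number capacity planning needs.
    // Read only after recording has stopped (no locking here).
    std::size_t max_objects() const {
        std::size_t m = 0;
        for (auto const& s : samples_)
            if (s.objects > m) m = s.objects;
        return m;
    }

    std::vector<QueueSizeSample> const& samples() const { return samples_; }
};
```

The mutex in the hot path distorts timings somewhat; for a queue benchmark you would swap it for per-thread buffers merged afterwards, but the data collected is the same.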

Without these metrics you lack understanding of the nature of your problem, and, hence, are unable to do capacity planning and predict which queue would or wouldn't work for you. I may be totally wrong in my assessment, but from the information you gave, it seems to me like you are looking for an easy magical solution to your problem, trying available existing unbounded and bounded MPMC queue libraries, and finding all of them inadequate.

You can keep trying the same thing, but you cannot expect much different results from that, can you? It is high time you invested your time and effort into collecting your queue metrics and visualising them. If you want someone else to help solve the problem you face, you must code up the simplest possible demo/example/benchmark which measures the metrics you care about, so that anyone else can reproduce your observations and improve on them. Without reproducible metrics your problem cannot be solved using the scientific method, in my opinion.

Thanks for your insights. I'll follow up on your advice.