socketry / async-http

Questions about implementing Producer/Consumer with this gem

JesusCarvalho opened this issue

First off, thanks for all the fantastic work. Gems like this one embody the best of the Ruby community.

I have questions about how to successfully implement a Producer/Consumer pattern with this gem. They stem from a lack of understanding of the scope and boundaries of the concurrency mechanisms involved; essentially, how do I compose Ruby core and the standard library with this gem to achieve the following:

1] A thread-safe queue (holding URLs to hit)
2] Multiple consumers (i.e. async-http workers) pulling from that queue
3] Thread-safe recording of results from responses (in case the list contains redundant endpoints)
4] A producer that refills the queue from a text file
5] Coordination of the above in an idiomatic way (signaling between consumers and producer)

The program in question would fill a queue with URLs to hit, which would then be consumed by a fixed number (a thread pool?) of asynchronous worker tasks that call each endpoint and record the response.
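For concreteness, something like the following is roughly what I have in mind, using async's queue and async-http's Internet client (an untested sketch; `urls.txt`, the worker count, and the result handling are placeholders):

```ruby
require 'async'
require 'async/queue'
require 'async/http/internet'

Async do |task|
  queue = Async::LimitedQueue.new(16)   # bounded queue of URLs, gives the producer backpressure
  internet = Async::HTTP::Internet.new
  results = {}                          # one reactor thread, so a plain Hash may suffice

  # Fixed pool of consumers, each pulling URLs until it dequeues nil.
  consumers = 4.times.map do
    task.async do
      while (url = queue.dequeue)
        response = internet.get(url)
        results[url] = response.read
      end
    end
  end

  # Producer: refill the queue from a text file.
  File.foreach('urls.txt') { |line| queue.enqueue(line.chomp) }

  # Signal completion: one nil per consumer ends each dequeue loop.
  consumers.size.times { queue.enqueue(nil) }
  consumers.each(&:wait)
ensure
  internet&.close
end
```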

So far I've pieced together some candidates for achieving each of the above:

1] Thread-safe Queue
2] This gem (obviously)
3] YAMLStore
4] Producer: MonitorMixin examples
5] Coordination

Every time I try to combine the above to achieve the stated goal, I make a mess of things. I'm not sure about the "separation of concerns" and boundaries between the above pieces with respect to handling concurrency. I'm also not sure whether the above list of candidates is complete (e.g. do I need a mutex?). Any thoughts?

Bonus Question:
Under "Multiple Requests" your documentation states:
To issue multiple requests concurrently, you should use a barrier
Based on my understanding of barriers, I wonder why asynchronous requests would have to wait on each other at all.

Many thanks in advance,
TJ

Here is how to implement a spider using this gem: https://github.com/socketry/benchmark-http/blob/master/lib/benchmark/http/spider.rb

Do not mix threads and async code; it will not work correctly unless you know exactly how things work.

Use async-container for parallelism (i.e. spin up several spiders and use an IO for coordination).
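The "use an IO for coordination" part doesn't need anything from this gem; the underlying mechanism is just pipes and forked processes, which async-container then wraps and supervises for you. A minimal sketch of that mechanism in plain Ruby core (not async-container's own API; the child count and `urls.txt` are placeholders):

```ruby
# One pipe per child; the parent distributes URLs round-robin so each
# child reads only its own descriptor.
pipes = 4.times.map { IO.pipe }

children = pipes.each_with_index.map do |(reader, _writer), index|
  Process.fork do
    # Close every inherited descriptor this child does not need, so that
    # closing the parent's writer is enough to deliver EOF to the reader.
    pipes.each_with_index do |(r, w), i|
      w.close
      r.close unless i == index
    end

    reader.each_line do |line|
      url = line.chomp
      # ... run an Async reactor here (e.g. one spider instance) to fetch `url` ...
    end
  end
end

pipes.each { |reader, _writer| reader.close }

writers = pipes.map(&:last)
File.foreach('urls.txt').with_index do |line, i|
  writers[i % writers.size].write(line)
end
writers.each(&:close)

children.each { |pid| Process.wait(pid) }
```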

To issue multiple requests concurrently, you should use a barrier

A barrier allows you to coordinate when work is completed, that is all. If you issue multiple requests and want to wait until they are all finished, use a barrier.
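For example, a minimal sketch of that pattern with `Async::Barrier` and `Async::HTTP::Internet` (the URLs are placeholders):

```ruby
require 'async'
require 'async/barrier'
require 'async/http/internet'

Async do
  internet = Async::HTTP::Internet.new
  barrier = Async::Barrier.new

  # Each request runs in its own task; the barrier only keeps track of them.
  %w[https://example.com/a https://example.com/b].each do |url|
    barrier.async do
      response = internet.get(url)
      puts "#{url}: #{response.status}"
      response.read
    end
  end

  # Block here until every task spawned through the barrier has finished.
  barrier.wait
ensure
  internet&.close
end
```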

I'm sure you will have more questions but hopefully this is a start.

Thanks for the quick turnaround Sam. You are right, I do have more questions, but I'll study up and come back after I'm versed in all the code you sent my way.

I'm going to close this issue. If you have further questions or feedback, please feel free to start a discussion: https://github.com/socketry/async-http/discussions

With Ruby 3.1 and Async 2.x, threads are now supported, as well as Thread::Queue and so on. This might provide some advantage in your use case, but it still requires some care.
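For example, with the fiber scheduler, a `Thread::Queue#pop` inside an Async task should suspend just that task rather than the whole thread, so a stdlib queue can feed async workers directly. A rough sketch, assuming Ruby 3.1+ and Async 2.x (URLs and worker count are placeholders):

```ruby
require 'async'
require 'async/http/internet'

queue = Thread::Queue.new

Async do |task|
  internet = Async::HTTP::Internet.new

  workers = 4.times.map do
    task.async do
      # With the fiber scheduler, this pop suspends the task, not the thread.
      while (url = queue.pop)
        response = internet.get(url)
        puts "#{url}: #{response.status}"
        response.read
      end
    end
  end

  # The producer could just as well push from another thread.
  %w[https://example.com/a https://example.com/b].each { |url| queue.push(url) }

  # Closing the queue makes pop return nil once it is drained, ending each worker.
  queue.close
  workers.each(&:wait)
ensure
  internet&.close
end
```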