sourcegraph / conc

Better structured concurrency for Go

Home Page: https://about.sourcegraph.com/blog/building-conc-better-structured-concurrency-for-go

Feature Request: Ordered ResultPool for function call order preservation

nolotz opened this issue

Hello,

First, I would like to express my gratitude for your hard work on the conc library. The structured concurrency it brings to Go makes complex tasks a lot more manageable.

I am writing to propose a new feature that would further enrich the functionality of the library. The idea is to create a new type of entity that combines the functionalities of ResultPool and Stream. The main goal of this entity would be to provide a concurrent task runner that not only collects task results but also maintains the order of the calls to the functions. In simpler terms, it would give you a result slice ordered by the order in which the functions were called.

Currently, the ResultPool is great for running tasks concurrently and collecting the results, but it doesn't necessarily maintain the order of the function calls. On the other hand, the Stream entity allows for processing an ordered stream of tasks in parallel but does not collect the results.
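
For illustration, here is a minimal sketch of how the two existing primitives behave today (assuming the documented pool and stream APIs; strings.ToUpper simply stands in for real work):

	package main

	import (
		"fmt"
		"strings"

		"github.com/sourcegraph/conc/pool"
		"github.com/sourcegraph/conc/stream"
	)

	func main() {
		inputs := []string{"a", "b", "c", "d"}

		// ResultPool collects results, but the returned slice follows
		// completion order rather than the order in which Go() was called.
		p := pool.NewWithResults[string]()
		for _, in := range inputs {
			in := in
			p.Go(func() string { return strings.ToUpper(in) })
		}
		fmt.Println(p.Wait()) // e.g. [B A D C]

		// Stream runs tasks concurrently and invokes their callbacks in
		// submission order, but collecting results is left to the caller.
		var ordered []string
		s := stream.New()
		for _, in := range inputs {
			in := in
			s.Go(func() stream.Callback {
				res := strings.ToUpper(in)
				return func() { ordered = append(ordered, res) }
			})
		}
		s.Wait()
		fmt.Println(ordered) // always [A B C D]
	}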

The proposed entity could be very useful in situations where you want to run tasks concurrently, collect their results, and also preserve the order of the tasks.

Please, let me know what you think about this proposal. I am also open to contributing towards the development of this feature if that would be acceptable.

Thank you for your time and consideration.

Solves #110

Hi @nolotz! Have you taken a look at the iter package? In particular, iter.Map(). Because it knows the size of the set of results in advance, it can pre-allocate a slice and return results in the same order as the input set of tasks. Does that work for your use case?
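
For example, a minimal sketch (doubling integers stands in for real work):

	package main

	import (
		"fmt"

		"github.com/sourcegraph/conc/iter"
	)

	func main() {
		inputs := []int{1, 2, 3, 4}

		// iter.Map runs the mapper concurrently but pre-allocates the output
		// slice, so results[i] always corresponds to inputs[i].
		results := iter.Map(inputs, func(v *int) int {
			return *v * 2
		})

		fmt.Println(results) // always [2 4 6 8]
	}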

Hi @camdencheek,

Thank you for your prompt response and suggestion. I did take a look at the iter package, specifically iter.Map(). It is a powerful tool and it does return results in the same order as the input set of tasks.

However, the use case I am envisioning requires a blend of ResultPool and Stream functionalities. This would offer not just the ordered return of results but also allow concurrent execution of the tasks along with handling potential errors and context cancellations, similar to what ResultPool and Stream provide individually.

The proposed ResultStream entity would cover scenarios where maintaining the order of task execution, concurrent processing, and result collection are all important, offering more flexibility and control in handling complex concurrency requirements.

I hope this provides more clarity on the proposed feature. I look forward to hearing your thoughts on this.

but also allow concurrent execution of the tasks along with handling potential errors and context cancellations

Just to make sure we're on the same page, iter.Map() also executes its tasks concurrently, and there is a variant iter.MapErr() that will handle errors (though errors won't cancel the context (yet)).
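
A quick sketch of the error-handling variant (strconv.Atoi stands in for a task that can fail; results still come back in input order, with failed positions holding their zero value):

	package main

	import (
		"fmt"
		"strconv"

		"github.com/sourcegraph/conc/iter"
	)

	func main() {
		inputs := []string{"1", "2", "oops", "4"}

		// iter.MapErr works like iter.Map but its tasks can fail; the
		// individual task errors are combined into the returned error.
		results, err := iter.MapErr(inputs, func(s *string) (int, error) {
			return strconv.Atoi(*s)
		})

		fmt.Println(results) // [1 2 0 4]
		fmt.Println(err)     // non-nil: the Atoi failure for "oops"
	}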

One idea I'm toying around with is making ResultPool always maintain result order. It wouldn't be super expensive (it would just add some complexity), but it would make the abstraction much more useful for a lot of cases. Would that work for your use case? The only difference between ResultPool and the hypothetical ResultStream would be that you don't get your ordered set of results until you call Wait(), whereas with ResultStream, you could start operating on the stream results before everything is finished.
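
In the meantime, ordering can already be emulated by pre-sizing the result slice and having each task write to its own index (a rough sketch of essentially what iter.Map does under the hood):

	package main

	import (
		"fmt"
		"strings"

		"github.com/sourcegraph/conc/pool"
	)

	func main() {
		inputs := []string{"a", "b", "c"}

		// Each task writes to its own index of a pre-sized slice, so the final
		// slice follows submission order regardless of which task finishes first.
		results := make([]string, len(inputs))
		p := pool.New().WithErrors()
		for i, in := range inputs {
			i, in := i, in
			p.Go(func() error {
				results[i] = strings.ToUpper(in)
				return nil
			})
		}
		if err := p.Wait(); err != nil {
			fmt.Println("error:", err)
		}
		fmt.Println(results) // always [A B C]
	}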

Just to add another use case: an ordered pool would be great in the following example:

	for _, searchPath := range f.Paths {
		searchPath := searchPath

		for _, searchName := range f.Names {
			searchName := searchName

			pool.Go(func() ([]string, error) {
				return search(fsys, searchPath, searchName)
			})
		}
	}
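
(In the snippet above, pool is presumably a results pool with errors, along the lines of pool.NewWithResults[[]string]().WithErrors(); its construction isn't shown.)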

I basically want to search in a file tree and return results in the order of paths and file names used for searching.

iter.Map does not seem optimal here due to the two-dimensional nature of the lists.

I considered introducing an intermediate structure to generate a single slice and call iter.Map on that (in fact, I may still do that), but that wouldn't be that elegant IMO.
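
For reference, that intermediate-structure workaround might look roughly like this (query is a hypothetical helper type; f, fsys, and search come from the snippet above, and iter is github.com/sourcegraph/conc/iter):

	// query is a hypothetical helper pairing one search path with one name.
	type query struct{ path, name string }

	// Flatten the two-dimensional search space into a single ordered slice.
	queries := make([]query, 0, len(f.Paths)*len(f.Names))
	for _, searchPath := range f.Paths {
		for _, searchName := range f.Names {
			queries = append(queries, query{path: searchPath, name: searchName})
		}
	}

	// iter.MapErr runs the searches concurrently and returns the results
	// in the same order as the queries slice.
	results, err := iter.MapErr(queries, func(q *query) ([]string, error) {
		return search(fsys, q.path, q.name)
	})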

One idea I'm toying around with is making ResultPool always maintain result order. It wouldn't be super expensive (it would just add some complexity), but it would make the abstraction much more useful for a lot of cases. Would that work for your use case? The only difference between ResultPool and the hypothetical ResultStream would be that you don't get your ordered set of results until you call Wait(), whereas with ResultStream, you could start operating on the stream results before everything is finished.

@camdencheek is this something you are still considering?