Emit multiple values from Pipeline stage
bear24rw opened this issue
Is there an idiomatic pattern for emitting multiple values from a pipeline stage? For example, in the DataPipeline shown in the docs, how could I emit multiple values from the second pipe?
If I use tf::Pipeline instead, I could have the second pipe add the items it created to a queue and have the third pipe loop over that queue. But this breaks down in a different example, where I emit multiple items from the second pipe and want to process all of those items in the third pipe concurrently. It seems like I'm looking for some kind of tf::Pipeflow::emit(item) method that I could call multiple times in the second pipe, with each call triggering a separate run of the third pipe.
This seems possibly similar to corun, except that I don't want to specify a target. The target should be known automatically: it's the next stage of the pipeline.
#include <taskflow/taskflow.hpp>
#include <taskflow/algorithm/data_pipeline.hpp>
#include <cstdio>
#include <iostream>
#include <string>

int main() {
  // data flow => void -> int -> std::string -> void
  tf::Taskflow taskflow("pipeline");
  tf::Executor executor;
  const size_t num_lines = 4;

  // create a pipeline graph
  tf::DataPipeline pl(num_lines,
    // first pipe: generates tokens 0..4, then stops the pipeline
    tf::make_data_pipe<void, int>(tf::PipeType::SERIAL, [&](tf::Pipeflow& pf) -> int {
      if(pf.token() == 5) {
        pf.stop();
        return 0;
      }
      else {
        printf("first pipe returns %zu\n", pf.token());
        return static_cast<int>(pf.token());
      }
    }),
    // second pipe: converts the int to a string
    tf::make_data_pipe<int, std::string>(tf::PipeType::SERIAL, [](int& input) {
      printf("second pipe returns a string of %d\n", input + 100);
      return std::to_string(input + 100);
      return std::to_string(input + 200); // <---------------- I want to emit both + 100 and + 200 (unreachable as written)
    }),
    // third pipe: consumes the string
    tf::make_data_pipe<std::string, void>(tf::PipeType::SERIAL, [](std::string& input) {
      printf("third pipe receives the input string %s\n", input.c_str());
    })
  );

  // build the pipeline graph using composition
  taskflow.composed_of(pl).name("pipeline");

  // dump the pipeline graph structure (with composition)
  taskflow.dump(std::cout);

  // run the pipeline
  executor.run(taskflow).wait();
  return 0;
}
In this case, we can use std::pair<std::string, std::string> as the input to the third pipe. Does that make sense to you?
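A minimal sketch of that suggestion, written as plain stage callables so it relies only on the standard library. The names `second_stage` and `third_stage` are hypothetical. The assumption is that they would replace the second and third lambdas in the example above, with the pipe types changed to `tf::make_data_pipe<int, std::pair<std::string, std::string>>` and `tf::make_data_pipe<std::pair<std::string, std::string>, void>`.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical second stage: returns both derived values as one token.
std::pair<std::string, std::string> second_stage(int& input) {
  return {std::to_string(input + 100), std::to_string(input + 200)};
}

// Hypothetical third stage: receives the pair and handles both values in turn.
void third_stage(std::pair<std::string, std::string>& input) {
  std::printf("third pipe receives the input strings %s and %s\n",
              input.first.c_str(), input.second.c_str());
}
```

Note that the pair still travels as a single token, so the third stage handles both values within one invocation rather than in two separate runs.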
> we can use std::pair<std::string, std::string> as the input to the third pipe
It makes sense, but it doesn't exactly solve what I'm looking for. I want to turn the third pipe into a CONCURRENT pipe, so that the two (or more) items emitted from the second pipe are processed in parallel.
My actual application has one pipe decoding a data stream. Once a chunk is decoded, there may be 0 to N decoded "packets" that it emits. I want to send all of these packets to the next pipe to be processed in parallel.
It seems like the tf::Pipeline setup assumes there is always one item in and one item out of each pipe. But I have one item in and N items out of my pipe.
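One way to approximate this within a one-in, one-out model is to emit the whole batch as a single token (a `std::vector`, which can hold 0 to N packets) and fan out inside the next stage. The sketch below illustrates that idea under assumptions and is not a Taskflow feature: `Packet`, `decode_stage`, `process_packet`, and `process_stage` are hypothetical names, the decoding rule is invented for the example, and the fan-out uses `std::async` from the standard library only to keep the example self-contained.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical packet type standing in for one decoded unit of the stream.
struct Packet {
  std::string payload;
};

// Hypothetical decode stage: one input token yields 0 to N packets,
// returned together as a single batch. Here, input n yields n packets.
std::vector<Packet> decode_stage(int& input) {
  std::vector<Packet> batch;
  for (int i = 0; i < input; ++i) {
    batch.push_back(Packet{std::to_string(input) + ":" + std::to_string(i)});
  }
  return batch;
}

// Hypothetical per-packet work; here it just measures the payload.
std::size_t process_packet(const Packet& p) {
  return p.payload.size();
}

// Hypothetical next stage: processes every packet of the batch concurrently
// and returns the results in the original packet order.
std::vector<std::size_t> process_stage(std::vector<Packet>& batch) {
  std::vector<std::future<std::size_t>> pending;
  pending.reserve(batch.size());
  for (const Packet& p : batch) {
    // std::launch::async runs each packet's work on its own thread.
    pending.push_back(std::async(std::launch::async,
                                 [&p] { return process_packet(p); }));
  }
  std::vector<std::size_t> results;
  results.reserve(pending.size());
  for (auto& f : pending) {
    results.push_back(f.get());  // blocks until that packet is done
  }
  return results;
}
```

With this shape, the stages would presumably be declared with `std::vector<Packet>` as the token type between them. The trade-off is that parallelism is confined to the inside of one stage invocation, and the extra threads are not managed by the Taskflow executor, so this is a workaround rather than the per-item scheduling asked for here.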
This paper, talking about TBB, actually mentions my situation:
> Single input, multiple outputs
> The SplitBlocks stage takes a block and splits it into smaller segments; it takes a single input token and produces multiple output tokens. However, Filter objects in TBB can only take a single input and produce a single output. To mimic the Pthreads version, we had to resort to nested pipelines as shown in Figure 5(b).
It would be great if Taskflow could solve this without having to resort to the nested-pipelines workaround.