Question: mergeWithLimit inside _.pipeline?
ak--47 opened this issue
Hi!
I have a highland pipeline like this:
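// assumed imports: highland as _, callbackify from Node's util
const _ = require('highland');
const { callbackify } = require('util');

async function batchAndQueue(stream) {
  // wrap flush() so each call returns a highland stream
  const asyncPost = _.wrapCallback(callbackify(flush));
  const data = _(stream)
    .map((data) => someTransform(data))
    .batch(2000)
    .consume(batchForByteSize(10000))
    .map((batch) => asyncPost(batch)) // stream of streams
    .mergeWithLimit(numWorkers) // at most numWorkers posts in flight
    .doto(() => { progressBar() })
    .errors((e) => { throw e; });
  return Promise.all(await data.collect().toPromise(Promise));
}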
(this code reads data from a stream, sends it over the network, and returns a promise so the whole thing can be awaited)
this works great, and mergeWithLimit seems to be giving me the concurrency I was looking for.
I'm trying to translate this exact flow to a _.pipeline() interface, so I can expose a proper Readable Stream for my API.
this is as far as I got:
function exposePipeline(finish = () => {}) {
  const asyncPost = _.wrapCallback(callbackify(flush));
  const pipeToMe = _.pipeline(
    _.map((data) => someTransform(data)),
    _.batch(N),
    _.consume(customSizeBatch),
    _.map((batch) => asyncPost(batch)),
    // flip promise back to stream
    _.flatMap(_),
    _.doto(() => { progressBar() }),
    _.errors((e) => { throw e; })
  );
  // * handlers
  pipeToMe.on('end', () => {
    finish(null, summary());
  });
  pipeToMe.on('pipe', () => {
    pipeToMe.resume();
  });
  return pipeToMe;
}
but i can't figure out how to use mergeWithLimit (or parallel) in this pipeline
it doesn't appear that _.mergeWithLimit or _.pipeline are top level functions.... like _.map, _.consume, _.batch, etc...
_.map((batch) => asyncPost(batch)),
_.mergeWithLimit(workers),
doesn't work... have tried a bunch of other variants too... so i think my mental model is wrong.
can someone point me in the right direction? much appreciated! 🙏
turns out i was being dense... _.mergeWithLimit() totally is a top level function, but it doesn't work nicely with _.flatMap() for this use case, since merge expects a stream of streams and flatMap doesn't produce that (it has already flattened the inner streams into plain values)...
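to make the difference concrete, here is a rough plain-Node sketch. It is not highland's implementation, and runWithLimit, fakePost and demo are hypothetical names made up for illustration. Each task is a lazy thunk standing in for the stream that asyncPost(batch) returns. With a limit of 1 the tasks run one after another, which is roughly what flatMap gives you (the highland docs describe it as equivalent to map(f).sequence()); with a higher limit, several are in flight at once, which is the behaviour mergeWithLimit is being used for here.

```javascript
// Plain-Node sketch, NOT highland's implementation.
// Hypothetical helper: run lazy tasks with at most `limit` in flight.
async function runWithLimit(tasks, limit) {
  const results = [];
  let next = 0;     // index of the next task to start
  let inFlight = 0; // tasks currently running
  let peak = 0;     // highest concurrency observed

  async function worker() {
    while (next < tasks.length) {
      const task = tasks[next++];
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      const value = await task(); // work starts only when the thunk is called
      inFlight -= 1;
      results.push(value); // results arrive in completion order
    }
  }

  // start `limit` workers that pull from the shared task list
  const count = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: count }, () => worker()));
  return { results, peak };
}

// Hypothetical stand-in for the network call: resolves with `id` after `ms`.
const fakePost = (id, ms) => () =>
  new Promise((resolve) => setTimeout(() => resolve(id), ms));

async function demo() {
  const makeTasks = () => [fakePost('a', 30), fakePost('b', 10), fakePost('c', 10)];
  const sequential = await runWithLimit(makeTasks(), 1); // flatMap-like
  const limited = await runWithLimit(makeTasks(), 2);    // mergeWithLimit(2)-like
  return { sequential, limited };
}
```

calling demo() and comparing sequential.peak with limited.peak shows the effect of the limit; the order of limited.results depends on which task finishes first, so it need not match input order.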
anyway final pipeline is:
function exposePipeline(finish = () => {}) {
  const asyncPost = _.wrapCallback(callbackify(flush));
  const pipeToMe = _.pipeline(
    _.map((data) => someTransform(data)),
    _.batch(N),
    _.consume(customSizeBatch),
    _.map((batch) => asyncPost(batch)), // stream of streams
    _.mergeWithLimit(numWorkers),
    _.doto(() => { progressBar() }),
    _.errors((e) => { throw e; })
  );
  // * handlers
  pipeToMe.on('end', () => {
    finish(null, summary());
  });
  pipeToMe.on('pipe', () => {
    pipeToMe.resume();
  });
  return pipeToMe;
}
hope it helps someone else!