caolan / highland

High-level streams library for Node.js and the browser

Home Page: https://caolan.github.io/highland

Question: mergeWithLimit inside _.pipeline ?

ak--47 opened this issue · comments

commented

Hi!

I have a highland pipeline like this:

async function batchAndQueue(stream) {
    const asyncPost = _.wrapCallback(callbackify(flush));

    const data = _(stream)
        .map((data) => someTransform(data))
        .batch(2000)
        .consume(batchForByteSize(10000))
        .map((batch) => asyncPost(batch))
        .mergeWithLimit(numWorkers)
        .doto(() => { progressBar() })
        .errors((e) => { throw e; });

    return Promise.all(await data.collect().toPromise(Promise));
}

(this code reads data from a stream, sends it over the network in batches, and returns a promise so the whole thing can be awaited)
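The `batchForByteSize` handler passed to `.consume()` is not shown in the post. As a rough illustration of what size-capped batching could look like, here is a minimal sketch of the splitting logic as a plain function. The name `splitByByteSize` and the choice to measure size via `JSON.stringify` are assumptions made for this example, not the author's implementation.

```javascript
// Hypothetical helper (not from the original post): group items into batches
// whose combined JSON-serialized size does not exceed maxBytes.
// An item that is larger than maxBytes on its own still gets its own batch.
function splitByByteSize(items, maxBytes) {
    const batches = [];
    let current = [];
    let currentBytes = 0;

    for (const item of items) {
        // Assumption: size is measured per item and ignores array brackets and commas.
        const size = Buffer.byteLength(JSON.stringify(item), 'utf8');

        // Close the current batch if adding this item would exceed the cap.
        if (current.length > 0 && currentBytes + size > maxBytes) {
            batches.push(current);
            current = [];
            currentBytes = 0;
        }
        current.push(item);
        currentBytes += size;
    }

    // Flush whatever is left over.
    if (current.length > 0) batches.push(current);
    return batches;
}

// Each 4-character string serializes to 6 bytes (quotes included); 'dd' to 4 bytes.
const batches = splitByByteSize(['aaaa', 'bbbb', 'cccc', 'dd'], 14);
console.log(batches);
```

In a real `.consume()` handler the same bookkeeping would run incrementally, pushing a batch downstream each time the cap is reached instead of collecting everything in an array first.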

this works great, and mergeWithLimit seems to be giving me the concurrency I was looking for.

I'm trying to translate this exact flow to a _.pipeline() interface, so I can expose a proper Readable Stream for my API.

this is as far as I got:

function exposePipeline(finish = () => {}) {
    const asyncPost = _.wrapCallback(callbackify(flush));

    const pipeToMe = _.pipeline(
        _.map((data) => someTransform(data)),
        _.batch(N),
        _.consume(customSizeBatch),
        _.map((batch) => asyncPost(batch)),

        // flip promise back to stream
        _.flatMap(_),
        
        _.doto(() => { progressBar()}),
        _.errors((e) => { throw e;})
    )

    // * handlers
    pipeToMe.on('end', () => {
        finish(null, summary());
    });

    pipeToMe.on('pipe', () => {
        pipeToMe.resume();
    });

    return pipeToMe;
}

but i can't figure out how to use mergeWithLimit (or parallel) in this pipeline

it doesn't appear that _.mergeWithLimit or _.parallel are top-level functions... like _.map, _.consume, _.batch, etc...

   _.map((batch) => asyncPost(batch)),
   _.mergeWithLimit(workers),

doesn't work... have tried a bunch of other variants too... so i think my mental model is wrong.

can someone point me in the right direction? much appreciated! 🙏

commented

turns out i was being dense... _.mergeWithLimit() totally is a top-level function, but it doesn't work nicely with _.flatMap() for this use case, since merge expects a stream of streams and flatMap doesn't produce that...
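To make the "stream of streams" point concrete: the `map` step turns each batch into a lazy unit of work (a stream, in highland's case), and a limited merge then starts at most N of those units at a time. The dependency-free sketch below mimics that idea with promises. `runWithLimit` and `makeTask` are hypothetical names for this illustration and this is not how highland implements `mergeWithLimit`; unlike a merge, it also returns results in input order.

```javascript
// Hypothetical helper: run lazy async tasks with at most `limit` in flight.
// Each task is a function returning a promise, so nothing starts until called.
async function runWithLimit(tasks, limit) {
    const results = [];
    let next = 0;      // index of the next task to start
    let inFlight = 0;  // tasks currently running
    let peak = 0;      // highest concurrency observed

    // Each worker keeps pulling the next unstarted task until none remain.
    async function worker() {
        while (next < tasks.length) {
            const i = next++;
            inFlight++;
            peak = Math.max(peak, inFlight);
            try {
                results[i] = await tasks[i]();
            } finally {
                inFlight--;
            }
        }
    }

    // Start `limit` workers (or fewer if there are not enough tasks).
    const workerCount = Math.min(limit, tasks.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return { results, peak };
}

// Stand-in for asyncPost(batch): resolves with a doubled value after a short delay.
const makeTask = (value) => () =>
    new Promise((resolve) => setTimeout(() => resolve(value * 2), 5));

const demo = runWithLimit([1, 2, 3, 4, 5].map(makeTask), 2);
demo.then(({ results, peak }) => console.log(results, 'peak concurrency:', peak));
```

The key detail is that the tasks are created lazily. If every request were started as soon as it was mapped, there would be nothing left for a concurrency limit to control.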

anyway final pipeline is:

function exposePipeline(finish = () => {}) {
    const asyncPost = _.wrapCallback(callbackify(flush));

    const pipeToMe = _.pipeline(
        _.map((data) => someTransform(data)),
        _.batch(N),
        _.consume(customSizeBatch),
        _.map((batch) => asyncPost(batch)),
        _.mergeWithLimit(numWorkers),
        _.doto(() => { progressBar()}),
        _.errors((e) => { throw e;})
    )

    // * handlers
    pipeToMe.on('end', () => {
        finish(null, summary());
    });

    pipeToMe.on('pipe', () => {
        pipeToMe.resume();
    });

    return pipeToMe;
}

hope it helps someone else!