caolan / highland

High-level streams library for Node.js and the browser

Home Page:https://caolan.github.io/highland

Stream should end when _.nil is returned from map

podemosaprender opened this issue

Returning _.nil from map should end the stream

  • to be consistent with the API and other stream libraries
  • because it's the easiest way to end a stream from another source without breaking encapsulation (e.g. from an infinite, network, or other complex stream)

_([0, 1, 2, 3, 4, 5])
  .map(function (e) { return e < 4 ? e : _.nil; })
  .toArray(function (xs) {
    test.same(xs, [0, 1, 2, 3]);
  });

Related to #513 and perhaps #172.
Solved, sending pull request.

I'm not sure I agree with this yet.

What part of the API is inconsistent with the current map behavior? nil only has meaning in the consume and pull APIs, and those are low level APIs. map is a higher-level API, so there's no reason why it must understand _.nil.

What other stream libraries allow you to end a stream with the map operator? I'm only familiar with RxJS, and you definitely can't do that there.

What do you mean by "breaking encapsulation"? You can end a stream using consume. Is that the problem? You don't want to drop down to a low-level API like consume?

Would your use case be served if we added a takeWhile operator? Like the one defined in #378? I'd much rather add that than introduce the concept of nil to the map operator.
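
For reference, ending a stream from consume, which is also roughly how a takeWhile operator could be built, might look like the sketch below. The name endWhenFalse is hypothetical and not part of Highland. The handler follows the (err, x, push, next) signature that consume passes to its callback, and the nil argument is assumed to be _.nil.

```javascript
// Hypothetical helper (not part of Highland): builds a handler for
// stream.consume() that ends the stream the first time `predicate` fails.
// `nil` is the end-of-stream marker, i.e. _.nil when used with Highland.
function endWhenFalse(predicate, nil) {
  return function (err, x, push, next) {
    if (err) {
      // Pass errors along and keep reading.
      push(err);
      next();
    } else if (x === nil) {
      // The source ended on its own: forward the end marker.
      push(null, nil);
    } else if (predicate(x)) {
      // Still interested: emit the value and ask for the next one.
      push(null, x);
      next();
    } else {
      // Done: emit the end marker and do not call next() again.
      push(null, nil);
    }
  };
}
```

With Highland loaded as _, something like _([0, 1, 2, 3, 4, 5]).consume(endWhenFalse(function (e) { return e < 4; }, _.nil)) should then emit 0, 1, 2, 3 and end, without map ever seeing nil.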

Ended up using that takeWhile implementation the other day. I agree with @vqvu on this.

Sure! We can add something like _(...).takeWhile( e => e.x != "sentinel" ).
It's neat, high level, and it'll work the same without exposing nil.

I chose to let map end the stream by returning nil because I intend to use the library for teaching and working with beginners, so I'm trying to keep the number of methods and concepts they need to know to a minimum. We already met nil using fetch with map, and we would also need to return some sentinel value like nil from the mapping function to signal to takeWhile that we are done. This also seems to be the case for #513. Along the same lines, for Node streams, "Passing chunk as null signals the end of the stream (EOF), after which no more data can be written.".

Finally, being my first PR and being a new user of the library, I tried to keep my changes small and safe.

But I'm ok with any of the options and will be glad to follow your guidance. Please let me know how I can be more helpful.

Thanks for the response. Node stream's push is a low-level API mostly meant to be used by Readable stream implementers, so it's more like pull and consume than map.
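
To illustrate where push(null) sits in Node streams, here is a minimal sketch using the built-in stream module. countTo and drainSync are hypothetical names made up for this example. Note that push is called from inside the read() implementation, i.e. by whoever implements the source, not by a downstream mapping step.

```javascript
const { Readable } = require('stream');

// Hypothetical source: emits 0, 1, ..., limit - 1 and then signals EOF.
function countTo(limit) {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() {
      // The implementer decides when the stream ends by pushing null.
      this.push(i < limit ? i++ : null);
    },
  });
}

// Hypothetical helper: pulls values one at a time in paused mode until
// read() returns null, which here happens once EOF has been pushed.
function drainSync(readable) {
  const out = [];
  let chunk;
  while ((chunk = readable.read()) !== null) {
    out.push(chunk);
  }
  return out;
}
```

Calling drainSync(countTo(4)) collects the four values pushed before the null. This relies on read() pushing synchronously; an asynchronous source would need the 'readable' or 'data' events instead.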

If you'd like to put together the PR, I'm happy to accept an implementation of takeWhile.


If you're using the library for teaching, I'd suggest that you structure your example so that you can run takeWhile before map. For example, stream.takeWhile((x) => x < 5).map((x) => x * x). I think it would still get the point across, and doesn't require a special sentinel whose only purpose is to coordinate between takeWhile and map.

If you can't make the decision without running the mapper then, depending on the reason why you want to stop the stream, it may be better to use stopOnError instead.

If you're "done" because the mapper cannot process the input, then maybe you can throw an error and use stopOnError.

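stream.map((x) => {
  // something
  if (cannotHandle) {
    throw new Error('blah');
  }
}).stopOnError((err) => {
  // stopOnError expects a handler; the stream ends after it is called.
  console.error(err);
});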

If you're "done" because the map has produced a value that you no longer care about, then run takeWhile on the mapped value rather than having map return a sentinel.

stream.map((x) => x * 2)
  .takeWhile((x) => x < 20);

Perhaps your example is such that it is natural for map to make the decision to end the stream. But in my experience, that's very rare, so I would try to avoid introducing it to beginners as if it were a common thing to do.
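
The two orderings above can be tried without any library using plain generator functions, which are lazy in a similar pull-driven way. This is only an analogy, not Highland's implementation: naturals, takeWhile and map are hypothetical stand-ins for the stream operators.

```javascript
// Hypothetical stand-ins built on generators (not Highland's API).
function* naturals() {
  let i = 0;
  while (true) yield i++; // infinite, but only advanced on demand
}

function* takeWhile(source, predicate) {
  for (const x of source) {
    if (!predicate(x)) return; // ends the sequence; no sentinel needed
    yield x;
  }
}

function* map(source, fn) {
  for (const x of source) yield fn(x);
}

// Decide on the input: takeWhile before map.
const squares = [...map(takeWhile(naturals(), (x) => x < 5), (x) => x * x)];

// Decide on the output: map first, then takeWhile on the mapped value.
const doubled = [...takeWhile(map(naturals(), (x) => x * 2), (x) => x < 20)];
```

In both cases the decision to stop lives in takeWhile, so the mapping function stays a plain value-to-value function.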


Also, if by fetch you mean the Fetch API, then it returns a Promise that you can immediately convert into a Highland stream. No nil required. For example, _(fetch('https://github.com'))

Great! I'll try to find some time this weekend to work on a takeWhile implementation. In the meantime, here is a frequent practical use case where I think the library is very attractive. In a way consistent with your vision, what would be the most high-level pattern to get, e.g.:

  1. a stream of integers (as many as consumed)
  2. mapped by a curried sprintf to a url
  3. mapped by a Fetch API fetch to the promise returned by res.text()
  4. mapped to lines

Our first approach was

  • We wrote a generator for 1, where we met nil and the low level API
    (the docs made it easy and interesting)
  • The synchronous steps were easy and we ended up with something like
    txt_st= url_st.map(fetch_st_map).series();
    (we need to fetch one file at a time due to device memory limitations)
    where, after reading the documentation, we wrote
    function fetch_st_map(url) {
      return fetch(url).then((res) => {
        done = !res.ok; //A: we stop at first error, e.g. 404
        return res.ok ? res.text() : _.nil;
      });
    }
    
  • But then we ran into some trouble getting the stream to end and ended up (not very proud of our convoluted code) with
    txt_st= url_st.map( mk_fetch_st_map() ).series();
    where
    function mk_fetch_st_map() {
      var done = false; //XXX: highland keeps calling after nil
      return function fetch_st_map(url) {
        return done ? _.nil : _(
          fetch(url).then((res) => {
            done = !res.ok; //A: we stop at first error, e.g. 404
            return res.ok ? res.text() : _.nil;
          })
        );
      };
    }
    and the changes committed in the PR.
    
  • Would stopOnError work in this case? I see it'd be clearly more readable. I'm worried I may be biased by other (older, less common) programming languages, but I would like to contribute in a way consistent with what the users of the library expect, if I can.

Thanks for the explanation. Here's how I would do it:

function generator() {
  let i = 0;
  return _((push, next) => {
    // No nil, because of the "as many as consumed" requirement.
    // One of the unique parts about Highland is that it is strongly lazy, so this
    // seemingly infinite stream still works.
    push(null, i++);
    next();
  });
}

// Step 1
generator()
  // Step 2.
  // Should not be any surprise.
  .map((i) => sprintf(...))
  // Step 3a.
  // Rather than attempt to look at the result and stop the stream, we get the data out
  // of Promises and into the stream as soon as possible. flatMap(...) is equivalent to
  // map(...).series(). fetch returns a Promise, so wrap it in _(...) to get a stream.
  .flatMap((url) => _(fetch(url)))
  // Step 3b.
  // Now you have a stream of res, use takeWhile to stop the stream when res.ok is false.
  .takeWhile((res) => res.ok)
  // Step 3c.
  // Now you have a stream of usable responses, so go ahead and extract the text.
  // res.text() also returns a Promise, so wrap it in _(...) and flatten it.
  .flatMap((res) => _(res.text()))
  // Step 4
  // Map to lines. We use flatMap since there can be multiple lines.
  .flatMap((text) => _(text.split('\n')));

I would usually combine consecutive maps (i.e., steps 3c and 4) into a single map unless the function for one of the maps is already implemented as a separate entity (i.e., not an arrow function) or it is strongly independent of the others. It's a style question, and basically a judgement call whether you think the above is better than:

  .takeWhile(...)
  .flatMap((res) => _(res.text()).flatMap((text) => _(text.split('\n'))))

If you present step 3 as including extracting the text, then keep them separate. Otherwise, combine the res.text() step and the split into lines step into step 4 so you can use a single flatMap.

Thanks for the detailed answer!