Stream should end when _.nil is returned from map

Question

podemosaprender opened this issue 5 years ago · comments

Returning _.nil from map should end the stream

to be consistent with the API and other stream libraries
because it's the easiest way to end a stream from another source without breaking encapsulation (e.g. from an infinite, network, or other complex stream)

_([0, 1, 2, 3, 4, 5])
  .map(function (e) { return e < 4 ? e : _.nil; })
  .toArray(function (xs) {
    test.same(xs, [0, 1, 2, 3]);
  });

Related to #513 and perhaps #172.
Solved, sending pull request.

Victor Vu · Answer 1 · Wed Mar 20 2019 11:39:14 GMT+0800 (China Standard Time)

I'm not sure I agree with this yet.

What part of the API is inconsistent with the current map behavior? nil only has meaning in the consume and pull APIs, and those are low level APIs. map is a higher-level API, so there's no reason why it must understand _.nil.

What other stream libraries allow you to end a stream with the map operator? I'm only familiar with RxJS, and you definitely can't do that there.

What do you mean by "breaking encapsulation"? You can end a stream using consume. Is that the problem? You don't want to drop down to a low-level API like consume?

Would your use case be served if we added a takeWhile operator? Like the one defined in #378? I'd much rather add that than introduce the concept of nil to the map operator.
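For readers who have not seen the low-level route mentioned above, here is a minimal sketch of how a takeWhile could be expressed with consume. It is not the implementation from #378: `takeWhileHandler` is a hypothetical helper, and `nil` is passed in as a parameter so the snippet runs without the library (with Highland it would be `_.nil`). The only Highland-specific assumption is the documented `(err, x, push, next)` handler shape that consume expects.

```javascript
// Sketch only: `takeWhileHandler` is a hypothetical helper, not part of Highland.
// It returns a function with the (err, x, push, next) shape that consume() expects.
// `nil` is injected so the sketch has no hard dependency; with Highland it is _.nil.
function takeWhileHandler(predicate, nil) {
  return function (err, x, push, next) {
    if (err) {
      // Forward errors and keep consuming.
      push(err);
      next();
    } else if (x === nil) {
      // The source ended on its own: forward the end marker.
      push(null, nil);
    } else if (predicate(x)) {
      // The value passes: emit it and ask for the next one.
      push(null, x);
      next();
    } else {
      // First failing value: end the output and stop pulling from the source.
      push(null, nil);
    }
  };
}
```

With Highland loaded as `_`, this would presumably be used as `_([0, 1, 2, 3, 4, 5]).consume(takeWhileHandler((e) => e < 4, _.nil))`, which should end the stream after 3 without map ever seeing nil.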

jaide (formerly eccentric-j) · Answer 2 · Wed Mar 20 2019 12:04:07 GMT+0800 (China Standard Time)

Ended up using that takeWhile implementation the other day. I agree with @vqvu on this.

MauricioCap · Answer 3 · Fri Mar 22 2019 10:33:10 GMT+0800 (China Standard Time)

Sure! We can add something like _(...).takeWhile( e => e.x != "sentinel" ).
It's neat, high level, and it'll work the same without exposing nil.

I chose to let map end the stream by returning nil because I intend to use the library for teaching and working with beginners, so I'm trying to keep the number of methods and concepts they need to know to a minimum. We have already met nil when using fetch with map, and we would also need to return some sentinel value like nil from the mapping function to signal to takeWhile that we are done. This also seems to be the case for #513. Along the same lines, for Node streams: "Passing chunk as null signals the end of the stream (EOF), after which no more data can be written."

Finally, being my first PR and being a new user of the library, I tried to keep my changes small and safe.

But I'm ok with any of the options and will be glad to follow your guidance. Please let me know how I can be more helpful.

Victor Vu · Answer 4 · Fri Mar 22 2019 11:17:47 GMT+0800 (China Standard Time)

Thanks for the response. Node stream's push is a low-level API mostly meant to be used by Readable stream implementers, so it's more like pull and consume than map.

If you'd like to put together the PR, I'm happy to accept an implementation of takeWhile.

If you're using the library for teaching, I'd suggest that you structure your example so that you can run takeWhile before map. For example, stream.takeWhile((x) => x < 5).map((x) => x * x). I think it would still get the point across, and doesn't require a special sentinel whose only purpose is to coordinate between takeWhile and map.

If you can't make the decision without running the mapper then, depending on the reason why you want to stop the stream, it may be better to use stopOnError instead.

If you're "done" because the mapper cannot process the input, then maybe you can throw an error and use stopOnError.

stream.map((x) => {
  // something
  if (cannotHandle) {
    throw new Error('blah')
  }
}).stopOnError()

If you're "done" because the map has produced a value that you no longer care about, then run takeWhile on the mapped value rather than having map return a sentinel.

stream.map((x) => x * 2)
  .takeWhile((x) => x < 20);

Perhaps your example is such that it is natural for map to make the decision to end the stream. But in my experience, that's very rare, so I would try to avoid introducing it to beginners as if it were a common thing to do.

Also, if by fetch you mean the Fetch API, then it returns a Promise that you can immediately convert into a Highland stream. No nil required. For example, _(fetch('https://github.com'))

MauricioCap · Answer 5 · Fri Mar 22 2019 22:10:24 GMT+0800 (China Standard Time)

Great! I'll try to find some time this weekend to work on a takeWhile implementation. Meanwhile, I'm working on a frequent, practical use case where I think the library is very attractive, and I'd like to use it in a way consistent with your vision. What would be the most high-level pattern to get, e.g.:

1. a stream of integers (as many as consumed)
2. mapped by a curried sprintf to a url
3. mapped by a Fetch API fetch to the promise returned by res.text()
4. mapped to lines

Our first approach was

We wrote a generator for 1, where we met nil and the low level API
(the docs made it easy and interesting)
The synchronous steps were easy and we ended up with something like
txt_st= url_st.map(fetch_st_map).series();
(we need to fetch one file at a time due to device memory limitations)
where reading the documentation we wrote
```
function fetch_st_map(url) {
  return fetch(url).then((res) => {
    done = !res.ok; // A: we stop at first error, e.g. 404
    return (res.ok ? res.text() : _.nil);
  });
}
```

But then we ran into some trouble getting the stream to end and ended up (not very proud of our convoluted code) with
txt_st= url_st.map( mk_fetch_st_map() ).series();
where

function mk_fetch_st_map() {
  var done = false; // XXX: highland keeps calling after nil
  return function fetch_st_map(url) {
    return done ? _.nil : _(
      fetch(url).then((res) => {
        done = !res.ok; // A: we stop at first error, e.g. 404
        return (res.ok ? res.text() : _.nil);
      })
    );
  };
}
and the changes committed in the PR.

Would stopOnError work in this case? I can see it would be clearly more readable. I'm worried I may be biased by other (older, less common) programming languages, but I would like to contribute in a way consistent with what the users of the library expect, if I can.

Victor Vu · Answer 6 · Tue Mar 26 2019 12:51:31 GMT+0800 (China Standard Time)

Thanks for the explanation. Here's how I would do it:

function generator() {
  let i = 0;
  return _((push, next) => {
    // No nil, because of the "as many as consumed" requirement.
    // One of the unique parts about Highland is that it is strongly lazy, so this
    // seemingly infinite stream still works.
    push(null, i++);
    next();
  });
}

// Step 1
generator()
  // Step 2.
  // Should not be any surprise.
  .map((i) => sprintf(...))
  // Step 3a.
  // Rather than attempt to look at the result and stop the stream, we get the data out
  // of Promises and into the stream as soon as possible. flatMap(...) is equivalent to
  // map(...).series().
  .flatMap(fetch)
  // Step 3b.
  // Now you have a stream of res, use takeWhile to stop the stream when res.ok is false.
  .takeWhile((res) => res.ok)
  // Step 3c.
  // Now you have a stream of usable responses, so go ahead and extract the text.
  .map((res) => res.text())
  // Step 4
  // Map to lines. We use flatMap since there can be multiple lines.
  .flatMap((text) => _(text.split('\n')));

I would usually combine consecutive maps (i.e., steps 3c and 4) into a single map unless the function for one of the maps is already implemented as a separate entity (i.e., not an arrow function) or it is strongly independent of the others. It's a style question, and it's basically a judgement call whether you think the above is better than:

  .takeWhile(...)
  .flatMap((res) => _(res.text().split('\n')))

If you present step 3 as including extracting the text, then keep them separate. Otherwise, combine the res.text() step and the split into lines step into step 4 so you can use a single flatMap.
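To check the ordering of the steps independently of any stream library, the same flow can be sketched with a plain async generator. This is not Highland code and not a replacement for the pipeline above; `linesFromPages`, `fetchFn`, and `urlFor` are hypothetical names (in practice `fetchFn` could be the global fetch and `urlFor` the curried sprintf). The point it illustrates is the same: decide to stop by looking at the response, and only then extract the text.

```javascript
// Sketch only, not Highland: the same ordering of steps using an async generator.
// `fetchFn` and `urlFor` are hypothetical parameters supplied by the caller.
async function* linesFromPages(fetchFn, urlFor) {
  // Step 1: integers, produced only as fast as the consumer asks for lines.
  for (let i = 0; ; i++) {
    // Steps 2 and 3a: build the URL and fetch it, one request at a time.
    const res = await fetchFn(urlFor(i));
    // Step 3b: stop at the first response that is not ok (e.g. a 404).
    if (!res.ok) return;
    // Step 3c: only usable responses reach this point, so extract the text.
    const text = await res.text();
    // Step 4: emit each line separately.
    yield* text.split('\n');
  }
}
```

A consumer would iterate it with `for await (const line of linesFromPages(fetch, urlFor)) { ... }`. Because the generator is pull-driven, no request is made until the consumer asks for more lines, which mirrors the "as many as consumed" requirement.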

MauricioCap · Answer 7 · Wed Mar 27 2019 08:24:55 GMT+0800 (China Standard Time)

Thanks for the detailed answer!