Allow interleaved mapping in async iterators

bakkot opened this issue a year ago · comments

This is reviving #128, basically.

It would be nice if code like

x = asyncIteratorOfUrls
  .map(u => fetch(u))

await Promise.all([
  x.next(),
  x.next(),
])

could perform the fetches in parallel. Right now, because async iterator helpers are essentially "implemented" as async generators, it can't - the second call to .next will be queued until the first one finishes, rather than immediately being forwarded to the underlying iterator.
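
The queuing can be observed today with a small sketch. Everything below is illustrative: generatorMap stands in for a helper written as an async generator, and fakeFetch, delay and urls are hypothetical names rather than part of any proposal.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// A map written as an async generator, standing in for how the helpers behave today.
async function* generatorMap(source, fn) {
  for await (const value of source) {
    yield fn(value); // `yield` in an async generator awaits its operand
  }
}

async function* urls() {
  yield "a";
  yield "b";
}

// Stand-in for fetch that records how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
async function fakeFetch(url) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await delay(20);
  inFlight--;
  return `response for ${url}`;
}

async function demoQueued() {
  const x = generatorMap(urls(), fakeFetch);
  // The second next() is queued by the generator until the first one settles.
  const results = await Promise.all([x.next(), x.next()]);
  return { results, maxInFlight };
}

const queuedDemo = demoQueued();
queuedDemo.then(r => console.log("max concurrent fetches:", r.maxInFlight)); // 1
```

Although both next() calls are issued up front, the second fetch only starts once the first has finished.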

If the implementation of map were different, like this:

AsyncIteratorProto.map =
  function(fn) {
    return {
      __proto__: AsyncIteratorProto,
      next: async () => {
        let { done, value } = await this.next();
        if (done) return { done: true };
        return {
          done: false,
          value: await fn(value),
        };
      },
    };
  };

then the above snippet would just work. I think we should consider revising this.
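
A standalone version of the same idea can be run without patching any prototype. This is only a sketch: concurrentMap is a hypothetical free function that mirrors the code above, and fakeFetch, delay and urls are made-up stand-ins.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Free-function version of the map sketched above. Each next() call is
// forwarded to the source immediately instead of waiting for the previous one.
function concurrentMap(source, fn) {
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      const { done, value } = await source.next();
      if (done) return { done: true, value: undefined };
      return { done: false, value: await fn(value) };
    },
  };
}

async function* urls() {
  yield "a";
  yield "b";
}

// Stand-in for fetch that records how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
async function fakeFetch(url) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await delay(20);
  inFlight--;
  return `response for ${url}`;
}

async function demoConcurrent() {
  const x = concurrentMap(urls(), fakeFetch);
  const results = await Promise.all([x.next(), x.next()]);
  return { results, maxInFlight };
}

const concurrentDemo = demoConcurrent();
concurrentDemo.then(r => console.log("max concurrent fetches:", r.maxInFlight)); // 2
```

The underlying async generator still hands out its values one at a time, but the mapping callback for the first value is still running when the second value arrives, so the two fetches overlap.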

It is less clear how, and whether, to allow parallelism in other helpers. I think they all have pretty natural semantics, but I have not yet worked through all of them in detail.

More speculatively, at a later date this would allow us to add a helper (say .bufferAhead(N)) to eagerly pump an async iterator and buffer the results. That would let you make any async iterator parallel with bounded concurrency, assuming the iterator was capable of supporting parallelism (so e.g. .map applied to the result of an async generator, but not an async generator itself), without changing the ordering semantics of the result.
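
As a rough illustration of what such a helper might look like, here is a sketch of bufferAhead as a free function. The name comes from the paragraph above, but the signature and behaviour are assumptions, not a proposed design: it assumes n >= 1 and ignores error handling and return(). concurrentMap, numbers and slowTimesTen are hypothetical stand-ins.

```javascript
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for a map that forwards each next() call immediately.
function concurrentMap(source, fn) {
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      const { done, value } = await source.next();
      if (done) return { done: true, value: undefined };
      return { done: false, value: await fn(value) };
    },
  };
}

// Hypothetical bufferAhead: keeps up to `n` next() calls outstanding against
// `source` and hands results back in the order they were requested.
function bufferAhead(source, n) {
  const pending = []; // promises for results, in request order
  let finished = false;
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      if (finished) return { done: true, value: undefined };
      // Top up so that n requests are outstanding, counting the one taken below.
      while (pending.length < n) pending.push(source.next());
      const result = await pending.shift();
      if (result.done) finished = true;
      return result;
    },
  };
}

async function* numbers() {
  yield 1; yield 2; yield 3; yield 4;
}

let inFlight = 0;
let maxInFlight = 0;
async function slowTimesTen(x) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await delay(20);
  inFlight--;
  return x * 10;
}

async function demoBuffered() {
  const out = [];
  const buffered = bufferAhead(concurrentMap(numbers(), slowTimesTen), 2);
  for await (const value of buffered) out.push(value);
  return { out, maxInFlight };
}

const bufferedDemo = demoBuffered();
bufferedDemo.then(r => console.log(r.out, r.maxInFlight)); // [ 10, 20, 30, 40 ] 2
```

The consumer is an ordinary sequential for await loop, yet two callbacks run at a time and the output order is unchanged. Applied directly to an async generator, the extra next() calls would simply queue inside the generator and no parallelism would result.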

See slides here and some discussion here.

Bergi · Answer 1 · Thu Jan 26 2023 22:51:26 GMT+0800 (China Standard Time)

I'd also love it if we could even change the semantics of async generators so that yield no longer implicitly awaits but instead can be resumed as soon as .next() is called again…

Kevin Gibbons · Answer 2 · Fri Jan 27 2023 00:16:38 GMT+0800 (China Standard Time)

@bergus See some discussion of that here, though of course such a change would not be in scope for this proposal in particular.

I note that, with the change I'm proposing in this issue, you could get the same effect by doing yield { v: promise } inside the async generator and then doing .map(box => box.v) on the result of the async generator. Which is slightly silly, but does let you get the thing you want without web compat risk.
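
A sketch of that boxing trick, runnable today with a free-function stand-in for the revised map. concurrentMap, boxedFetches and fakeFetch are hypothetical names used only for illustration.

```javascript
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Stand-in for a map that forwards each next() call immediately.
function concurrentMap(source, fn) {
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      const { done, value } = await source.next();
      if (done) return { done: true, value: undefined };
      return { done: false, value: await fn(value) };
    },
  };
}

let inFlight = 0;
let maxInFlight = 0;
async function fakeFetch(url) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await delay(20);
  inFlight--;
  return `response for ${url}`;
}

// The generator yields a plain object wrapping the promise. The box is not a
// thenable, so the implicit await on `yield` does not wait for the fetch.
async function* boxedFetches(urls) {
  for (const url of urls) {
    yield { v: fakeFetch(url) };
  }
}

async function demoBoxed() {
  // Unboxing happens in the map callback, where the promise is finally awaited.
  const x = concurrentMap(boxedFetches(["a", "b"]), box => box.v);
  const results = await Promise.all([x.next(), x.next()]);
  return { results, maxInFlight };
}

const boxedDemo = demoBoxed();
boxedDemo.then(r => console.log("max concurrent fetches:", r.maxInFlight)); // 2
```

Yielding fakeFetch(url) directly instead of the box would bring the count back down to 1, because the generator would wait for each fetch before accepting the next request.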

James Browning · Answer 3 · Fri Jan 27 2023 09:12:56 GMT+0800 (China Standard Time)

If the implementation of map were different

It's technically allowed by the protocol, but is it a problem that this design allows for a { done: false, value ... } to come AFTER a { done: true }? Currently all spec iterators ensure that once { done: true } has been returned, all successive calls to .next() also produce { done: true }.

Note that this would be a difference from the synchronous version:

const results = [
    { done: false, value: "A" },
    { done: true },
    { done: false, value: "B" },
];

class CustomSyncIterator extends Iterator {
    [Symbol.iterator]() { return this; }
    
    #index = 0;
    
    next() {
        return results[this.#index++] ?? { done: true };
    }
}

class CustomAsyncIterator extends AsyncIterator {
    [Symbol.asyncIterator]() { return this; }
    
    #index = 0;
    
    async next() {
        return results[this.#index++] ?? { done: true };
    }
}

const syncIterator = new CustomSyncIterator().map(value => value.repeat(5));
const syncResults = [
    syncIterator.next(),
    syncIterator.next(),
    syncIterator.next(),
];

const asyncIterator = new CustomAsyncIterator().map(value => value.repeat(5));
const asyncResults = await Promise.all([
    asyncIterator.next(),
    asyncIterator.next(),
    asyncIterator.next(),
]);

console.log(syncResults); // [{ done: false, value: "AAAAA" }, { done: true }, { done: true }]
console.log(asyncResults); // [{ done: false, value: "AAAAA" }, { done: true }, { done: false, value: "BBBBB" }]

Kevin Gibbons · Answer 4 · Fri Jan 27 2023 09:20:43 GMT+0800 (China Standard Time)

but is it a problem that this design allows for a { done: false, value ... } to come AFTER a { done: true }?

To be clear I'm not saying that my sample code would be literally the implementation, just demonstrating how it's possible to get parallelism. (Note that the sample code creates a new next method each time - it's definitely not intended to be a high-fidelity implementation.) We would almost certainly want to keep a bit indicating whether the iterator has been closed, at the very least.
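
One way such a bit might look, as a sketch only: mapWithDoneBit is a hypothetical free function, and it assumes the underlying results settle in the order they were requested. A real design would need to handle more than this (errors, return(), out-of-order settlement).

```javascript
// Hypothetical variant of the concurrent map that remembers when the
// underlying iterator has reported completion.
function mapWithDoneBit(source, fn) {
  let finished = false;
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      if (finished) return { done: true, value: undefined };
      const { done, value } = await source.next();
      // Re-check the bit: an earlier concurrent call may have hit the end
      // while this one was waiting on the source.
      if (done || finished) {
        finished = true;
        return { done: true, value: undefined };
      }
      return { done: false, value: await fn(value) };
    },
  };
}

// A misbehaving source like the one in the comment above: it reports
// done: true and then produces another value.
const results = [
  { done: false, value: "A" },
  { done: true },
  { done: false, value: "B" },
];
let index = 0;
const misbehavingSource = {
  async next() {
    return results[index++] ?? { done: true };
  },
};

async function demoDoneBit() {
  const it = mapWithDoneBit(misbehavingSource, value => value.repeat(5));
  return Promise.all([it.next(), it.next(), it.next()]);
}

const doneBitDemo = demoDoneBit();
doneBitDemo.then(r => console.log(JSON.stringify(r)));
// [{"done":false,"value":"AAAAA"},{"done":true},{"done":true}]
```

With the bit in place, the three concurrent calls produce the same sequence as the synchronous version, even though the third request reached the source before the end was observed.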

Conrad Buck · Answer 5 · Tue Feb 07 2023 00:03:24 GMT+0800 (China Standard Time)

Yeah, this is all down to the level of the async iterator protocol. My initial impression is also that you won't be able to do this, because according to the async iterator protocol the only way to find out if an iterable is done is to await its next value.

Kevin Gibbons · Answer 6 · Tue Feb 07 2023 00:31:26 GMT+0800 (China Standard Time)

My implementation there isn't meant to be complete, but it does support concurrency if you call .next multiple times. Here is a snippet you can run today which demonstrates that the map callback can run concurrently with itself and with the underlying async generator.

Conrad Buck · Answer 7 · Tue Feb 07 2023 00:44:48 GMT+0800 (China Standard Time)

Yeah you're right. I withdraw that criticism.

Kevin Gibbons · Answer 8 · Tue Feb 07 2023 04:20:42 GMT+0800 (China Standard Time)

Closing this issue as "we expect to do this and need to work out the details". For discussion of the details, follow along and contribute at https://github.com/tc39/proposal-async-iterator-helpers.