tc39 / proposal-iterator.range

A proposal for ECMAScript to add a built-in Iterator.range()

Home Page: https://tc39.es/proposal-iterator.range/

Do we need dead loop detection?

Jack-Works opened this issue · comments

1e-323 === 0 equals false.

So it can bypass step 13:

If step is 0 or 0n, throws an exception.

But 42 + 1e-323 === 42 equals true

So it will cause a dead loop.

Should we prevent it?

Number.range(42, 100, 1e-323)
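
A minimal sketch of the failure mode (the generator naiveRange below is illustrative, not the proposal's algorithm): the step passes the zero check, but adding it to 42 is a no-op in double precision, so a naive accumulating loop never advances.

console.log(1e-323 === 0);        // false, so a "step is 0" check passes
console.log(42 + 1e-323 === 42);  // true: 1e-323 is far below half an ULP of 42

// Illustrative only: with an accumulating loop, x never moves past 42.
function* naiveRange(start, end, step) {
  for (let x = start; x < end; x += step) yield x;
}
// [...naiveRange(42, 100, 1e-323)] would therefore never finish.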

You must not have tried this:

var step = 1e-323
for (let i = 0; i <= 1; i += step) {
  // some magic
}

This API should behave like the others.

Since this function uses yield, there is no real "dead loop" here.

Just a quick try: Octave refuses to generate an integer vector of more than 2^64 items, or a floating-point vector of more than 2^63 items, with the colon operator. Since JavaScript only allows double as the Number type, we may reduce this limitation to Number.MAX_SAFE_INTEGER or 2^32 (to stay Array-safe). But this would also mean that we cannot support Number.range(1, Infinity) if we add such a limitation.

So maybe we should have a discussion about a limit on how many items may be generated. Candidates may be:

  • 2^32 (this is the length limit of a JavaScript Array)
  • 2^53 (this is the largest integer handled exactly by IEEE 754 64-bit floating point, a.k.a. the JavaScript Number type)
  • Infinity (the original behavior)

And if the limit is exceeded, we should:

  • Throw at the beginning
  • Throw when trying to generate the 2^n-th element
  • Yield nothing
  • Force-stop yielding when the limit is reached

I personally prefer 2^32 (it should be enough in most cases), and throwing at the beginning.
Yielding integers from 0 up to Number.MAX_SAFE_INTEGER and throwing after that could be left to another API (which should not be included in this proposal).
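
For concreteness, a rough sketch of the "limit and throw at the beginning" idea, assuming a hypothetical 2^32 item limit (the name limitedRange and the limit itself are illustrative, not part of the proposal):

const MAX_ITEMS = 2 ** 32; // hypothetical limit, for illustration only

function* limitedRange(start, end, step = 1) {
  const count = Math.ceil((end - start) / step);
  // Throw up front if the range would be longer than the limit.
  if (!(count <= MAX_ITEMS)) throw new RangeError("range would yield too many items");
  for (let i = 0; i < count; i++) yield start + i * step;
}

Note that under this sketch Number.range(1, Infinity) would throw (count is Infinity), which is exactly the trade-off mentioned above.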

Since this function uses yield, there is no real "dead loop" here.

const count = Math.floor((end - start) / step);
for (let i = 0; i < count; i++) yield start + i * step;

should be enough for me.

Consider this usage:

const numbers = [...Number.range(42, 43, 1e-323)]

We can check if last === now, but should we do this?

Consider this usage:

const numbers = [...Number.range(42, 43, 1e-323)]

I would prefer to limit count to at most 2^32, and this should throw at the beginning.

We can check if last === now, but should we do this?

In this case, count would just be 0, and nothing is yielded, as expected.

Just like an infinite loop, if the programmer creates one, the language lets them. Numbers go up to MAX_SAFE_INTEGER and anything less wouldn’t make any sense for an iterator.

Oh, there are cases that may lead to an infinite loop:

  • the step is very small (1e-323), making from + step === from
  • or the current value is very big (1.79769313486e308), making current + 1 === current
  • ... any more?

We can see that x + step === x is a clear pattern indicating a dead loop.

@Jack-Works Using start + index * step would avoid this. Limiting index to 0..MAX_SAFE_INTEGER would be enough.
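
A quick illustration of the difference between accumulating and recomputing (the values here are just examples, picked so the effect is visible):

const start = 42;
const step = 1e-16; // below half an ULP of 42, so accumulation stalls

console.log(start + step === start);        // true: `x += step` never advances
console.log(start + 100 * step === start);  // false: start + index * step eventually advances

(For a step as extreme as 1e-323, even the multiplied offset still rounds away at every index up to MAX_SAFE_INTEGER, but bounding the index at least guarantees termination.)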

Just like an infinite loop, if the programmer creates one, the language lets them. Numbers go up to MAX_SAFE_INTEGER and anything less wouldn’t make any sense for an iterator.

@ljharb So, should Number.range(0, Number.MAX_SAFE_INTEGER + 2) throw when invoked? Or should it throw when trying to generate the MAX_SAFE_INTEGER-th number? Or should it keep yielding MAX_SAFE_INTEGER infinitely many times?

Just like an infinite loop, if the programmer creates one, the language lets them. Numbers go up to MAX_SAFE_INTEGER and anything less wouldn’t make any sense for an iterator.

@ljharb So, should Number.range(0, Number.MAX_SAFE_INTEGER + 2) throw when invoked? Or should it throw when trying to generate the MAX_SAFE_INTEGER-th number? Or should it keep yielding MAX_SAFE_INTEGER infinitely many times?

No, MAX_SAFE_INTEGER is not the biggest number we can represent.

Number.MAX_SAFE_INTEGER === Number.MAX_SAFE_INTEGER + 2 // false

The biggest number is in the range 1e308 < x < 1e309 (at least on my machine). 1e309 evaluates to Infinity.

Sure, but it’s the biggest that can be consecutively represented - if you tried to range to max safe plus 2, the second to last value would be the same as the third to last value.

It’s really not a range of numbers, it’s a range of conceptual integers within the number type, when the step is 1.

Number.MAX_SAFE_INTEGER + 1.5 === Number.MAX_SAFE_INTEGER + 1 // true

Yeah I got it

IMO all arguments of the Number.range method should pass a Number.isSafeInteger check. If any argument doesn't pass this check, Number.range should throw a RangeError

@chicoxyzzy that kind of value-dependent error is something that the committee unfortunately decided to explicitly avoid in the BigInt proposal; I suspect that it wouldn't be accepted here.

IMO all arguments of the Number.range method should pass a Number.isSafeInteger check. If any argument doesn't pass this check, Number.range should throw a RangeError

I think there is no need to do a check like isSafeInteger. If we did, I would prefer to remove Number.range entirely and encourage the use of BigInt.range (also included in this proposal), where BigInt enforces that semantics itself.
Number.range should be able to range over fractional numbers (range(0.2, 0.8, 0.1)), at least I think...

Possible behaviors of Number.range(9007199254740991, 9007199254740996):

A. for (x = start; x < end; x += step)

9007199254740991
9007199254740992
9007199254740992
9007199254740992
9007199254740992
... dead loop

B. for (x = start; x < end; x += step) { if (!Number.isSafeInteger(x)) throw Error(); ...}

9007199254740991
throw

C. for (x = start; x < end; ) { ...; next = x + step; if (next === x) throw Error(); else x = next }

9007199254740991
9007199254740992
throw

D. for (x = start; x < end; ) { ...; next = x + step; if (next === x) x=nextNumber(x, start < end); else x = next }

9007199254740991
9007199254740992
9007199254740994

E. for (i = 0, x = start; x < end; ++i, x = start + step * i)

9007199254740991
9007199254740992
9007199254740992
9007199254740994

F. for (i = 0, n = (end - start) / step, x = start; i < n; ++i, x = start + step * i)

9007199254740991
9007199254740992
9007199254740992
9007199254740994
9007199254740996

G. check isSafeInteger for start/end
throw
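
For reference, here is option E written out as a self-contained generator. This is only a sketch (argument validation and negative steps are omitted), not the proposal's spec text:

function* rangeOptionE(start, end, step = 1) {
  // Recompute each value from the index instead of accumulating.
  for (let i = 0, x = start; x < end; ++i, x = start + step * i) {
    yield x;
  }
}

console.log([...rangeOptionE(9007199254740991, 9007199254740996)]);
// [ 9007199254740991, 9007199254740992, 9007199254740992, 9007199254740994 ]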

@hax Number.isSafeInteger(0.2) === false so this check will disable range on non-integers

and we also need to consider numbers that are very small, e.g. (1e-324 + 1e-324) === (1e-324)

Yeah, if we use an isSafeInteger check, it will actually disallow non-integers.

I'm just trying to list all the possible behaviors.

One approach that we should list for posterity is using the next higher/lower Number value when the value produced is the same as the previous one, to ensure progress continues to be made toward the end of the range.
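
JavaScript has no built-in "next representable Number", so option D's nextNumber would need a helper along these lines. The name nextAfter and the bit-twiddling approach here are illustrative only:

// Hypothetical helper: the next representable double from x toward +Infinity (up)
// or -Infinity (!up). Works for finite, nonzero x; edge cases kept minimal.
function nextAfter(x, up) {
  if (!Number.isFinite(x)) return x;
  if (x === 0) return up ? Number.MIN_VALUE : -Number.MIN_VALUE;
  const f64 = new Float64Array(1);
  const u64 = new BigUint64Array(f64.buffer);
  f64[0] = x;
  // For doubles of one sign, value order matches bit-pattern order,
  // so stepping the bit pattern by 1 steps to the adjacent double.
  u64[0] += (x > 0) === up ? 1n : -1n;
  return f64[0];
}

console.log(nextAfter(9007199254740992, true)); // 9007199254740994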

I would not want to bother with any special magic to try to prevent infinite loops or avoid rounding. This should have the straightforward semantics of repeatedly adding the step to the initial value until you reach or exceed the final value. Do not add integer or safe integer checks; they just make the proposal more complicated, because you'd then have to explain to users why they can do Number.range(0, 10, 3) but not Number.range(0, 10, 0.5).

I would not want to bother with any special magic to try to prevent infinite loops or avoid rounding. This should have the straightforward semantics of repeatedly adding the step to the initial value until you reach or exceed the final value.

Now it uses next = count * step semantics instead of last += step, to avoid the rounding problem.

One approach that we should list for posterity is using the next higher/lower Number value when the value produced is the same as the previous one

@michaelficarra I added it as option D in #7 (comment).

I agree that letting this situation infinite-loop is fine. It's very hard to produce in the first place, and playing around at the very limits of double precision is always fraught.

It's very hard to produce in the first place

Maybe it's hard to produce in the first place, but the problem is that there is no way to avoid it without manually checking every time. And if it does occur, it may be very hard for programmers to understand/debug/locate the reason.

To be honest, I see similar claims ("this is an edge case") in the arguments around various proposals, and I think we should care about what "edge case" means. If the "edge case" is something programmers rarely do, it's an edge case we can ignore; but if the "edge case" is just normal usage of a feature with a small probability of occurring, it's actually a very bad thing for programmers. Programmers will find it hard to know/understand/debug (because it's an edge case!), and even if they know about the edge case they won't want to write code to guard against it, because the cost (for example, manual checking) may not be worth it (because it's an edge case!).

In this case, 0 as the step is already an invalid value, and 1e-323 is the "something programmers rarely do" kind.
Ranging from 0 to Infinity is a feature, but I don't expect the range to really count to "infinity", because memory is not infinitely big (for BigInt). So there are two different kinds of dead loop to discuss:

  1. ("programmers rarely do this") 1e-323: the step is too small
  2. ("normal usage of a feature, but with a small probability of occurring") the current yielded value is too big, so next + step === next (this happens for the Number type; for BigInt, when the engine cannot represent the value, it should throw)

Yeah, it's reasonable to produce ranges with sub-1 values for the step, but they're generally gonna be values like .1 or .01, etc.

It's not unreasonable to go smaller, but going to 1e-323 is ridiculously tiny. That value doesn't show up unless you're deliberately testing floating-point precision, or have a pretty weird logic error that'll affect more things than just the range. No reasonable code ever actually produces values anywhere near that size.

1e-323 might come from bad user input

Using num += step would cause floating-point errors every time (if either num or step is not an integer). So I would prefer start + i * step instead. Here are some candidates I like:

// Candidate 1: compute the length up front and require it to be a safe integer.
length = Math.ceil((end - start) / step) - 1;
if (!Number.isSafeInteger(length)) throw TypeError();
for (let i = 0; i < length; i++) {
  yield start + i * step;
}

// Candidate 2: cap the index at MAX_SAFE_INTEGER and throw if the range is longer.
for (let i = 0; i <= Number.MAX_SAFE_INTEGER; i++) {
  const value = start + i * step;
  if (value >= end) return;
  yield value;
}
throw RangeError();

// Candidate 3: use a BigInt index, so the index itself never loses precision.
for (let i = 0n; ; i++) {
  const value = start + Number(i) * step;
  if (value >= end) return;
  yield value;
}

@tiansh Yes, the current algorithm is based on *, not +.

1e-323 might come from bad user input

I am extremely curious how that might be the case. I have never once seen anything remotely like that.

Hmm, it's like a spherical cow, but it's possible that the developer uses parseFloat to parse user input and then a bad guy types 1e-323.

Hmm, it's like a spherical cow, but it's possible that the developer uses parseFloat to parse user input and then a bad guy types 1e-323.

So what can we do against that? I believe nothing. It is impossible to tell what input is valid and what is not. You may stop 1e-323; then what about 1e-322? There is no clean boundary describing which is valid and which is not. And even for (i of range(1, 10, 0.00000001)); would hang your browser, as expected, without causing any infinite loop at all. And I believe a Turing-complete language should not prevent the user from writing infinite loops at all.

It is impossible to tell what input is valid and what is not.

No, we can. If the last yielded value is equal to the current yielded value, we know we are heading toward a bad end.

It is impossible to tell what input is valid and what is not.

No, we can. If the last yielded value is equal to the current yielded value, we know we are heading toward a bad end.

No. Your assumption is incorrect.

You mentioned that we use *, not +.

Let's consider range(42, 42+1e-14, 1e-15)

42 + 1e-15 === 42, but a for (i = 0; ; i++) { n = start + i * step; if (n >= end) break; print(n) } loop will end after 5 iterations.

And by using a BigInt for the loop variable i, it can handle cases like range(42, 43, 1e-300). You only need about 1e300 iterations, which is still not infinite.

Oh, you're right, so maybe we should give up on this kind of protection...

  1. ("programmers rarely use in that way") 1e-323 (the step is too small)

We can't simply say 1e-323 is too small; actually, the start and step are related. For example, 0x100_0000_0000 with step 0.0001 also causes a dead loop.

1e-323 may not be reasonable code, but is 0x100_0000_0000 with step 0.0001 also unreasonable? I feel it's not easy to say...
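
To make that concrete (same values as in the comment above; the accumulating form is assumed):

const start = 0x100_0000_0000;         // 2 ** 40
console.log(start + 0.0001 === start); // true: an ULP at this magnitude is 2 ** -12 ≈ 0.000244,
                                       // so `x += 0.0001` would never advance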

Considering that the main use case is integers, maybe we could:

Make range(start, end, step) throw if any argument is not a safe integer.

If someone wants floats, they could use range(start, end, {step, useFloat: true}) (or just a separate API like floatRange(start, end, step)). By forcing a special API for special cases, we can assume most programmers who use such a special API have already read the document and know what they are doing :)
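
A rough sketch of what that overload might look like. Everything here (the option name useFloat, the validation, the error message) is hypothetical, taken from the suggestion above rather than from the proposal:

function* range(start, end, stepOrOptions = 1) {
  const options =
    typeof stepOrOptions === "object" ? stepOrOptions : { step: stepOrOptions };
  const { step = 1, useFloat = false } = options;
  // The plain form only accepts safe integers; floats require an explicit opt-in.
  if (!useFloat && ![start, end, step].every(Number.isSafeInteger)) {
    throw new RangeError("pass { useFloat: true } for non-safe-integer arguments");
  }
  for (let i = 0, x = start; x < end; ++i, x = start + step * i) yield x;
}

// range(0, 10, 0.5)                                -> throws in this sketch
// [...range(0, 10, { step: 0.5, useFloat: true })] -> 0, 0.5, ..., 9.5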

If someone wants floats, they could use range(start, end, {step, useFloat: true}) (or just a separate API like floatRange(start, end, step)).

If so, I'd like to ban float numbers in Number.range and leave that to ship with the Decimal proposal (Decimal.range).

If so, I'd like to ban float numbers in Number.range and leave that to ship with the Decimal proposal (Decimal.range).

Why? I don't see any benefit here. Is there any usage analysis showing that integers are the main use case?

we definitely need to support numbers.

If so, I'd like to ban float numbers in Number.range and leave that to ship with the Decimal proposal (Decimal.range).

I think if we ban float number ranges, we could also ban Number.range entirely :P, because we already have BigInt.range :)

I suppose we want ranges for all numeric types (Number, BigInt, Decimal, ...). My previous suggestion tries to solve the pitfalls of dead loops (and precision loss) in Number.range by separating the API into a safe part (which only deals with safe integers) and an unsafe part (which deals with floats and integers beyond the safe range).

by separating the API into a safe part (which only deals with safe integers) and an unsafe part (which deals with floats and integers beyond the safe range)

A browser normally allows JavaScript to run for ~10s, and the script is considered unresponsive after that. There's no way to tell whether a script, written in a Turing-complete language, has a dead loop or not, so the only way to measure is execution time. For that reason, I would consider for (i of Number.range(1, Number.MAX_SAFE_INTEGER)) a dead loop, since it cannot finish in a reasonable time. By the same reasoning you discussed, we should make this API look like Number.range(1, Number.MAX_SAFE_INTEGER, 1, 1, Number.MAX_SAFE_INTEGER, 1, Number.QUITE_LARGE_RANGE_SUPPORT_YES_I_REALLY_REALLY_MEAN_IT) to make sure nothing has gone wrong. Why should we care about that at all?

For that reason, I would consider for (i of Number.range(1, Number.MAX_SAFE_INTEGER)) a dead loop

You can break out of the for loop based on the current yielded value.

@Jack-Works Every other case can, too.

@tiansh There are slight differences between these two types of "dead loop".

range(Number.MAX_SAFE_INTEGER, Number.MAX_SAFE_INTEGER + 10) is a real dead loop. range(1, Number.MAX_SAFE_INTEGER) actually is not a dead loop; it just doesn't have enough resources to finish. (If Moore's law still magically holds, it could finish in a reasonable time around 2070 🤪.)

Why should we care about that at all?

I think most programmers understand the resource limit, and it's easy for them to deal with the "cannot be finished in a reasonable time" problem: range(a, b).take(1000). (I love iterator helpers 😄.) So it becomes even harder for programmers to understand that such code can still cause a real "dead loop". It's just a good example of my previous comment:

Maybe it's hard to produce in the first place, but the problem is that there is no way to avoid it without manually checking every time ... if the "edge case" is just normal usage of a feature with a small probability of occurring, it's actually a very bad thing for programmers.

@hax

  1. Infinite-loop detection can be done by linting tools, but not by the interpreter or the language specification. You should never try to do this in any Turing-complete language, since there is no reliable way, and never will be, to detect an infinite loop at all.
  2. As we discussed above, we can / would / should write the range function like this: for (let i = 0; ; i++) { const val = start + i * step; if (val >= end) return; yield val; }. (Actual code would be more complex; this one ignores negative steps.) And your example, range(Number.MAX_SAFE_INTEGER, Number.MAX_SAFE_INTEGER + 10), is NOT an infinite loop at all. You may change i to a BigInt if you decide to support ranges with more than Number.MAX_SAFE_INTEGER elements (only with some floating-point errors in that case). So these cases won't cause an infinite loop at all.
const start = Number.MAX_SAFE_INTEGER;
const end = Number.MAX_SAFE_INTEGER + 10;
const step = 1;
for (let i = 0; ; i++) {
  const val = start + i * step;
  if (val >= end) break;
  console.log(val);
}
  3. Floating-point errors are not introduced by this proposal. They just exist, and you cannot prevent them if you are using any method on Numbers instead of BigInts. Floating-point errors can be confusing to developers, but that is not what we are trying to solve in Number.range, as it does not make this problem any more confusing.

@tiansh The adoption of start + i * step is already one of the options for avoiding the dead loop, see #7 (comment) (option E).

But there are other options. For example, by overloading range(start, end, step) for safe integers and range(start, end, {step, useFloat: true}) for everything else, we could just use a simple i += step. Basically this is the combination of option G (for safe ints) + option D|E|F.

I'm not saying this solution is definitely better than the others; what I am trying to do is find as many options as I can, so we can investigate each of them.

So the conclusion of this, I think, is that we don't want or need any special logic for dead-loop detection.

Using val = start + i*step (rather than val += step) already avoids all the reasonable cases that might accidentally infinite-loop, where the step underflows the precision of the current value. Instead it'll just produce the same value for a while, then increment eventually, and in time will hit the limit and end.

If you're producing ranges from arbitrary user-provided input, there's plenty of perfectly finite ranges that'll take centuries to iterate thru, so looking out for accidentally-infinite ranges isn't even going to help.

So I think we can close this as no change?

Agree with @tabatkins.
I'll close this as "no change" later, since consensus seems to have been reached not to add this kind of protection.
If there are any objections, please leave a comment.

A small issue is that val += step may be faster than val = start + i * step.

I'd also like to summarize the options I listed before:

D. for (x = start; x < end; ) { ...; next = x + step; if (next === x) x=nextNumber(x, start < end); else x = next }

9007199254740991
9007199254740992
9007199254740994

E. for (i = 0, x = start; x < end; ++i, x = start + step * i)

9007199254740991
9007199254740992
9007199254740992
9007199254740994

F. for (i = 0, n = (end - start) / step, x = start; i < n; ++i, x = start + step * i)

9007199254740991
9007199254740992
9007199254740992
9007199254740994
9007199254740996

These three options all work (for avoiding the dead loop); each adds a different extra invariant (and may break the invariants of the other options).

  • Option D adds an invariant that range never produces the same value twice.
  • Option E adds an invariant that the values are produced as linearly as possible. (It seems most of us prefer this option, and the draft has adopted it.)
  • Option F adds an invariant that range produces a fixed number of values, (end - start) / step (rounded up).

(I ignored options B and C because they throw when something bad happens, which I think just translates the programmer's surprise from "why a dead loop" into "why a sudden throw" and doesn't really solve the issue.)

For this issue (dead loop) I'm OK with option E, and we could discuss whether to split Number.range into two versions (safe int / float) in a new issue.

@hax Avoiding the infinite loop is not the most important thing when choosing the loop algorithm, because the infinite loop is not such a common thing, and we still get some strange results no matter which one we choose. On the other hand, making floating-point errors as small as possible would be more important, imo.

The algorithm using += step would accumulate floating-point errors, so it is not a good idea. Let's consider a more common use case: range(0, 20, 0.2). Your first (+= step) candidate yields 101 elements, the last 2 being 19.79999999999996 and 19.99999999999996, while the second and third ones output 100 elements, the last 2 being 19.6 and 19.8.
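
A quick way to reproduce the comparison; the two generators below contrast plain accumulation (+= step) with the multiplication form (start + i * step), and the exact trailing digits depend on how each intermediate sum rounds:

function* byAccumulation(start, end, step) {
  for (let x = start; x < end; x += step) yield x; // error accumulates across additions
}
function* byMultiplication(start, end, step) {
  for (let i = 0, x = start; x < end; ++i, x = start + i * step) yield x;
}

const a = [...byAccumulation(0, 20, 0.2)];
const b = [...byMultiplication(0, 20, 0.2)];
console.log(a.length, a.slice(-2)); // per the run reported above: one extra element, just short of 20
console.log(b.length, b.slice(-2)); // each value recomputed from the index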

I'm not sure if E and F will yield different results. Could anyone provide an input that makes them yield differently?

The algorithm using += step would accumulate floating-point errors, so it is not a good idea.

I agree. But if all arguments are safe integers, there are no accumulated errors, so we could still use a simple +=. There are two ways to use +=: one I already mentioned is splitting Number.range into two APIs (separate methods, or one overloaded method); another possibility is checking the arguments, and if all are safe integers using += step, otherwise using start + step * i.

I'm not sure if E and F will yield different results.

The difference is the exit condition:

  • E: continue while start + step * i < end
  • F: continue while i < (end - start) / step
  • H: continue while step * i < end - start (I found that I missed this one)

Because of float precision errors, they may have different results.

There is another method, BigInt.range, in this proposal. That method only supports integers, and has no floating-point errors.

they may have different results

So, any test cases?

any test cases?

For example, Number.range(9007199254740991, 9007199254740996):

output of E: 9007199254740991, 9007199254740992, 9007199254740992, 9007199254740994
output of F,H: 9007199254740991, 9007199254740992, 9007199254740992, 9007199254740994, 9007199254740996

There is another method, BigInt.range, in this proposal. That method only supports integers, and has no floating-point errors.

Yeah, but it's BigInt, and needs type conversions for normal numbers.

But if all arguments are safe integers, there are no accumulated errors, so we could still use a simple +=

We can leave that to implementation optimization. If there is no observable difference between += and * for safe integers, implementations can swap one for the other to speed up this common use case.

This issue is mixing two cases of precision loss. I have split them into two separate issues:

#33 for the precision loss (Number.MAX_VALUE + 20 === Number.MAX_VALUE) behavior
#34 for the precision loss (1e-324 + 1e-324) === (1e-324) behavior

Please move the discussion to those two issues, thanks.