tc39 / Function-prototype-toString-revision

ECMA-262 proposal to update Function.prototype.toString

Home Page: https://tc39.github.io/Function-prototype-toString-revision

Methods should drop their source

erights opened this issue · comments

The source string for a concise method, evaluated as an expression, will generally be a syntax error. Thus, there's no clear utility to producing this source. Concise methods should instead print as native functions do, with bodies guaranteed not to parse.
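
A minimal sketch of that claim, assuming an engine that stringifies a concise method as its bare source text (which not every engine does). The helper name `parsesAsExpression` is hypothetical; it wraps the string in parens and uses indirect eval.

```javascript
// Hypothetical helper: does `src`, wrapped in parens, parse and evaluate
// as an expression? Indirect eval runs the code in global scope.
function parsesAsExpression(src) {
  try {
    (0, eval)(`(${src})`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a parse failure is unexpected here
  }
}

const methodSrc = ({ foo() {} }).foo.toString();
const functionSrc = (function foo() {}).toString();

// On an engine that yields "foo() {}" for the method, the first check
// reports false, while the ordinary function expression reports true.
console.log(methodSrc, parsesAsExpression(methodSrc));
console.log(functionSrc, parsesAsExpression(functionSrc));
```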

For ({ foo() {} }).foo.toString(), I currently get:

  • Safari: "function foo() {}"
  • Chrome: "foo() {}"
  • Firefox: "foo() {}"
  • Edge: "function () {}"

… so at least, there's not a definite "web reality" that would force our hand, although 2/4 is something.

However, first, there's still utility - even purely for debugging information. In addition, a concise method being able to masquerade as native code might break the web - consider lodash's isNative.
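
To illustrate the concern, here is a simplified native-function check based on the stringified body. It is a sketch of the general technique, not lodash's actual implementation, and the name `looksNative` is hypothetical. If concise methods printed with the native-function syntax, a check of this kind would misclassify them.

```javascript
// Simplified, hypothetical check: treat a function as native when its
// stringification ends with the "{ [native code] }" body.
const nativeBody = /\{\s*\[native code\]\s*\}\s*$/;

function looksNative(fn) {
  return typeof fn === 'function' &&
    nativeBody.test(Function.prototype.toString.call(fn));
}

console.log(looksNative(Math.max));              // builtin function
console.log(looksNative(({ foo() {} }).foo));    // user-written concise method
```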

Second, when evaluated in the proper lexical context - inside an object literal, in this case - it will parse just fine, which is the standard that I've heard historically applied (not "evaluated as an expression", but "evaluated in the expected lexical context"). Similarly, a function with a non-simple parameter list, that was created in strict mode, needs to be evaluated in the right lexical context - in strict mode - to avoid an error, so I'm not sure where the requirement would come from that it need to be an expression.

I agree with @ljharb here on all points.

@erights I don't think you will be able to convince the browser maintainers to sacrifice a legitimately useful debugging facility. I'm not even sure what the motivation is behind this requirement. What is your end goal and why is it important for JavaScript programmers?

@erights Can I get some feedback here? We can't work to change this stage 3 proposal without knowing what it is you care about.

What I really care about is that the programmer is able to write a closed-enough strict function, send it through a communications path that stringifies, transmits the string, receives and evaluates it in a remote matching strict lexical scope, and calls it there; where the programmer can reliably know either that its semantics was preserved-enough or reliably get an error early in the development process, such as by any testing that covers this case at all. Let's call a function whose semantics is preserved-enough by such communications a "portable function".

By "closed enough", I mean a set of whitelisted free variables that is part of any such communications system and is expected to be stable for long periods of time. Other than free variable names, the other lexical context is: strict, evaluation as an expression-statement (in a Program as a strict script) produces the function as a completion value, lexically enclosing this and arguments are undefined (matters only for arrow functions). In what other ways can lexical context differ that affect the meaning of the resulting function?

This purpose is best served by a variation on https://github.com/domenic/proposal-function-prototype-tostring-censorship coupled to this proposal: The directive we need is the positive one, not the negative one, meaning essentially "this is intended to be a portable function". If this directive appears in a function whose stringification cannot be portable, such as a concise method, then we should get an early error, or at least a guaranteed unevaluable string such as the native function syntax. If this directive appears in a function within code under the proposed CSP switch that drops function sources, then it locally overrides that switch, preserving this source. With this directive, for new code, it would not even hurt my purposes for the "drop sources" behavior to be the default. Unfortunately, this change of default would break the web; it would be incompatible with already-deployed portable functions that have no such directive.

Such portable functions would do everything motivating bloocks with much less additional complexity added to the language. If a given communications channel within the same address space, like that anticipated by bloocks, skipped the stringification and re-evaluation steps while preserving their semantics --- reusing the internal compiled function behavior while wrapping it in a fresh function object in the receiving "evaluating" realm --- that should be an unobservable optimization. Note that engines already do this optimization of new realm creation for builtin functions, where the same underlying function implementation is shared but fresh function object wrappers are created per realm.

An open question for either this portable-function directive or any bloock alternative is whether the mechanism itself should somehow be informed about the closed-enough whitelist of free variables, so that it can give an early diagnostic if the function contains free variables not on the whitelist. The usage to date of portable functions has had no such builtin mechanism. It has been adequate to evaluate the string in a limited lexical scope on reception, and to get a dynamic ReferenceError (on read) or TypeError (on write) when dereferencing a variable not provided by that limited lexical scope. But earlier diagnostics would be better.
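
A small sketch of why the diagnostic is late rather than early. It assumes plain global scope stands in for the "limited lexical scope" (a real receiver would expose only the whitelist), and the names `evaluateStrict` and `notOnWhitelist` are hypothetical. Only the read case is modeled; the TypeError-on-write case presumably depends on how the receiving scope is locked down, which this sketch does not attempt.

```javascript
// Hypothetical receiver: evaluate the source as a strict, paren-enclosed
// expression using indirect eval (global scope).
function evaluateStrict(src) {
  return (0, eval)(`'use strict'; (${src});`);
}

const src = 'function report() { return notOnWhitelist + 1; }';

// Evaluation succeeds: nothing checks the free variable at this point.
const report = evaluateStrict(src);

// The problem only surfaces when the function is actually called.
let caught;
try {
  report();
} catch (err) {
  caught = err;
}
console.log(typeof report, caught instanceof ReferenceError);
```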

Attn @domenic @jfparadis @warner @tribble @FUDCo

A concise method can easily be portable, if you consider the matching lexical scope to include “inside a class body” or “inside an object literal” - and I’m not sure how else it could be considered.

Can you respond to my second point in #32 (comment) ?

@ljharb I'm responding to the question of what I really care about. To clarify, for my purposes, the lexical context needs to be one that a receiver can evaluate strings in, and that a communications channel can standardize on. Hence

  • a closed-enough agreement on a whitelist of free variables
  • strict only
  • a paren-enclosed expression statement in a Program evaluated as a strict script
  • lexical this or arguments being undefined
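
The round trip described by this list can be sketched roughly as follows. This is an illustration under assumptions, not an existing mechanism: `serialize` and `revive` are hypothetical names, indirect eval in global scope stands in for the standardized receiving environment, and the whitelist and the undefined lexical `this` for arrow functions are not modeled.

```javascript
// Sender side: stringify the function for transmission.
function serialize(fn) {
  return Function.prototype.toString.call(fn);
}

// Receiver side: evaluate the string as a paren-enclosed expression
// statement in strict code. The completion value of the script is the
// function itself.
function revive(src) {
  return (0, eval)(`'use strict'; (${src});`);
}

const add = revive(serialize(function add(a, b) { return a + b; }));
const double = revive(serialize((x) => x * 2));

// Strictness carries over from the receiving context: an unbound call
// sees `this` as undefined rather than the global object.
const getThis = revive('function () { return this; }');

console.log(add(2, 3), double(4), getThis());
```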

To use this mechanism reliably, it suffices that if a function containing the I-am-a-portable-function directive cannot actually be portable by this definition, then we reliably-enough get an error early in the development process.

I realize that what I'm explaining here is disruptive to several proposals in process: this one, source censorship, and bloocks. But such a positive directive with these guarantees would satisfy the issues I care about, that all three of those proposals touch upon. In that case, a concise method without such a directive could just stringify as currently specified by this proposal, so long as it had no such portability directive.

In that case, your positive directive seems like a clear follow-on, that wouldn’t need to disrupt this proposal at all?

Can you respond to my second point in #32 (comment) ?

Second, when evaluated in the proper lexical context - inside an object literal, in this case - it will parse just fine

I don't understand this. How do you eval within an object literal? Grammatically, you cannot do a direct eval at the position a method would appear in an object literal. Likewise with the position a method would appear within a class.

You wrap the string to be evaluated in the appropriate lexical context - which could also be sufficient to extract the method from the newly created object or prototype (if you need to install it on an existing object).
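
A minimal sketch of that wrapping, under several assumptions: the engine keeps the method's source text, the method name is a plain identifier (here the hypothetical `greet`), the method does not use `super`, and the string comes from a trusted sender.

```javascript
const original = { greet(name) { return `hello, ${name}`; } };
const src = original.greet.toString();

// Object-literal context: wrap the source in a fresh literal and read
// the method back off the new object.
const fromLiteral = (0, eval)(`({ ${src} })`).greet;

// Class-body context: the method lands on the new class's prototype.
const fromClass = (0, eval)(`(class { ${src} })`).prototype.greet;

// Installing the extracted method on an existing object.
const target = {};
target.greet = fromLiteral;

console.log(target.greet('world'), fromClass.call(null, 'class'));
```

Nothing here validates `src`, so the wrapper is only as safe as the string it is given.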

Experience shows that even experts, when trying to do such wrapping, often accidentally introduce injection vulnerabilities.

https://code.google.com/archive/p/google-caja/issues/1616
https://bugs.chromium.org/p/v8/issues/detail?id=2470
https://bugs.webkit.org/show_bug.cgi?id=106160
https://bugs.webkit.org/show_bug.cgi?id=131137

It has been hard even for experts to construct code by string concatenation that includes unescaped user-provided strings without introducing such vulnerabilities. These make Zalgo look tame by comparison ;)
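
As a constructed illustration of the general hazard (not a reproduction of any of the linked bugs), consider a naive wrapper that splices an untrusted "method source" into an object literal. The names `naiveReviveMethod` and `injected` are hypothetical.

```javascript
// Naive wrapper: concatenates an untrusted string into an object literal.
function naiveReviveMethod(src, name) {
  return (0, eval)(`({ ${src} })`)[name];
}

// A hostile string that ends the method early and adds a computed
// property whose key expression runs while the literal is evaluated.
const hostile = 'foo() {}, [globalThis.injected = true]: 0';

const method = naiveReviveMethod(hostile, 'foo');

// The caller gets back an innocent-looking function, but the extra
// expression has already executed.
console.log(typeof method, globalThis.injected);
```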

Sure - but I think that when we’re talking about replicating a lexical context and evalling strings, we’re already pretty far out of the realm of “easy” :-)

That's why the case I care about is a standardized receiving environment with the elements I enumerate at #32 (comment) . Given Realms, this case actually is easy.

@erights, it seems to me that the particular process you're describing is fairly niche, and as such it seems entirely reasonable that it have to take on the additional requirement that the communication channel communicate which type of function it is (at least whether or not it is a method). Then that case is satisfied (or is it not?) and methods are not required to drop their source: that is, this proposal can go ahead as-is.

Leaving aside the broader question of how to design portable functions, is such a requirement for the case you're describing not acceptable? If not, why not? What would be acceptable?

A general process issue I've been realizing lately is that we've all been trying to uphold unstated invariants, by reviewing and catching where proposals break those invariants. However, while these are unstated, a spec mistake, where we broke an invariant but didn't realize it till later, would become normative. If we state these invariants as assertions in the spec, then such a mistake becomes a spec inconsistency that needs resolution, rather than automatically making the mistake normative. So...

With the addition of something along the lines of the following assertion to the proposed spec, I'm happy to see this proposal go forward to stage 4:

Assert: For all strict function forms in the syntax of ECMAScript except methods (which currently includes strict function declarations and expressions, classes, arrow functions, function* generators, arrow generators, async functions, async iterator functions, ...), either its source string has the square-bracket native function syntax, which is guaranteed not to parse, or, when placed within parens, it will evaluate as an expression in an adequately similar lexical scope, producing as the completion value the same value (typically a function value) as that produced by the original form.

The tricky parts here are

  • function/class declaration vs expression. All function declaration source strings, when enclosed in parens, become a function/class expression producing an equivalent function.
  • class decorators that replace the class. AFAICT, the replacement should be the same for decorated declarations vs decorated expressions. But I might be missing something.
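
A rough way to spot-check the assertion for a few forms, including a function declaration and a class declaration repurposed as parenthesized expressions. This is an informal sketch, not a conformance test: `evalParenthesized` is a hypothetical helper, the samples need no free variables, and decorators are not covered.

```javascript
// Hypothetical helper: evaluate a source string as a strict,
// paren-enclosed expression and return the completion value.
function evalParenthesized(src) {
  return (0, eval)(`'use strict'; (${src});`);
}

function declared() { return 'declaration'; }   // function declaration
class Declared { tag() { return 'class'; } }    // class declaration

const forms = [
  declared,
  Declared,
  function () { return 'expression'; },
  () => 'arrow',
  function* () { yield 1; },
  async function () { return 1; },
  async function* () { yield 1; },
];

// Each non-native source should evaluate to a function of the same kind.
const results = forms.map((fn) => {
  const revived = evalParenthesized(Function.prototype.toString.call(fn));
  return typeof revived === 'function' && revived.constructor === fn.constructor;
});

// A builtin's source uses the native-function syntax and must not parse.
let nativeError;
try {
  evalParenthesized(Function.prototype.toString.call(Math.max));
} catch (err) {
  nativeError = err;
}

console.log(results, nativeError instanceof SyntaxError);
```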

Note that I am stating this invariant under the assumption that it actually is true now. I am not trying, by this assertion, to propose a change to the status quo. Rather, I seek to ensure that we don't accidentally break this property as the spec evolves.

This is why I grandfathered in an exception for methods.

This would also require that class decorators of an exported class would occur after the "export" keyword, since that keyword would not be part of any corresponding expression syntax. By contrast, a hypothetical "abstract" qualifier could (and probably should) come after the decorators, and would remain a valid part of the corresponding expression syntax.

@erights if "adequately similar lexical scope" includes "inside a similar class body" or "inside a similar object literal", then that sounds great.

Note that asserting some regularity that is currently true is not necessarily a commitment to keep it true forever. We could change our mind. But it ensures that we don't normatively compromise that regularity by accident.

@ljharb No, because that is not a context in which I can evaluate a parenthesized expression. But I'm grandfathering methods as an exception anyway, and I don't think anything else has this issue.

@erights you can wrap the string in that context tho, and eval the new string - why is that not sufficient?

That's not the invariant I care about. If you want to add it as a separate invariant, that would be fine.

The reason I bring it up is not because I think it's valuable additionally, but because I think that if it replaces your invariant, then "methods" are no longer an exception.

I don't see how to phrase your invariant so that it covers your case, and also covers enclosing in parens the source text of what was originally a function declaration. Ideas?

Something like "When eval-ed in the appropriate lexical context with a similar scope environment (such as in statement position, in expression position, inside a similar object literal, inside a similar class body, etc) the toString must either evaluate similarly, or throw a SyntaxError"?

That does not read to me like a requirement that a declaration form can be repurposed as a parenthesized expression. It reads like: if it was an expression, it must still work as an expression. If it was a declaration, it must still work as a declaration.

Also, aside from methods, the only SyntaxError case I want to allow is the square-bracketed native method syntax form stated by this proposal.