purescript / purescript-prelude

The PureScript Prelude

Show String outputs illegal escape sequence \& in string literals

stengerh opened this issue

Description

The REPL outputs string literals containing the escape sequence \& but this escape sequence is not accepted as input.

To Reproduce

In the REPL:

> "\x0000001"
"\0\&1"

> "\0\&1"
Illegal character escape code at line 1, column 3

Expected behavior

The REPL outputs valid string literals.

Additional context

Haskell supports \& to delimit a numeric escape sequence from the characters that follow it; see "The zero-width escape sequence". This is exactly how the PureScript REPL uses \& to produce canonicalized string literals. However, the compiler does not actually accept this escape sequence. If it did, this would provide a solution to the problem in purescript/purescript#3750.
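To illustrate why decimal escapes need a delimiter at all, here is a minimal JavaScript sketch. The helper name `escapeDecimal` is hypothetical and exists only for this illustration; it is not part of the prelude or the compiler. It escapes control characters as Haskell-style decimal escapes, with and without `\&`.

```javascript
// Hypothetical helper for illustration only; not part of the prelude or compiler.
// Escapes control characters as Haskell-style decimal escapes (for example \0).
// When useDelimiter is true, \& is emitted whenever a digit follows the escape.
function escapeDecimal(s, useDelimiter) {
  return s.replace(/[\0-\x1F\x7F]/g, function (c, i) { // eslint-disable-line no-control-regex
    var next = s[i + 1];
    var digitFollows = next !== undefined && next >= "0" && next <= "9";
    var delimiter = useDelimiter && digitFollows ? "\\&" : "";
    return "\\" + c.charCodeAt(0).toString(10) + delimiter;
  });
}

// NUL followed by the digit 1 (concatenated to avoid a legacy octal escape).
var input = "\0" + "1";

console.log(escapeDecimal(input, false));  // \01    (reads back as code point 1)
console.log(escapeDecimal(input, true));   // \0\&1  (unambiguous in Haskell)
console.log(escapeDecimal("\x01", false)); // \1
```

Without the delimiter, the escaped form of NUL followed by "1" cannot be told apart from an escape for code point 1, which is the ambiguity `\&` resolves in Haskell.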

PureScript version

0.13.8

I discovered this while experimenting with the PureScript plugin for IntelliJ and comparing its syntax highlighting with the behavior of the PureScript compiler/REPL. Unfortunately, I could not find any documentation on the escape sequences that PureScript supports, at least not in the sections on Syntax and Differences from Haskell. The only source of truth here is the source code of the PureScript compiler itself.

This lack of documentation seems to have caused some confusion in PureScript itself and even more so in the IntelliJ plugin. I can prepare a pull request for the documentation, so others can benefit from what I learned.

I can confirm this still affects 0.14. Interestingly, it doesn't affect type-level symbols (at least in 0.14):

> "\x0000001"
"\0\&1"

> "\x000001"
"\1"

> data SProxy (s :: Symbol) = SProxy
> :t SProxy :: SProxy "\x0000001"
SProxy "\x0000001"

> :t SProxy :: SProxy "\x000001"
SProxy "\x000001"

> :t SProxy :: SProxy "\x00001"
SProxy "\x000001"

I'm not familiar enough with the pretty printer to diagnose/fix this.

Wait. I just realized that this is an issue with the Show String instance in the prelude, not the compiler. See

exports.showStringImpl = function (s) {
  var l = s.length;
  return "\"" + s.replace(
    /[\0-\x1F\x7F"\\]/g, // eslint-disable-line no-control-regex
    function (c, i) {
      switch (c) {
        case "\"":
        case "\\":
          return "\\" + c;
        case "\x07": return "\\a";
        case "\b": return "\\b";
        case "\f": return "\\f";
        case "\n": return "\\n";
        case "\r": return "\\r";
        case "\t": return "\\t";
        case "\v": return "\\v";
      }
      var k = i + 1;
      var empty = k < l && s[k] >= "0" && s[k] <= "9" ? "\\&" : "";
      return "\\" + c.charCodeAt(0).toString(10) + empty;
    }
  ) + "\"";
};

We should probably update it to match the compiler's output though.
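One possible direction, sketched here under stated assumptions rather than as the actual fix: emit fixed-width hex escapes instead of decimal escapes, so that no `\&` delimiter is needed. The name `showStringHex` is hypothetical. The sketch assumes the PureScript lexer reads at most six hex digits after `\x`, as the REPL sessions earlier in this thread suggest, and that it accepts `\"`, `\\`, `\n`, `\r` and `\t`. Other short escapes are deliberately left out because their support is not confirmed here.

```javascript
// Sketch only: a hypothetical variant of showStringImpl that avoids \& and
// decimal escapes. Assumption: PureScript accepts \xHHHHHH with at most six
// hex digits, plus the short escapes \" \\ \n \r \t.
function showStringHex(s) {
  return "\"" + s.replace(
    /[\0-\x1F\x7F"\\]/g, // eslint-disable-line no-control-regex
    function (c) {
      switch (c) {
        case "\"":
        case "\\":
          return "\\" + c;
        case "\n": return "\\n";
        case "\r": return "\\r";
        case "\t": return "\\t";
      }
      // Always pad to six hex digits so that a following hex digit cannot be
      // absorbed into the escape; this makes a delimiter unnecessary.
      var hex = c.charCodeAt(0).toString(16);
      return "\\x" + "000000".slice(hex.length) + hex;
    }
  ) + "\"";
}

console.log(showStringHex("\0" + "1")); // "\x0000001"
console.log(showStringHex("a\nb"));     // "a\nb"
```

For NUL followed by "1" this prints the same literal that was typed into the REPL in the original report, so the output should round-trip if the six-digit assumption holds.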

Okay, I transferred it to the prelude from the compiler repo.

Thanks for the quick response!

I looked into this more closely in the meantime and realized the scope of the bug report was probably too narrow. It was not just the \& but also the decimal escape sequence \1. Char literals are also affected.

I cannot tell which other code would be affected by this change to the prelude. I would however prefer that the REPL pretty-printed string and char literals using PureScript escape sequences.

Correct me if I'm wrong, but is prettyPrintStringJS where the compiler prints a string?

Here's another place where a breaking change can be made now. How do we fix this?