Separator: `=` vs. `:`

Question

mathiasbynens opened this issue 8 years ago

Perl does both:

$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Greek}/'
1

$ perl -Mutf8 -E 'say "π" =~ /\p{Script:Greek}/'
1

We only want to support one, but which one? The current proposal uses `=`, but why not `:`?

00:35:54 <bterlson> what is the rationale for `=` in `Script=` btw?
00:37:40 <bterlson> I mean, why not `Script:foo`
00:38:09 <mathiasbynens> no strong preference, but I think we should only support either `:` or `=` but not both: https://github.com/mathiasbynens/es-regexp-unicode-property-escapes#why-not-support--as-a-separator-in-addition-to-
00:38:29 <bterlson> both is absurd
00:38:40 <mathiasbynens> Perl does both!
00:38:46 <bterlson> absurd
00:38:50 <mathiasbynens> :)
00:39:06 <bterlson> `:` aligns with property syntax
00:43:06 <mathiasbynens> hmm yeah that makes sense… although property name grammar in \p{} is much more restrictive than Identifier

Mathias Bynens · Answer 1 · Fri Aug 12 2016 16:03:04 GMT+0800 (China Standard Time)

`:` aligns with property syntax, but that’s where the similarity ends. The property name/value grammar in `\p{…}` is much more restrictive than `Identifier`.

`=`, on the other hand, reminds me of SQL, where `\p{property=value}` becomes something like `SELECT * FROM symbols WHERE property = 'value';`, i.e. match all symbols where the value for property `$property` is `$value`. I like the mental model of querying the Unicode Database.
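A minimal sketch of that "query the database" mental model, assuming a tiny hand-written table in place of the real Unicode Character Database. The `symbols` array and the `selectWhere` helper are hypothetical names made up for illustration; they are not part of the proposal or of any engine.

```javascript
// Hypothetical, hand-written sample rows; NOT the real Unicode Character Database.
const symbols = [
  { char: 'π', Script: 'Greek' },
  { char: 'a', Script: 'Latin' },
  { char: 'я', Script: 'Cyrillic' },
];

// Rough analogue of: SELECT * FROM symbols WHERE property = 'value';
// Returns the characters whose value for `property` equals `value`.
function selectWhere(property, value) {
  return symbols
    .filter((row) => row[property] === value)
    .map((row) => row.char);
}

const greek = selectWhere('Script', 'Greek');
console.log(greek);
```

Read this way, the property name picks a column and the value filters the rows, which is the reading of `property=value` that the SQL comparison suggests.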

Mathias Bynens · Answer 2 · Fri Aug 12 2016 16:07:30 GMT+0800 (China Standard Time)

@bterlson @littledan @hashseed @patch Thoughts?

Yang Guo · Answer 3 · Fri Aug 12 2016 18:28:04 GMT+0800 (China Standard Time)

I'd say it's arbitrary. Any separator would do.

Brian Terlson · Answer 4 · Sat Aug 13 2016 00:31:34 GMT+0800 (China Standard Time)

I still prefer `:` slightly as I like to think about it like creating an options bag, but my only strong preference is to not do both.

Daniel Ehrenberg · Answer 5 · Sat Aug 13 2016 03:27:26 GMT+0800 (China Standard Time)

I prefer `=` slightly, but that may just be because that's the first syntax I saw @hashseed implement and it looked nice to me.

Brian Terlson · Answer 6 · Sat Aug 13 2016 03:34:35 GMT+0800 (China Standard Time)

Time for a twitter poll! :-P

Brian Terlson · Answer 7 · Sat Aug 13 2016 03:44:43 GMT+0800 (China Standard Time)

https://twitter.com/bterlson/status/764184006095048704

ECMAScript’s RegExps are learning more about Unicode with the \p proposal. What syntax should it use?

330 votes:

52% /\p{Script:Greek}/

28% /\p{Script=Greek}/

20% Why not both?

Brian Terlson · Answer 8 · Sat Aug 13 2016 03:51:23 GMT+0800 (China Standard Time)

This seems possibly confusing as @bmeck points out.

let foo;
`${foo=1}`; // foo = 1
/\p{foo=1}/; // syntax error?

Yang Guo · Answer 9 · Sat Aug 13 2016 04:12:29 GMT+0800 (China Standard Time)

Not really sure why that’s confusing… one is a string template and the other is a regexp literal. Syntax is entirely different…

Brian Terlson · Answer 10 · Sat Aug 13 2016 04:19:32 GMT+0800 (China Standard Time)

It's possibly confusing because in order to understand what `foo=1` is doing you have to understand that the syntax is entirely different despite looking identical (and even the surrounding syntax is similar, what with the curlies and all).
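A minimal sketch of the contrast under discussion, assuming an engine that supports the `u` flag. The regular expression is built from a string so that a rejected pattern throws at run time rather than preventing the whole script from parsing. The variable names `rendered` and `regexpOutcome` are only for illustration.

```javascript
let foo;

// Inside a template literal, ${...} holds an ordinary JavaScript expression,
// so `foo = 1` is an assignment that evaluates to 1.
const rendered = `${foo = 1}`;
console.log(rendered, foo);

// Inside a regular expression, the braces after \p do not hold an expression.
// `foo` is not a Unicode property name, so with the `u` flag the pattern is
// expected to be rejected instead of assigning anything.
let regexpOutcome;
try {
  new RegExp('\\p{foo=1}', 'u');
  regexpOutcome = 'accepted';
} catch (err) {
  regexpOutcome = err.name;
}
console.log(regexpOutcome);
```

The two snippets look alike between the curly braces, but only the template literal evaluates anything.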

Nova Patch · Answer 11 · Sat Aug 13 2016 05:27:16 GMT+0800 (China Standard Time)

In theory I like `:` better, but in practice I use and teach `=` because that's what I see much more frequently in the wild and more regex engines support it. I think of regex as a language of its own embedded within other languages without any syntactic relationship to the languages that embed it. Note that the `=` in `\p{…=…}` aligns with the `=` in `(?=…)` for positive lookaheads and `(?<=…)` for positive lookbehinds.
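For readers less familiar with the lookahead syntax referred to here, a short sketch of how `=` already appears inside JavaScript regular expressions with a meaning unrelated to assignment. The name `qBeforeU` is only illustrative.

```javascript
// (?=…) is a positive lookahead: it requires the text that follows to match,
// without consuming it. The `=` here has nothing to do with assignment.
const qBeforeU = /q(?=u)/;

console.log(qBeforeU.test('quit'));  // "q" is followed by "u"
console.log(qBeforeU.test('qatar')); // "q" is followed by "a"

// The lookahead is zero-width, so only the "q" is part of the match.
const match = qBeforeU.exec('quit');
console.log(match[0]);
```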

I performed an extremely unscientific survey of my locally checked-out git projects (which of course includes my own code):

$ ack -ch '\\p\{\w+=\w+\}'
814
$ ack -ch '\\p\{\w+:\w+\}'
121

Also, the regex docs for Java and ICU only include `=`, as does the specification for Unicode Sets and their use in the Unicode CLDR data files. Lastly, I've never seen a regex engine that solely supports `:`, but would love to hear about it if anyone knows one.

Mathias Bynens · Answer 12 · Tue Aug 16 2016 00:17:53 GMT+0800 (China Standard Time)

> In theory I like `:` better […]

Seems like most people feel that way.

@patch makes a very good point in favor of `=`, though:

> I think of regex as a language of its own embedded within other languages without any syntactic relationship to the languages that embed it. Note that the `=` in `\p{…=…}` aligns with the `=` in `(?=…)` for positive lookaheads and `(?<=…)` for positive lookbehinds.

I’m slightly leaning towards sticking to `=` now.

@bterlson What do you think?

Brian Terlson · Answer 13 · Tue Aug 16 2016 01:38:52 GMT+0800 (China Standard Time)

I find @patch's arguments the most persuasive so far and am convinced that regexp experts will generally prefer `=`. I'm not sure JS developers would generally find it more approachable because they may not be regexp experts, have experience with other engines, know much about Unicode, see the correspondence with lookaheads, etc.

I cannot argue strongly in favor of `:` so I support moving forward with `=`. The twitter poll is clearly in favor of `:`, though, fwiw :)

Mathias Bynens · Answer 14 · Wed Sep 28 2016 09:14:38 GMT+0800 (China Standard Time)

FAQ entry: Why use `=` (and not something else) as a separator?

Daniel Ehrenberg · Answer 15 · Wed Sep 28 2016 13:11:53 GMT+0800 (China Standard Time)

At TC39, we decided to reverse the judgement here and go with `:`.

Brian Terlson · Answer 16 · Wed Sep 28 2016 13:20:28 GMT+0800 (China Standard Time)

I feel like we didn't represent the FAQ entry contents well... do you @littledan? If not maybe we can do a quick re-check?

Mathias Bynens · Answer 17 · Wed Sep 28 2016 13:28:04 GMT+0800 (China Standard Time)

A quick re-check sounds good! If the decision made in this issue is reversed, I’d love to hear the rationale for it.

Daniel Ehrenberg · Answer 18 · Wed Sep 28 2016 15:31:06 GMT+0800 (China Standard Time)

OK, I'll see if we have time to discuss this at this TC39 meeting later. The rationale was that `=` is used for property set, but the examples in the FAQ seem to show that RegExps already assign a new meaning to `=`.

Daniel Ehrenberg · Answer 19 · Wed Sep 28 2016 15:31:39 GMT+0800 (China Standard Time)

Cc @allenwb who made the point for `:` rather than `=`.