hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] *= operator ambiguity with postfix *

aswaine opened this issue

I'm not sure if this is a bug, an observation or a suggestion...

Describe the bug
a* =b assigns b to the thing pointed to by a.
a*=b means a = a * b.

This doesn't occur with most other operator pairs; a*>b and a* >b both compile and mean the same thing.

Technically &= has the same syntactic problem.

To Reproduce
Steps to reproduce the behavior:

  1. Sample code - distilled down to minimal essentials please
    a: int = 1;
    b:= a&;
    c:= a*=2;
    d:= b* =2;
    e:= a*>3;
    f:= a* >3;
  2. Actual result/error
    (godbolt)
    int a {1}; 
    auto b {&a}; 
    auto c {a *= 2}; 
    auto d {*cpp2::impl::assert_not_null(b) = 2}; 
    auto e {cpp2::impl::cmp_greater(*cpp2::impl::assert_not_null(a),3)}; 
    auto f {cpp2::impl::cmp_greater(*cpp2::impl::assert_not_null(a),3)}; 

Additional context
This is actually what I was expecting it to do, but it's broken a property of the c/cpp1 operator set, that you don't have to put spaces between operators to avoid ambiguity.

Option 1: live with it

With a context-free grammar we can't disambiguate in the compiler. It's probably not going to lead to bugs because it should be caught by the compiler -- multiplication on a pointer makes no sense. And coding style can encourage spaces around =. But it feels destined to be a 'known gotcha', and to be a thing that needs teaching about the language: put a space between * and = unless you really mean *=.
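To illustrate the claim that a mistyped *= on a pointer is rejected at compile time, here is a minimal C++11 sketch. The trait name can_mul_assign is hypothetical and exists only for this example; it checks whether the expression `x *= 2` is well-formed for a given type.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical detection trait (not part of cppfront or the standard library):
// the primary template says "x *= 2" is not well-formed for T.
template <class T, class = void>
struct can_mul_assign : std::false_type {};

// This specialization is selected only when "x *= 2" compiles for an lvalue of type T.
template <class T>
struct can_mul_assign<T, decltype(void(std::declval<T&>() *= 2))> : std::true_type {};

// Multiplying an int in place is fine...
static_assert(can_mul_assign<int>::value, "int supports *=");
// ...but the same expression on a pointer is ill-formed, so writing a*=2
// where a* =2 was intended would not compile for a pointer a.
static_assert(!can_mul_assign<int*>::value, "pointers do not support *=");
```

This only covers raw pointers and built-in arithmetic types; a user-defined type that overloads both operator* and operator*= could still make both spellings compile.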

Option 2: change the operators

If more radical options are being considered:

I note on https://github.com/hsutter/cppfront/wiki/Design-note%3A-Postfix-unary-operators-vs-binary-operators that ^ was considered as a desirable alternative. I'm not an expert, but I think this might be possible if we're willing to rename the bitwise operators:

~ is pronounced "bitwise" (currently only used for bitwise NOT)
Bitwise AND is renamed from & to ~&
Bitwise OR is renamed from | to ~|
Bitwise XOR is renamed from ^ to ~^
Bitwise NOT is renamed from ~ to ~!
Dereference is renamed from * to ^
Reference remains &
&=, |= and ^= become ~&=, ~|= and ~^=

* always means multiply (learning win).
| becomes available for future syntax.
** also becomes available for exponentiation, which removes one thing to teach about the language -- by this point, it's probably a learning point that ** doesn't do exponentiation.

There's the obvious big downside of breaking consistency with other C-family languages, but it is arguably a smaller break than the one already made by changing unary operators to be postfix. Most people rarely need bitwise operators, while it's easy to accidentally type a single & or | when a logical operator was intended -- so this has the advantage of making bitwise operators stand out visually. The bitwise NOT operator is also maybe slightly more intuitive, although it starts to look weird to have ~! be postfix and ! prefix (which it has to be because of !=).

| is currently overloaded as a pipe operator in the ranges library, which would look weird spelled ~|, but I assume UFCS should make that go away.

Max munch makes *= always the assignment operator, and anything else should be dereference followed by assignment.
See https://github.com/search?q=repo%3Ahsutter%2Fcppfront%20commenter%3Ahsutter%20max%20munch&type=issues.
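As a rough illustration of the max munch rule, here is a minimal sketch of a longest-match tokenizer in C++. The operator table and the tokenize function are hypothetical and heavily reduced; this is not cppfront's actual lexer, only a model of why a*=b and a* =b produce different token streams while a*>b and a* >b do not.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, heavily reduced operator table (an assumption for this sketch).
// Longer spellings are listed first, so the first match is also the longest.
static const char* const kOperators[] = {"*=", "&=", "*", "&", "=", ">"};

// Split src into tokens, always taking the longest operator that matches at
// the current position (maximal munch). Whitespace only separates tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c)) {
            // Identifier or number: consume the whole alphanumeric run.
            std::size_t j = i;
            while (j < src.size() &&
                   std::isalnum(static_cast<unsigned char>(src[j]))) ++j;
            out.push_back(src.substr(i, j - i));
            i = j;
            continue;
        }
        bool matched = false;
        for (const char* op : kOperators) {
            const std::string s(op);
            if (src.compare(i, s.size(), s) == 0) {
                out.push_back(s);
                i += s.size();
                matched = true;
                break;
            }
        }
        // Unknown character: emit it as a single-character token.
        if (!matched) { out.push_back(std::string(1, src[i])); ++i; }
    }
    return out;
}

// The cases from the issue, expressed as checks on the token streams.
void check_examples() {
    assert((tokenize("a*=b") == std::vector<std::string>{"a", "*=", "b"}));
    assert((tokenize("a* =b") == std::vector<std::string>{"a", "*", "=", "b"}));
    assert(tokenize("a*>3") == tokenize("a* >3"));
}
```

In this model the space matters only where a longer operator spelling exists, which is why *= and &= are affected and *> is not.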

Thanks! I don't think this is particularly a bug, as it's max munch. However, I agree that languages should try to avoid max munch being a surprise. It's never actually ambiguous (and in general mistakes won't compile because of the type system), but I agree it can be visually ambiguous.

So perhaps my confirmation bias is showing, but the most important phrase in the issue is this one:

This is actually what I was expecting it to do,

Great / whew! 😌

but it's broken a property of the c/cpp1 operator set, that you don't have to put spaces between operators to avoid ambiguity.

It's true that postfix * creates a new max munch visual ambiguity in cases of *= / * =. But it's not new... C and Cpp1 do already allow such examples, just not with *. For example:

  • a+++b means a ++ + b, to post-increment a and then add b

  • a+ ++b means a + ++ b, to add a to the result of pre-incrementing b

  • a+ + +b means a + + + b, to add a to the result of applying unary + twice to b

    • but full disclosure: I had to check to be sure the last one was legal 👼... see Godbolt: https://godbolt.org/z/9PzsYTo8a ... so arguably it can be surprising, but it doesn't seem to come up as an actual confusion in practice I think?
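The three spellings above can be checked in ordinary C++ (Cpp1). This is a small sketch with hypothetical helper names; each function body uses exactly the spacing from the corresponding bullet.

```cpp
#include <cassert>

// a+++b lexes as (a++) + b: returns the old a plus b, then a is incremented.
int post_inc_then_add(int& a, int b) { return a+++b; }

// a+ ++b lexes as a + (++b): b is incremented first, then added to a.
int add_pre_inc(int a, int& b) { return a+ ++b; }

// a+ + +b lexes as a + (+(+b)): unary plus applied twice, then a binary add.
int add_unary_plus_twice(int a, int b) { return a+ + +b; }

void check_examples() {
    int a = 1, b = 2;
    assert(post_inc_then_add(a, b) == 3);  // 1 + 2
    assert(a == 2 && b == 2);              // only a changed

    a = 1; b = 2;
    assert(add_pre_inc(a, b) == 4);        // 1 + 3
    assert(a == 1 && b == 3);              // only b changed

    assert(add_unary_plus_twice(1, 2) == 3);
}
```

The results differ only in which operand is modified and when, which matches the point that the spacing changes the meaning without making any of the forms ill-formed.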

This is actually a case that Cpp2 makes a little better, since there is only postfix ++, no prefix ++.

So Cpp2 does add the *= and &= cases, but it also removes some ++ and -- cases.

Does that make sense?

Yes, that makes complete sense!