hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] Initialization vs assignment in a loop

ntrel opened this issue

To Reproduce

main: () =
{
    i := 0;
    p: std::unique_ptr<int>;
    while i < 3 next i++ {
        std::cout << i << "\n";
        p = new<int>(i);
        std::cout << p* << "\n";
    }
}

The p = new<int> line generates a call to p.construct, which works for the first iteration to initialize p. Then on the second iteration, an assignment was intended, but p.construct is called again, which causes a contract violation.

0
0
1
Contract violation
terminate called without an active exception
Aborted (core dumped)

git cppfront, g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
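To make the failure mode concrete, here is a minimal Cpp1 sketch of a lazily initialized local whose construct() has a "not yet constructed" precondition. The names lazy_local and iterations_before_violation are hypothetical and this is not cppfront's actual wrapper; it only assumes, as the report describes, that each pass through the loop lowers to a construct() call. An exception stands in for the contract violation so the effect can be observed.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a lazily initialized local (not cppfront's real type).
template <typename T>
class lazy_local {
    alignas(T) unsigned char storage_[sizeof(T)];
    bool constructed_ = false;

public:
    lazy_local() = default;
    lazy_local(const lazy_local&) = delete;
    lazy_local& operator=(const lazy_local&) = delete;
    ~lazy_local() { if (constructed_) value().~T(); }

    // Precondition: the object has not been constructed yet.
    template <typename... Args>
    void construct(Args&&... args) {
        if (constructed_) throw std::logic_error("contract violation: already constructed");
        ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        constructed_ = true;
    }

    // Precondition: the object has been constructed.
    T& value() {
        if (!constructed_) throw std::logic_error("contract violation: not constructed");
        return *std::launder(reinterpret_cast<T*>(storage_));
    }
};

// Mirrors the loop in the report: every iteration calls construct().
// Returns how many iterations completed before the precondition failed.
inline int iterations_before_violation() {
    lazy_local<std::unique_ptr<int>> p;
    int completed = 0;
    try {
        for (int i = 0; i < 3; ++i) {
            p.construct(std::make_unique<int>(i));  // fine once, violation the second time
            ++completed;
        }
    } catch (const std::logic_error&) {
        // The second iteration ends up here.
    }
    return completed;
}
```

In this sketch only the first iteration completes, which matches the output above: the first pass prints the pointee, and the second pass fails at the construct() call.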

This should be rejected, just like https://cpp2.godbolt.org/z/8ssxKs4YT is:

main: () = {
  p: std::unique_ptr<int>;
  if true {
    p = new<int>(i);
  }
}
main.cpp2...
main.cpp2(2,3): error: local variable p must be initialized on both branches or neither branch
main.cpp2(3,3): error: "if" initializes p on:
  branch starting at line 3
but not on:
  implicit else branch
  ==> program violates initialization safety guarantee - see previous errors

This should be rejected

For consistency, yes. But in both cases it would be nice to only error if the variable is actually used after the branch. If it is only used in the branch where it is initialized, that could be allowed & would be useful e.g. when a pointer (declared in the same scope as the variable) is set to point at the variable after the variable is initialized. BTW the main readme mentions P1179, perhaps that will allow this and #440 as well.

Yes, I agree this is a bug: lazy initialization shouldn't be allowed in a loop, because a loop can be entered zero times.

it would be nice to only error if the variable is actually used after the branch.

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

OK, fixed.

The following errors are now diagnosed:

error1: () = {
    i: int;
    while true {
        i = 42;     // ERROR: can't initialize i in a loop
    }
    i = 42;
}

error2: () = {
    i: int;
    if true {
        while true {
            i = 42;     // ERROR: can't initialize i in a loop
        }
        i = 42;
    }
    else {
        i = 42;
    }
    i = 42;
}

And the following is allowed:

ok: () = {
    i: int;
    if true {
        i = 42;
        while true {    // OK: in-branch loop is after initialization
            i = 42;
        }
    }
    else {
        i = 42;
    }
    i = 42;
}

Also, uses like this one already in reflect.h2 continue to work:

    protected parse_statement: ( /*...*/ )
        -> (ret: std::unique_ptr<statement_node>)  // LAZILY INITIALIZED
    = {
        /*... code that doesn't mention ret ... */

        if  /*...*/ {
            while /*... code that doesn't mention ret ... */ {  // OK: this loop doesn't matter
                /*... code that doesn't mention ret ... */
            }
        }

        /*... code that doesn't mention ret ... */

        ret = parser.parse_one_declaration(  // OK: definite first use of ret is a construction
                tokens*.get_map().begin()*.second,
                generated_tokens*
              );
        /*...*/
    }

So this is invalid C++2?

error1: () = {
    i: int;
    while true {
        s := input_from_user();  // string
        if s.is_int() {
            i = s.to_int();     // ERROR: can't initialize i in a loop??
            break;
        }
        // ask again
    }
    // Is this point flagged as an error too?
}

Oh, is the reasoning here that the user can initialize i where it's declared to some default value, and that's easier than trying to distinguish between construction and assignment in a loop?
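For illustration, the "initialize at the declaration" alternative could look like the following Cpp1 sketch. The function first_int, its fallback parameter, and the vector of strings standing in for repeated user input are all assumptions made for this example; parsing uses std::from_chars from C++17.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>
#include <vector>

// Returns the first entry that parses completely as an int, or `fallback`
// if none does. `inputs` is a hypothetical stand-in for repeated user input.
inline int first_int(const std::vector<std::string>& inputs, int fallback) {
    int i = fallback;  // initialized where it is declared
    for (const std::string& s : inputs) {
        int parsed = 0;
        const char* end = s.data() + s.size();
        auto [ptr, ec] = std::from_chars(s.data(), end, parsed);
        if (ec == std::errc{} && ptr == end) {
            i = parsed;  // always a plain assignment, never a construction
            break;
        }
        // otherwise "ask again": move on to the next input
    }
    return i;
}
```

Because i already holds a value before the loop, every write inside the loop is unambiguously an assignment, and i is well defined even when the loop body never runs.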

So this is invalid C++2?
[... example where line 6 is:]
i = s.to_int(); // ERROR: can't initialize i in a loop??

Correct, and is now diagnosed with the above recent commit:

test.cpp2(6,13): error: local variable i cannot be initialized inside a loop
  ==> program violates initialization safety guarantee - see previous errors

// Is this point flagged as an error too?

No, because you get the earlier error. But if you had code that didn't have that error and then never used the variable, you would get an error from the Cpp1 compiler if "unused variable" warnings are on.

that's easier than trying to distinguish between construction and assignment in a loop?

The latter isn't just harder; I think it's not possible for a for or while loop (but it possibly could work for do).

The primary reason is that a for or while loop could execute zero times, so it might never have an opportunity to initialize at all.
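A small Cpp1 sketch of the zero-iteration problem, using std::optional to model "declared but not yet initialized". The function name initialized_after_loop is hypothetical.

```cpp
#include <cassert>
#include <optional>

// Reports whether the loop body ever got the chance to initialize the variable.
inline bool initialized_after_loop(int iterations) {
    std::optional<int> i;  // models a local declared without an initializer
    int n = 0;
    while (n < iterations) {
        i = 42;            // the only initialization site is inside the loop
        ++n;
    }
    return i.has_value();  // false whenever the loop ran zero times
}
```

With zero iterations the variable is still uninitialized after the loop, so any later use would break the initialization guarantee.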

If somehow we guaranteed at-least-once loop semantics (such as do does, or by inventing a for_at_least_once and while_at_least_once or similar), we could allow initialization inside a loop at the cost of generating an additional first_iteration local variable and then generating { auto&& __rhs = s.to_int(); if (first_iteration) { i.construct(CPP2_FORWARD(__rhs)); } else { i = CPP2_FORWARD(__rhs); } }. That's possible, but I haven't found a reason to implement that flexibility. Plus it costs 0-2 extra variables (modulo optimizations) and a branch, though that's less of a concern because you would pay for the overhead only if you use it.
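The first_iteration idea can be sketched by hand in Cpp1 as follows. This is an illustration under stated assumptions, not code that cppfront generates: loop_stats and init_in_do_loop are hypothetical names, and placement new into raw storage stands in for the construct() call.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical record of what the loop did.
struct loop_stats {
    int constructions = 0;
    int assignments = 0;
    std::string final_value;
};

// A do loop always runs at least once, so the first pass can construct
// and every later pass can assign.
inline loop_stats init_in_do_loop(int iterations) {
    alignas(std::string) unsigned char storage[sizeof(std::string)];  // raw storage
    std::string* s = nullptr;
    bool first_iteration = true;  // the extra local the approach costs
    loop_stats stats;
    int n = 0;
    do {
        std::string rhs = "value " + std::to_string(n);
        if (first_iteration) {
            s = ::new (static_cast<void*>(storage)) std::string(std::move(rhs));  // construct
            first_iteration = false;
            ++stats.constructions;
        } else {
            *s = std::move(rhs);                                                  // assign
            ++stats.assignments;
        }
        ++n;
    } while (n < iterations);
    stats.final_value = *s;  // safe: the body ran at least once
    std::destroy_at(s);
    return stats;
}
```

Even when iterations is 0 the body runs once, so exactly one construction always happens; that is the guarantee a for or while loop cannot give.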

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

Not if it is initialized and used in the branch. So why not declare the variable in the branch? The example shows why - though in real code the type of the variable might be more complex - some complicated template instantiation.

main: () = {
  i: int;
  if e1 {
      // other code
      if e2 {
          // initialize and use i
      }
      // no use of i
  }
  else if e2 {
    // initialize and use i
  }
  // no use of i
}

Not if it is initialized and used in the branch. So why not declare the variable in the branch?

I agree with both parts, including "why not declare the variable in the branch?" That is, in a case like this why wouldn't it be better to write it as follows?

main: () = {
    if e1 {
        // other code
        if e2 {
            // initialize and use i
            i := 42; std::cout << i;
        }
        // no use of i
    }
    else if e2 {
        // initialize and use i
        i := 42; std::cout << i;
    }
    // no use of i
}

Now it's correct by construction; you can't make the mistake of using i later because its name isn't even in scope.

The purpose of declaring a local in a larger scope is so that it can be used later in the scope (in both Cpp2 and Cpp1), and allowing it to have no initializer allows initializing it in a nested branch scope first including to use different constructors etc. (in Cpp2). If it's only going to be used in the branch scope, shouldn't it be declared there?

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.
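As a rough Cpp1 illustration of that suggestion, a long type can be named once with an alias and the variable still declared in each innermost branch. The alias samples_t, the function count_samples, and the values are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias standing in for "some complicated template instantiation".
using samples_t = std::vector<std::pair<std::string, int>>;

inline std::size_t count_samples(bool e1, bool e2) {
    if (e1) {
        // other code
        if (e2) {
            samples_t i{{"a", 1}, {"b", 2}};  // declared, initialized, and used here
            return i.size();
        }
        // i is not in scope here
    } else if (e2) {
        samples_t i{{"x", 1}, {"y", 2}, {"z", 3}};  // a different initializer per branch
        return i.size();
    }
    // i is not in scope here either
    return 0;
}
```

The alias keeps each declaration short, and because the name only exists inside the branch that initializes it, an uninitialized use after the branch cannot compile.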

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.

I agree with @jcanizales, but I now see I had missed @ntrel's point originally... thanks Jorge for fixing my reply with a better, shorter one, and my apologies Nick that I missed your clearly stated point the first time!