[BUG] Initialization vs assignment in a loop

ntrel opened this issue 6 months ago

To Reproduce

main: () =
{
    i := 0;
    p: std::unique_ptr<int>;
    while i < 3 next i++ {
        std::cout << i << "\n";
        p = new<int>(i);
        std::cout << p* << "\n";
    }
}

The p = new<int>(i) line generates a call to p.construct, which works on the first iteration to initialize p. On the second iteration an assignment was intended, but p.construct is called again, which causes a contract violation:

0
0
1
Contract violation
terminate called without an active exception
Aborted (core dumped)

git cppfront, g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
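
The failure mode described above can be reproduced in plain C++ with a small stand-in for the wrapper that a lazily initialized local lowers to. This is only a sketch under stated assumptions: deferred_init_sketch, run_reported_loop and run_fixed_loop are hypothetical names, not cppfront's actual runtime support, and the sketch throws std::logic_error where the real program terminates with a contract violation, so that the behavior can be observed.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for deferred-initialization storage:
// raw storage plus a check that construct() runs at most once.
template <typename T>
class deferred_init_sketch {
    alignas(T) unsigned char storage_[sizeof(T)];
    T* ptr_ = nullptr;  // non-null once the object has been constructed
public:
    deferred_init_sketch() = default;
    deferred_init_sketch(const deferred_init_sketch&) = delete;
    deferred_init_sketch& operator=(const deferred_init_sketch&) = delete;
    ~deferred_init_sketch() { if (ptr_) ptr_->~T(); }

    template <typename... Args>
    void construct(Args&&... args) {
        // Assumption: a second construct() is a precondition failure.
        if (ptr_) throw std::logic_error("Contract violation: already constructed");
        ptr_ = ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
    }

    T& value() {
        assert(ptr_ && "value() used before construct()");
        return *ptr_;
    }
};

// Mirrors the reported loop: every iteration calls construct().
// Returns how many iterations completed before the violation.
inline int run_reported_loop() {
    deferred_init_sketch<std::unique_ptr<int>> p;
    int completed = 0;
    try {
        for (int i = 0; i < 3; ++i) {
            p.construct(std::make_unique<int>(i));
            ++completed;
        }
    } catch (const std::logic_error&) {
        // The second iteration's construct() ends up here.
    }
    return completed;
}

// What was intended: construct once, then assign on every iteration.
// Returns the last value stored.
inline int run_fixed_loop() {
    deferred_init_sketch<std::unique_ptr<int>> p;
    p.construct(nullptr);
    for (int i = 0; i < 3; ++i) {
        p.value() = std::make_unique<int>(i);
    }
    return *p.value();
}
```

In this sketch only the first iteration of run_reported_loop completes, which matches the single successful dereference in the output above.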

Johel Ernesto Guerrero Peña · Answer 1 · Thu Apr 04 2024 05:02:14 GMT+0800 (China Standard Time)

This should be rejected, just like https://cpp2.godbolt.org/z/8ssxKs4YT is:

main: () = {
  p: std::unique_ptr<int>;
  if true {
    p = new<int>(i);
  }
}

main.cpp2...
main.cpp2(2,3): error: local variable p must be initialized on both branches or neither branch
main.cpp2(3,3): error: "if" initializes p on:
  branch starting at line 3
but not on:
  implicit else branch
  ==> program violates initialization safety guarantee - see previous errors

Nick Treleaven · Answer 2 · Thu Apr 04 2024 23:50:07 GMT+0800 (China Standard Time)

This should be rejected

For consistency, yes. But in both cases it would be nice to only error if the variable is actually used after the branch. If it is only used in the branch where it is initialized, that could be allowed & would be useful e.g. when a pointer (declared in the same scope as the variable) is set to point at the variable after the variable is initialized. BTW the main readme mentions P1179, perhaps that will allow this and #440 as well.

Herb Sutter · Answer 3 · Sun Jun 23 2024 03:04:02 GMT+0800 (China Standard Time)

Yes, I agree this is a bug: lazy initialization shouldn't be allowed in a loop, because a loop can be entered zero times.
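
The zero-iteration problem can be seen in plain C++ if the uninitialized local is modeled with std::optional. This is a sketch only: initialized_by_loop is a hypothetical helper, and std::optional stands in for a local declared without an initializer.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: tries to give `value` its first value inside a
// while loop that runs `limit` times, and reports whether it got one.
inline bool initialized_by_loop(int limit) {
    assert(limit >= 0 && "iteration count must be non-negative");
    std::optional<int> value;  // models a local declared without an initializer
    int i = 0;
    while (i < limit) {
        value = 42;            // the only place the variable receives a value
        ++i;
    }
    return value.has_value();
}
```

With limit == 0 the body never runs and the variable is still empty after the loop, so any use after the loop would read an uninitialized object.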

Herb Sutter · Answer 4 · Sun Jun 23 2024 03:47:27 GMT+0800 (China Standard Time)

it would be nice to only error if the variable is actually used after the branch.

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

Herb Sutter · Answer 5 · Sun Jun 23 2024 05:10:12 GMT+0800 (China Standard Time)

OK, fixed.

The following errors are now diagnosed:

error1: () = {
    i: int;
    while true {
        i = 42;     // ERROR: can't initialize i in a loop
    }
    i = 42;
}

error2: () = {
    i: int;
    if true {
        while true {
            i = 42;     // ERROR: can't initialize i in a loop
        }
        i = 42;
    }
    else {
        i = 42;
    }
    i = 42;
}

And the following is allowed:

ok: () = {
    i: int;
    if true {
        i = 42;
        while true {    // OK: in-branch loop is after initialization
            i = 42;
        }
    }
    else {
        i = 42;
    }
    i = 42;
}

Also, uses like this one already in reflect.h2 continue to work:

    protected parse_statement: ( /*...*/ )
        -> (ret: std::unique_ptr<statement_node>)  // LAZILY INITIALIZED
    = {
        /*... code that doesn't mention ret ... */

        if  /*...*/ {
            while /*... code that doesn't mention ret ... */ {  // OK: this loop doesn't matter
                /*... code that doesn't mention ret ... */
            }
        }

        /*... code that doesn't mention ret ... */

        ret = parser.parse_one_declaration(  // OK: definite first use of ret is a construction
                tokens*.get_map().begin()*.second,
                generated_tokens*
              );
        /*...*/
    }

Jorge Canizales · Answer 6 · Thu Jun 27 2024 02:45:07 GMT+0800 (China Standard Time)

So this is invalid C++2?

error1: () = {
    i: int;
    while true {
        s := input_from_user();  // string
        if s.is_int() {
            i = s.to_int();     // ERROR: can't initialize i in a loop??
            break;
        }
        // ask again
    }
    // Is this point flagged as an error too?
}

Jorge Canizales · Answer 7 · Thu Jun 27 2024 02:48:58 GMT+0800 (China Standard Time)

Oh, is the reasoning here that the user can initialize i where it's declared to some default value, and that's easier than trying to distinguish between construction and assignment in a loop?

Herb Sutter · Answer 8 · Thu Jun 27 2024 03:14:24 GMT+0800 (China Standard Time)

So this is invalid C++2?
[... example where line 6 is:]
i = s.to_int(); // ERROR: can't initialize i in a loop??

Correct, and is now diagnosed with the above recent commit:

test.cpp2(6,13): error: local variable i cannot be initialized inside a loop
  ==> program violates initialization safety guarantee - see previous errors

// Is this point flagged as an error too?

No, because you get the earlier error. But if you had code that didn't have that error and then didn't use the variable, you would get an error from the Cpp1 compiler if "unused variable" warnings are on.

that's easier than trying to distinguish between construction and assignment in a loop?

The latter isn't just harder, I think it's not possible for a for or while loop (but possibly could work for do).

The primary reason is that a for or while loop could execute zero times, so it might never have an opportunity to initialize at all.

If somehow we guaranteed at-least-once loop semantics (such as do does, or by inventing a for_at_least_once and while_at_least_once or similar), we could allow initialization inside a loop at the cost of generating an additional first_iteration local variable and then generating { auto&& __rhs = s.to_int(); if (first_iteration) { i.construct(CPP2_FORWARD(__rhs)); } else { i = CPP2_FORWARD(__rhs); } }. That's possible but I haven't found a reason to implement that flexibility. Plus it costs 0-2 extra variables (modulo optimizations) and a branch, though that's less of a concern because you would pay for the overhead only if you use it.
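
The first_iteration idea can be sketched in plain C++. This is an illustration under assumptions, not generated cppfront output: loop_trace and init_in_do_loop are hypothetical names, std::optional stands in for the deferred storage (emplace plays the role of construct), and the sketch clears first_iteration itself after the first pass. The trace string records which path each iteration took.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: 'c' marks a construct, 'a' marks an assignment.
struct loop_trace {
    std::string path;
    int last;
};

// Initializes `i` inside a do loop, which always runs at least once.
// Precondition: `inputs` is non-empty, because the body reads inputs[0]
// before the loop condition is ever tested.
inline loop_trace init_in_do_loop(const std::vector<int>& inputs) {
    assert(!inputs.empty());
    std::optional<int> i;         // stand-in for the uninitialized local
    bool first_iteration = true;  // the extra local mentioned above
    std::string path;
    std::size_t n = 0;
    do {
        auto&& rhs = inputs[n];
        if (first_iteration) {
            i.emplace(rhs);       // first pass: construct
            first_iteration = false;
            path += 'c';
        } else {
            *i = rhs;             // later passes: assign
            path += 'a';
        }
        ++n;
    } while (n < inputs.size());
    return {path, *i};            // safe: the body ran at least once
}
```

The extra flag and branch in the loop body are the overhead described above; they are only paid by a loop that initializes a variable this way.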

Nick Treleaven · Answer 9 · Thu Jun 27 2024 21:35:01 GMT+0800 (China Standard Time)

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

Not if it is initialized and used in the branch. So why not declare the variable in the branch? The example shows why - though in real code the type of the variable might be more complex - some complicated template instantiation.

main: () = {
  i: int;
  if e1 {
      // other code
      if e2 {
          // initialize and use i
      }
      // no use of i
  }
  else if e2 {
    // initialize and use i
  }
  // no use of i
}

Herb Sutter · Answer 10 · Thu Jun 27 2024 23:36:51 GMT+0800 (China Standard Time)

Not if it is initialized and used in the branch. So why not declare the variable in the branch?

I agree with both parts, including "why not declare the variable in the branch?" That is, in a case like this why wouldn't it be better to write it as follows?

main: () = {
    if e1 {
        // other code
        if e2 {
            // initialize and use i
            i := 42; std::cout << i;
        }
        // no use of i
    }
    else if e2 {
        // initialize and use i
        i := 42; std::cout << i;
    }
    // no use of i
}

Now it's correct by construction; you can't make the mistake of using i later because its name isn't even in scope.

The purpose of declaring a local in a larger scope is so that it can be used later in the scope (in both Cpp2 and Cpp1), and allowing it to have no initializer allows initializing it in a nested branch scope first including to use different constructors etc. (in Cpp2). If it's only going to be used in the branch scope, shouldn't it be declared there?

Jorge Canizales · Answer 11 · Sat Jun 29 2024 02:22:11 GMT+0800 (China Standard Time)

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.
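
A plain C++ sketch of that suggestion, with hypothetical names throughout (index_table and count_entries are made up for illustration): the long type is spelled once in an alias, and each branch declares its own variable at the point where it is initialized and used.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical "complicated" type, named once.
using index_table = std::map<std::string, std::vector<int>>;

inline std::size_t count_entries(bool e1, bool e2) {
    if (e1) {
        // other code
        if (e2) {
            index_table t{{"a", {1, 2}}};            // declared in the innermost scope
            return t.size();
        }
        // t is not in scope here, so it cannot be misused
    } else if (e2) {
        index_table t{{"a", {1}}, {"b", {2}}};       // a different initializer is fine
        return t.size();
    }
    return 0;
}
```

Because the name only exists inside the branches that initialize it, there is no uninitialized declaration in the outer scope to diagnose.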

Herb Sutter · Answer 12 · Sat Jun 29 2024 02:34:15 GMT+0800 (China Standard Time)

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.

I agree with @jcanizales, but I now see I had missed @ntrel's point originally... thanks Jorge for fixing my reply with a better, shorter one, and my apologies Nick that I missed your clearly stated point the first time!