[BUG] Variable declarations can refer to themselves

Question

[BUG] Variable declarations can refer to themselves

DyXel opened this issue 3 months ago · comments

Dylam De La Torre commented 3 months ago

Describe the bug
A problem as old as C itself: You are able declare a value and "initialize it" with itself, which is unsafe (and confusing).

To Reproduce

main: () -> int = { x: int = x; return x; }

Godbolt reproducer: https://cpp2.godbolt.org/z/hxros3zM9

Additional context
We are fortunate that most compilers diagnose this, for example, Clang says:

warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]

However, IMO a warning is not enough, this should be an error because:

If you wanted to do something with properties of x before its initialized, in C++ you have the tools to do so (like type deduction).
If you wanted to copy or otherwise refer a variable name x that is elsewhere, this would shadow it, which is confusing and brittle since it would silently break if you rename the shadowed variable.

gregmarr · Answer 1 · Wed Mar 20 2024 22:22:21 GMT+0800 (China Standard Time)

A presentation I saw today actually used a variant of this that was technically legal. I'm not saying that this is good code, but it is something that we might need to handle. I'm writing this from memory, but I think this is accurate.

T func(T *x) { <initialize *x>; return *x; }
T x = func(&x);

Maybe it could be written like this:

func : (out x : int) -> int = {
    x = 0;
    return x;
}

main :() = {
    x : int = func(x&);
}

Herb Sutter · Answer 2 · Thu Mar 21 2024 05:45:50 GMT+0800 (China Standard Time)

Thanks! Done. But I hesitated, let me tell you why... 🗨️

In general, because I have limited time and I want to focus on adding new value beyond Cpp1, I try to avoid doing extra work to add checks for things that the Cpp1 compilers already generally diagnose consistently and well (as a warning or error), and this is one that it looks like all the major compilers do flag with a nice readable warning.

So that's why I almost didn't do this... but because this is about initialization safety and I thought 'well, maybe it's not too hard,' I took a look and it worked out pretty well. Note that I had to add a little logic to not flag code like f: int = t().f(); in existing test cases.

But this gave me a chance to explain why I'll say No to some requests like this, and a case to point back to in the future where I did it anyway as proof that I don't always just say No. 😁

Herb Sutter · Answer 3 · Thu Mar 21 2024 05:47:56 GMT+0800 (China Standard Time)

Oh, and the original code at top now gets this:

demo.cpp2(1,30): error: local variable x cannot be used in its own initializer

Dylam De La Torre · Answer 4 · Thu Mar 21 2024 05:52:20 GMT+0800 (China Standard Time)

Hey, thanks for handling this, and I totally get you (I promise I won't get mad if you say No in the future 😛). The real reason I opened the bug is because I was reminded of Bjarne's talk on safety where he said: "diagnostics are no longer enough, we need enforcement of some kind", and unfortunately, there's a lot of places where warnings are not enforced, so it seems to me like a good place for cppfront to step in and prevent you from doing that mistake ever again.

Herb Sutter · Answer 5 · Thu Mar 21 2024 07:45:27 GMT+0800 (China Standard Time)

No worries. Note that this is not necessarily without minor downsides... one use case this prevents (and we can carve out to narrowly allow if there's demand) is taking the address of the variable to register it with something that's also initializing it. I doubt that there are important existing APIs that do that, but we'll see if people encounter any...

gregmarr · Answer 6 · Thu Mar 21 2024 22:55:44 GMT+0800 (China Standard Time)

taking the address of the variable to register it with something that's also initializing it.

I think you can cover that with "you can pass it as a function's out parameter".
Hm, maybe not. The "address of" may not work there, as you'd need to unpack the cpp2::out struct and get the address from there.

Herb Sutter · Answer 7 · Fri Mar 22 2024 02:49:41 GMT+0800 (China Standard Time)

I thought about that, because of your example of this style... changing int to X, and adding forward to avoid a copy/move:

func : (out xx : X) -> forward X = {
    xx = 0;           // construct x
    return xx;
}

main :() = {
    x : X = func(x&); // construct x... again, and from itself
}

The principle is that a function with an out parameter is a (delegating) constructor. So at first that seemed to make sense. We're constructing, right?

But then I realized that this would be double construction -- x is being constructed in func, and then again in main (in fact, it's constructed from itself). So I couldn't think of a case where it made sense, and just flagged all uses of x in its own initializer. But we can always relax that if we find a use case.

Using a function with an out parameter as a constructor already works by taking advantage of Cpp2's lazy initialization feature, which naturally is outside the declaration:

func : (out xx : X) = {
    xx = 0;           // construct x
}

main :() = {
    x : X;
    func(x&);         // constructs x
}

gregmarr · Answer 8 · Fri Mar 22 2024 03:49:19 GMT+0800 (China Standard Time)

I've contacted you offline about the source of this sample.