hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG] Variable declarations can refer to themselves

DyXel opened this issue · comments

Describe the bug
A problem as old as C itself: You are able declare a value and "initialize it" with itself, which is unsafe (and confusing).

To Reproduce

main: () -> int = { x: int = x; return x; }

Godbolt reproducer: https://cpp2.godbolt.org/z/hxros3zM9

Additional context
We are fortunate that most compilers diagnose this, for example, Clang says:

warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]

However, IMO a warning is not enough, this should be an error because:

  • If you wanted to do something with properties of x before its initialized, in C++ you have the tools to do so (like type deduction).
  • If you wanted to copy or otherwise refer a variable name x that is elsewhere, this would shadow it, which is confusing and brittle since it would silently break if you rename the shadowed variable.

A presentation I saw today actually used a variant of this that was technically legal. I'm not saying that this is good code, but it is something that we might need to handle. I'm writing this from memory, but I think this is accurate.

T func(T *x) { <initialize *x>; return *x; }
T x = func(&x);

Maybe it could be written like this:

func : (out x : int) -> int = {
    x = 0;
    return x;
}

main :() = {
    x : int = func(x&);
}

Thanks! Done. But I hesitated, let me tell you why... 🗨️

In general, because I have limited time and I want to focus on adding new value beyond Cpp1, I try to avoid doing extra work to add checks for things that the Cpp1 compilers already generally diagnose consistently and well (as a warning or error), and this is one that it looks like all the major compilers do flag with a nice readable warning.

So that's why I almost didn't do this... but because this is about initialization safety and I thought 'well, maybe it's not too hard,' I took a look and it worked out pretty well. Note that I had to add a little logic to not flag code like f: int = t().f(); in existing test cases.

But this gave me a chance to explain why I'll say No to some requests like this, and a case to point back to in the future where I did it anyway as proof that I don't always just say No. 😁

Oh, and the original code at top now gets this:

demo.cpp2(1,30): error: local variable x cannot be used in its own initializer

Hey, thanks for handling this, and I totally get you (I promise I won't get mad if you say No in the future 😛). The real reason I opened the bug is because I was reminded of Bjarne's talk on safety where he said: "diagnostics are no longer enough, we need enforcement of some kind", and unfortunately, there's a lot of places where warnings are not enforced, so it seems to me like a good place for cppfront to step in and prevent you from doing that mistake ever again.

No worries. Note that this is not necessarily without minor downsides... one use case this prevents (and we can carve out to narrowly allow if there's demand) is taking the address of the variable to register it with something that's also initializing it. I doubt that there are important existing APIs that do that, but we'll see if people encounter any...

taking the address of the variable to register it with something that's also initializing it.

I think you can cover that with "you can pass it as a function's out parameter".
Hm, maybe not. The "address of" may not work there, as you'd need to unpack the cpp2::out struct and get the address from there.

I thought about that, because of your example of this style... changing int to X, and adding forward to avoid a copy/move:

func : (out xx : X) -> forward X = {
    xx = 0;           // construct x
    return xx;
}

main :() = {
    x : X = func(x&); // construct x... again, and from itself
}

The principle is that a function with an out parameter is a (delegating) constructor. So at first that seemed to make sense. We're constructing, right?

But then I realized that this would be double construction -- x is being constructed in func, and then again in main (in fact, it's constructed from itself). So I couldn't think of a case where it made sense, and just flagged all uses of x in its own initializer. But we can always relax that if we find a use case.

Using a function with an out parameter as a constructor already works by taking advantage of Cpp2's lazy initialization feature, which naturally is outside the declaration:

func : (out xx : X) = {
    xx = 0;           // construct x
}

main :() = {
    x : X;
    func(x&);         // constructs x
}

I've contacted you offline about the source of this sample.