dotnet / csharplang

The official repo for the design of the C# programming language

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Open issues: Breaking changes

MadsTorgersen opened this issue · comments

Breaking changes mitigation - open issues

In order to embrace a breaking-changes approach in C#, we need to a) decide on general principles and procedures, and b) decide how to apply these concretely to the "Field access in auto-properties" feature for shipping in C# 13.

In the following are a number of concrete proposals, along with discussion, examples and alternatives. The intent is to drive to complete design decisions on both of those questions.

Running examples

We'll examine the impact of the decision points in the context of the new "field access" feature, as well as some existing features that would have been - and may still be - candidates for breaking changes: var and discards.

Field access

With the field access in auto-properties feature, property accessor bodies will now have access to an implicitly declared parameter called field representing the generated backing field of an auto-property. This is a breaking change, because current property accessors may well use field to refer to something declared outside of the property:

public class Professor
{
    private string field;
    public string Field { get => field; set => field = value.Trim(); }
    ...
    public override string ToString() => $"Professor of {field}";
}

After the language change, the field inside of the get and set accessors would now refer to a generated backing field, whereas the field in the ToString implementation refers to the declared field field.

var

When var was introduced to C# it was potentially breaking, because there might be declared types called var:

class var { ... }

var theVar = getVar();

To avoid this breaking change, however far-fetched, we made a stopgap rule that var would only mean "infer my type" if no type called var was in scope.

(People have since abused this rule to prevent others on their team from using the var feature, by declaring a useless type of that name in the project!)

This is an example of "spooky action at a distance": A declaration far away changes the meaning of code from otherwise locally determined semantics.

Discards

Discards are simple names _ (underscore) and local variable declarations int _ and var _ where the variable introduced is discarded, and cannot be referenced from code again.

Declaration and use of _ as an ordinary local variable was already allowed, so in order to avoid a breaking change we limited discards in two ways:

  1. Only local variable declaration forms that were syntactically new to that version of the language would be discards, and
  2. Simple names would only be discards if there was no non-discard meaning of the name in scope.

These limitations mean that "old fashioned" local variable declaration syntax is poison to the use of discards, and make it confusing to understand when a given _ is a discard and when it isn't:

// Typed discards

//var _ = myInt;
//string _ = myString;
(int _, var _) = (myInt, myString);
int.TryParse(myString, out var _);

// Untyped discards

_ = myInt;
_ = myString;
(_, _) = (myInt, myString);
int.TryParse(myString, out _);

Here all the _s that are not commented out are discards. However, if you uncomment the old-fashioned local variable declarations at the top, all kinds of things break:

// Typed discards

var _ = myInt;       // Not a discard, declared local variable `_`
string _ = myString; // ERROR: Second declaration of `_`
(int _, var _) = (myInt, myString);
int.TryParse(myString, out var _);

// Untyped discards

_ = myInt;                     // Not a discard, assigns to `_`
_ = myString;                  // ERROR: Not the right type
(_, _) = (myInt, myString);    // ERROR: Not the right type
int.TryParse(myString, out _); // Not a discard, assigns to `_`

Fixing this would be a breaking change. There's an interesting range of possible fixes, from the most to the least breaking. If we were starting from scratch we would probably make _ a keyword and not an identifier. That would be the cleanest design, but at this point would also lead to the most breaks in existing code, given how many different things can have names!

At the other end of the spectrum, the user experience would probably still improve significantly if:

  1. All local declarations of _ were discards, and
  2. All uses of _ as a simple name were discards.

This should be enough to alleviate the current confusion and would be a much smaller break. As a bonus, today @_ is only allowed when _ is a variable, not when it's a discard. So existing code could continue to use _ as a local variable name by prefixing it with @.

Deciding on a breaking design

Breaking changes are disruptive! It's useful to have a set of criteria for the C# team to adhere to when considering a significant breaking change, to ensure that it's worthwhile to do and comes with sufficient mitigation. These are the proposed criteria (first suggested here):

  1. Breaking changes should be adopted judiciously and sparingly, and with clear end-user benefits.
  2. Concrete breaks in user code should be expected to occur rarely. Co-opting a specific identifier in a specific syntactic context would be reasonable.
  3. Each concrete break should be easily explainable in diagnostic messages, with a clear location in code.
  4. Each break should have a default fix that is simple, robust, local, fully automatable, and preserves the meaning of the code into the new version.

Verify: Do we agree that this is a reasonable set of criteria?

Below we explore the use of these criteria on each of the example breaking features.

Deciding on field

Here is how we would justify the field breaking change by the criteria:

  1. The field access feature is highly requested, and would bring significant benefit. Non-breaking designs have all proven significantly inferior, using more cumbersome syntax or "spooky action at a distance" mitigations.
  2. Breaks would only occur when the identifier field is used inside property accessor bodies. It will certainly occur ("field" is a common noun with several uses), but not all over the place, and in most code-bases not at all.
  3. It is easy to pinpoint and explain that a given use of "field" will change its meaning in a future release.
  4. The default fix would be to change the simple name "field" to a member access:
    • If it's to a static member of an enclosing type A, change it to A.field.
    • If it's to an instance member of the enclosing type, change it to this.field.
    • If it's to a primary constructor parameter of the enclosing type, first generate an instance field field that is initialized from the parameter, then change the simple name to this.field.

Verify: Do we agree that this justifies the breaking design for field access? Do we agree that the proposed default fix is suitable?

Deciding on var

The var breaking change would be justified as follows:

  1. There are no practical uses of var as a type name today. The accommodation for it only leads to user confusion, as well as to implementation complexity and degraded experiences in compiler and tools.
  2. At this point it would only break deliberate uses of var as a type name to thwart usage. Nowadays that is better achieved with code styles.
  3. It is easy to pinpoint a given use of var that would no longer refer to a type in a future release.
  4. The default fix would be to replace var with @var.

Deciding on _

The _ breaking changes would be justified as follows:

  1. Discards are a very useful feature, but the confusion around when _ is a discard is severely undermining its use.
  2. The most widespread uses of _ as a local variable today are remnants of pre-discard code, to signify that a local variable should be considered a discard! Those cases would actively benefit from becoming "real" discards, and the remaining cases are rare.
  3. It is easy to highlight a _ and point out that it will be a discard in the future. We could avoid most false positives by ignoring variables that aren't subsequently read from.
  4. The default fix is to change _ to @_.

Mitigating a breaking change

Let's say a breaking change is introduced in C# N. The impact of the break on existing code is mitigated as follows: The compiler for C# N, when compiling for language version N-1 or lower, will produce a warning on code that will eventually break (change its meaning) in C# N. The warning will be the diagnostic message identified in criterion 3 above, and will suggest the fix identified in criterion 4.

IDEs, upgrading tools etc should offer the default fix to the impacted code locations, and would most likely offer to "fix all occurrences" across project and solution, so the user can fortify their code against a language version upgrade with a single gesture. In addition, there might be other reasonable fixes to offer (e.g. rename a declaration of something called field, var or _).

Open question: It's possible for someone to still end up going to the new language version without having fortified their code against the breaking change. If they've already started using new language features, it's not necessarily feasible for them to go back down a language version to get the warnings. Should the compiler also provide off-by-default warnings for the post-upgrade situation, allowing folks to locate code that may have already changed behavior? Presumably this would be turned on to "audit" code, and would eventually be turned off again once the developer is satisfied that all occurrences are correct according to the new semantics.

Mitigating a breaking change warning

The new warnings on existing code are themselves breaking behavior. They are tied to installing a new compiler, and hence usually to installing a new version of the .NET SDK. It turns out that there are other ways in which the SDK would like to introduce new diagnostics for existing projects. We're therefore doing work to harmonize our approaches to this.

New warnings may be disruptive to the user. First, we would reduce the number of times this could happen to once a year, allowing such new diagnostics only on major-version upgrades of the SDK.

Second, the compiler would respect a flag to turn off not just a specific breaking change warning (which you can always do) but also all breaking change warnings. This would also be wired in to an SDK-wide flag to silence all such warnings from across the stack.

For some scenarios it would be meaningful to turn all breaking change warnings off permanently: Legacy projects that would never use a new language version could safely ignore them. Build servers could safely assume that the warnings are caught and dealt with elsewhere.

In other scenarios the warning may just come at an inconvenient time, and you want to postpone dealing with it. Perhaps IDEs and other tools can offer a "silence for 30 days" option or something like that.

Special language versions

In most cases the language version is explicitly specified to the compiler, usually by being implied by the target framework (TFM). Thus, when you get a new compiler, it will initially be configured to compile to an older language version than the newest one it supports. This is where the user normally has a chance of acting on the breaking change warnings!

Latest

However, there is also currently a latest language version, which silently picks the newest language version the compiler is capable of. This is unfortunate, because there is never a time when the language version is older than the compiler, and consequently the user won't get any breaking-change warnings!

The proposal is to "retire" latest, in the sense that the use of it will start yielding a warning. If the warning is ignored or suppressed, then people are knowingly signing up to deal with any breaking changes by other means.

Alternative: We could "park" the meaning of latest at C# 12, either with or without a warning to say that latest no longer works as you'd expect. This way, users would still get breaking change warnings, and would quickly be discouraged from using latest.

Preview

The preview language version is like latest, but additionally enables language features that are not yet shipped, for the purposes of exploring and providing feedback. These features may end up taking a different shape, or not ship at all, so any user of preview is already putting themself at risk of breaking changes down the line. In fact, if they used preview with the previous version as well, they likely already used the breaking feature with its new semantics before it was even released!

It therefore seems reasonable that users of preview get no warnings by default.

Conclusion

With a good set of decisions here, we will be set up to handle not just the "field access" breaking change, but also future breaking designs. That said, this is a new avenue for C#, and there is a good chance that there's something we overlooked or misjudged. By shipping a single breaking feature in C# 13, we have an opportunity to learn and adjust, and to improve the process and experience for next time.

Is there a way to generate telemetry from the IDE/compiler when these warnings are detected? Maybe even with some analysis to determine if the case in question fits an expected (and possibly auto-fixable) pattern? I could see a mechanism like that being used to gauge impact but also to test the potential tooling ahead of time. I could imagine that for more complex cases the team might want to wait more than one cycle to actually pull the trigger, or maybe even back off due to unforeseen circumstances.

I am super happy that the team is considering breaking discards. I've always felt that it was one of those awkward warts and that the attempt to retain backwards compatibility ultimately caused more friction than it avoided.

What about lambdas? e.g.:

var lambda = (int _) => Console.WriteLine(42);

@SunnieShine

What about lambdas? e.g.:

IMO, I think the parameter declaration should be legal, but attempting to reference it as a variable in the lambda body should not be:

// fine
var lambda1 = (int _) => Console.WriteLine(42);

// error
var lambda2 = (int _) => Console.WriteLine(_);

@SunnieShine:

What about lambdas? e.g.:

Good catch on the lambdas!

We currently allow:

Func<int, string, string> lambda1 = (_, _) => "Hello";
var lambda2 = (int _, string _) => "Hello";

I.e. when there is more than one _ in the parameter list we do treat them as discards today, which means _ in the body would also be a discard.

The difference would be that a single _ in a lambda parameter list today is still a named variable declaration for back compat reasons. If/when we consider an actual proposal for this, we should add those as places where we would break.

At yesterday's LDM I was reminded that there is one more kind of potential break with field: A local variable declaration of field within a property accessor body:

class C
{
    public int P
    {
        get { var field = "Field".Length; return field; }
    }
}

This would become an error in C# 13 because you are not allowed to redeclare an existing local or parameter. Unlike the other breaks discussed above, I believe this would be a problem even for some of the "non-breaking" alternative designs we've discussed in the past. It's simply a kind of break that we hadn't thought of.

I see a couple of solutions:

  1. Make a special exception from the no-redeclaration rule for field in a property accessor body.
  2. Provide a default fix for this break. The only thing I can think of is a rename of the local, but for this to be fully automatable we'd have to be able to pick a name for you. I guess field1 or field+N for the first N that leads to a variable that isn't declared outside or inside the scope would do. But it's a little less elegant than the default fixes for the other cases.

Thoughts?

There is also a parameter case, a lambda parameter or a local function parameter.

@MadsTorgersen

Can C# be changed to consider value and field as keywords instead of parameters, thus allowing locals to be declared using @ to differentiate them from those keywords? Currently C# does not allow you to declare a local of name value within the setter either, with or without @.

public int Foo {
    get;
    set {
        int @value = 123;
        int @field = 456;
        field = @value + @field;
    }
}

There would be subtleties and complexities with lambdas/local functions. But it's worth exploring.

It feels like it may be a smooth transition from "special variable name" to "contextual keyword", most IDEs already colorize value similar to keywords and it feels like we would want field to be colored in a similar way when it is the "automatic" field of a property.

@AlekseyTs:

There is also a parameter case, a lambda parameter or a local function parameter.

I dont think they are a problem, unless I misunderstand your point. Parameters are allowed to shadow existing locals:

class C
{
    public int P
    {
        set
        {
            Action<int> a = value => { }; // No error
            void F(int value) { };        // No error
            var value = 0;                // CS0136
        }
    }
}

I dont think they are a problem, unless I misunderstand your point. Parameters are allowed to shadow existing locals

I am not saying that there is a problem, I simply do not know details of the current design to judge about that. I am saying that a field identifier can refer to a parameter in today's code. If it is going to keep referring to the parameter, then, perhaps, there is no problem. However, I think it would be good to mention scenario like that explicitly and explain the expectations.

@HaloFour:

Can C# be changed to consider value and field as keywords instead of parameters, thus allowing locals to be declared using @ to differentiate them from those keywords?

That's definitely worth considering. A lot of people are surprised when they discover that value is not a contextual keyword, so maybe we should just make it so?

This would allow code like this:

class C
{
    public int P
    {
        get
        {
            var @field = ""; // Allowed - field is a contextual keyword
            return @field.Length + field; // both @field and field available
        }
    }
}

People who have a local called field today would still need a default fix, but instead of renaming to something arbitrary it could substitute @field for declaration and every use.

In fact, for all the breaking cases outlined above, this would allow the default fix to similarly replace every breaking occurrence of field with @field instead of a member access, which is certainly more elegant.

We'd need to carefully evaluate whether making field a contextual keyword - and perhaps value for consistency - would lead to other breaking changes. I can't think of any, but as already demonstrated that is very far from a guarantee! 😄

We'd need to carefully evaluate whether making field a contextual keyword

Note: from a parsing perspective, we could make it a literal keyword, parsing accessor bodies in a special state (just like we do for async/await parsing). We would just need to be careful in incremental scenarios to ensrue that such state was encoded into the node.

Unfortunately, we literally only have 8 bits that we can use to store this data for, and we're already full-up on bits :'(

Riffing further on this, the context within which value and field would be keywords would be very similar to await (https://github.com/dotnet/csharpstandard/blob/standard-v7/standard/expressions.md#12981-general) in that it is a keyword within a whole (statement/expression) body occurring in a certain syntactic context (for async the context is a function declaration with the async keyword, for field and value it would be property accessors), and excepting nested lambdas/local functions.

@CyrusNajmabadi:

Unfortunately, we literally only have 8 bits that we can use to store this data for, and we're already full-up on bits :'(

Well shucks, there goes that feature design then! ;-)

Looking at this more, I don't think we can achieve all of these three goals at the same time:

  1. field is a contextual keyword in such a way that it can be leveraged for all "default fixes",
  2. value is evolved to a contextual keyword that works exactly the same way as field, and
  3. Existing behavior for value doesn't break.

Here's the counterexample:

class C
{
    private int value;
    private int field;
    public int P
    {
        set
        {
            WriteLine(@value); // References the implicit parameter, not the field
            WriteLine(@field); // We would want this to reference the field
        }
    }
}

I think we could choose to budge on either of the three goals above:

  1. Drop having field be a contextual keyword and stick with the slightly more elaborate default fixes for the breaking changes
  2. Accept subtle differences in behavior between field and value, specifically around what @field and @value means.
  3. Accept a breaking change in the meaning of @value.
  1. Accept a breaking change in the meaning of @value.

That'd be my preference, I can't imagine it's remotely common to find developers using @value to refer to the value parameter in property setter accessors.

That'd be my preference, I can't imagine it's remotely common to find developers using @value to refer to the value parameter in property setter accessors.

I see @ used by accident sometimes. I think maybe IDE plugins try to 'help,' or maybe what happened was that the @ was initially needed for a prior name, then the user did a rename so that it wasn't needed but didn't pay attention and remove the @. I have a feeling it'll happen at a low but noticeable rate.

What about @args in top-level statements?

  1. Accept a breaking change in the meaning of @value.

That'd be my preference, I can't imagine it's remotely common to find developers using @value to refer to the value parameter in property setter accessors.

Yes and also it looks like the cleanest possible state going forward; if we were to break, it should be for a good reason.

Clearly "good" is in the eye of the beholder. Improving the user experience through improved syntax consistency and embracing modern language idioms is clearly a good reason to break in the eyes of many.

What languages like rust - with its clever breaking changes support mechanism - show us is that blocking breaking changes is an impediment to progress. What rust does - with the benefit of massive hindsight - is make breaking changes manageable. And that's what C# needs too: a robust mechanism for managing breaking changes to make those breaking changes desirable rather than to be feared.

Rust is indeed a great example with "editions". If language semantics change in a new edition, a simple cargo fix --edition will patch-up the code to work on the new one. C# has a good "edition" method with language versions; It's just never used for breaking changes. The idea of deprecating/locking LangVersion=Latest at C# 12 would definitely be a good first step (as proposed). In other words, force the user to acknowledge changes by editing their csproj (or props) file; They already need to edit it when changing the TFM to the next .NET version.

Essentially, if LangVersion=12 (or Latest or older), field is not a keyword, just a normal reference. But if LangVersion=13 (or newer), field becomes a proper keyword with all the rules associated with it. Sure, it might annoy people who have a legitimate need to name a variable field (would now need to be @field), but that's no different from how things are right now with other keywords.

My $0.02: If breaking changes through language versions were to be implemented, I'd request that the team provide something similar to cargo fix --edition that would handle the "fix-ups" (dotnet fix?). For example, it would replace usages of field with @field. Ambiguities, such as if someone were already using new-keyword-titled variables (such as @value), would just be skipped and have a warning written to the console.

This is something we need to take our time with. Let me share my opinions:

Make field and value contextual keywords.

@_, @field, and @value should be useable as identifiers.

These are pretty rare cases where these would break code, and most are a single @, rename, or text replace away.

Cases where breaking changes here might include:

  • In certain industries, say, a football game manager might take in a parameter called field somewhere.
  • Someone not naming things well. For example, there is a better name than value for an out parameter 99% of the time.

I won't say none of us name our variables not-so-great things, but really, I almost see this as a way to enforce better naming.

Edit: Added code blocks

Based on the discussion above around making field and value contextual keywords, I wrote up this proposal:

#7964

Thanks everyone for all the insights here! ❤️

Regarding field access, it seems that there must be a non-breaking way to address this, perhaps:

public class Professor
{
    private string field;
    public string Field { get => Field.field; set => Field.field = value.Trim(); }
    ...
    public override string ToString() => $"Professor of {field}";
}

The name Field is certainly not ambiguous and the notation for member access is idiomatic. One could restrict access to <PropertyName>.<FieldName> too so that it can only be used inside said property. If the DOT is frowned upon then perhaps some other variant, but if this was done, the entire discussion about how to handle this breaking change, evaporates.

Or one could even use this:

public class Professor
{
    private string field;
    public string Field { get => Field:Field; set => Field:Field = value.Trim(); }
    ...
    public override string ToString() => $"Professor of {field}";
}

Here the implicit backing field is now always the same name as the property itself, not the special name field and so there's no scope for a breaking change. The colon has limited use today in C# so leveraging it in new ways seems reasonable.

Regarding field access, it seems that there must be a non-breaking way to address this

There are plenty of non-breaking ways to address this, but they all have warts that create inconsistencies with the already existing identifier value that are undesirable. The point of this exercise is to determine how much of a break it would be to design the feature as if it were to be a part of C# 1.0 and whether that is acceptable.

Breaking Change Plan for C# 13:

Key Points:

Criteria: Breaking changes should be used sparingly with clear benefits, and fixes should be simple and preserve code meaning.
Field Access in Auto-properties:
Justification: Highly requested feature with significant benefits.
Impact: Breaks occur in property accessor bodies using "field".
Mitigation: Default fix involves changing "field" to a member access based on context.

Var:
Justification: Accommodation for var as a type name leads to confusion.
Impact: Breaks in deliberate uses of var as a type name.
Mitigation: Default fix replaces var with @var.

Discards:
Justification: Confusion around when _ is a discard undermines its usefulness.
Impact: Breaks in cases where _ is used as a local variable name.
Mitigation: Default fix involves changing _ to @_.

Mitigation Strategies:
Warnings: Introduce warnings on existing code highlighting potential breaks.
Default Fixes: Provide default fixes for identified breaking changes, with tools for automated application.

Special Language Versions:
Retirement of "latest": Discontinue "latest" to prevent users from bypassing breaking change warnings.
Treatment of "preview": Users of preview are already exposed to potential breaking changes, so no additional warnings.
This plan ensures careful consideration of breaking changes while providing clear guidelines for mitigation and managing language versioning.