Dealing with limited breaking changes in C#

Oftentimes new C# language features have edge cases that could conflict with existing features. In the past we’ve avoided breaking existing code even at the expense of making the feature design more complex than otherwise needed. Sometimes the burden is mostly one of implementation complexity borne by the team, but other times aspects of the feature end up being genuinely confusing to developers as a result.

One example is discards. Because _ used to be just an ordinary identifier, the language tries to protect existing usages as such. As a result, _ sometimes represents a discard and sometimes an ordinary identifier, depending on whether or where it is declared. The rules for this have little inherent logic, as they are based purely on "what was there first". This is needlessly confusing to anyone just trying to learn and use discards, and limits the places where they can be usefully employed in code.
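A few lines of current C# show the inconsistency. This is a sketch, and the `DiscardExamples` class and its member names are illustrative only. Whether `_` means "discard" depends on how many lambda parameters share the name, or on whether a variable named `_` happens to be in scope.

```csharp
using System;

public static class DiscardExamples
{
    // A single `_` lambda parameter is an ordinary identifier, so the body can read it.
    public static readonly Func<int, int> Increment = _ => _ + 1;

    // Two or more `_` parameters are discards (C# 9 and later) and cannot be read.
    public static readonly Func<int, int, int> Zero = (_, _) => 0;

    public static int LocalNamedUnderscore()
    {
        int _ = 0;   // declares an ordinary local variable named _
        _ = 42;      // so this assigns to that local instead of discarding
        return _;    // and reading it back yields 42
    }
}
```

So `Increment(1)` returns 2 and `LocalNamedUnderscore()` returns 42, even though a reader might reasonably expect `_` to mean "discard" in both places.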

What if we changed our stance a little here? C#'s commitment to backwards compatibility is very valuable to its users, but so is simplicity and clarity. Could we adjust our approach so that the existing language can "move over a bit" and give new features room for more consistent and elegant designs?

For this to be a good trade-off, it would require a high-confidence, low-friction mechanism for making existing code resilient to such new features. And it would require a lot of thoughtfulness and restraint on the language design team to only introduce breaking changes with limited scope, strong justification and excellent fixes.

In the following we'll try to outline such a mechanism. We're curious about everyone’s thoughts, both on the specifics of the mechanism, and on whether it represents a balanced approach to considering limited, thoughtful breaking changes in C#.

Example: field access

Field access in auto-properties is a feature we would very much like to add to C#. The feature allows a variable name field to be used in property accessors to refer to the generated backing field:

public string Name { get => field; set => field = value.Trim(); }

However, the above code could already exist, and could for instance be referring to an existing variable called field in an enclosing scope. If that code now changes meaning in, e.g., C# 12, this is a breaking change: it either leads to an error or changes the behavior of the existing code.

In order to avoid such a break, the current proposal for field introduces the new field keyword conditionally: only if there isn't already something named field in scope at the point of the accessor. This "works" in terms of not breaking existing code, but leads to more confusing behavior for the new feature itself. Introducing a field named field somewhere outside the property can nullify the language feature and inadvertently change the meaning of occurrences of the field identifier that were intended to refer to the backing field. This behavior is arbitrarily inconsistent with how value works today in property accessors, simply because value was in the language from the start. Finally, the implementation and maintenance costs in the compiler and tooling are likely to be higher, because the feature is more complex and the binding behavior is less localized.

It would be better if we could just make the lookup of the new field identifier "normal", and consistent with every other identifier in every other place in the language. This is the kind of breaking change that would make the new feature design meaningfully simpler, and leave the language with fewer “gotchas” in the future. Instead of shying away from these kinds of breaking changes entirely, could we give you a mechanism that reliably protects your existing code in the relatively rare cases where the new feature would break it? Could we help you easily find and fix any lines of code that would be vulnerable to the break, before you even adopt the new language version?

Outline of a solution

As a matter of terminology, a breaking change is a change in a newer version of the language that causes some previously working user code to break. The language concerns itself with two kinds of breaks:

New error: Code that used to work now causes an error.
Silent break: Code that used to work one way now works differently.

The first is actually the more benign. At least you'll know that something is up, which gives you a starting point for chasing down the problem. Still, it is not ideal: the error may not make clear that the cause is a breaking change in the language, and it may not even occur at the relevant point in the code.

The second is more insidious. Because the behavior change may be subtle, it may not be caught in testing, and it could lead to bugs further down the line, where they are more costly to fix. At the same time, the field example makes clear that we can't just limit breaks to new errors.
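To see why field would be a silent break rather than a new error, here is a minimal sketch that simulates both bindings side by side in today's C#. It assumes the generated backing field behaves like an ordinary private field; `EntityAsCSharp11`, `EntityAsProposed`, `simulatedBackingField` and `Member` are hypothetical names. The accessors are written with explicit `this.field` or the stand-in field so the sketch compiles the same on any compiler, and the simulated property is declared `string?` only to keep nullable analysis honest.

```csharp
#nullable enable

// What C# 11 does today: `field` in the accessors binds to the member.
public class EntityAsCSharp11
{
    private string field;

    public string Field
    {
        get => this.field;                 // same binding C# 11 gives a plain `field`
        set => this.field = value.Trim();
    }

    public EntityAsCSharp11(string field)
    {
        this.field = field.Trim();
    }
}

// What the proposed design would do with the same accessors: `field` binds to
// a generated backing field, simulated here by `simulatedBackingField`.
public class EntityAsProposed
{
    private string field;
    private string? simulatedBackingField; // hypothetical stand-in for the generated field

    public string? Field
    {
        get => simulatedBackingField;
        set => simulatedBackingField = value?.Trim();
    }

    // The member still holds what the constructor wrote.
    public string Member => this.field;

    public EntityAsProposed(string field)
    {
        this.field = field.Trim(); // writes the member, never the backing field
    }
}
```

Constructing both with `"  Ada  "` yields `"Ada"` from the first getter but `null` from the second, and nothing in the compiler output hints that anything changed.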

A mechanism to fully mitigate these problems should do the following:

Identify existing code that would break when upgrading to a given new language version.
Give diagnostic messages for such code that are expressed directly in terms of the language breaking change.
Offer simple, reliable, fully automated code fixes to render existing code resilient to breaks.
Show up automatically and early, so that you don't accidentally miss out on breaking change mitigation.

Language design considerations

No matter how automatic and reliable the mechanism to mitigate it, any breaking change is going to have non-zero impact on at least some developers. There should be a high bar for language design decisions that lead to breaking changes:

Breaking changes should be adopted judiciously and sparingly, and with clear end-user benefits.
Concrete breaks in user code should be expected to occur rarely. Co-opting a specific identifier in a specific syntactic context, as with field, would be reasonable, but it shouldn’t be much more open-ended than that.
Each concrete break should be easily explainable in diagnostic messages, with a clear location in code; e.g.: "In C# 12, within property accessors the field identifier will refer to a generated backing field for the property."
Each break should have a default fix that is simple, robust, local, fully automatable, and preserves the meaning of the code into the new version. For breaking occurrences of the field identifier, the fix would replace them with member access expressions, as in this.field and MyType.field.

If we cannot satisfy these requirements, we shouldn't adopt a given breaking change. We should either settle on an alternative design for the new feature, or punt on the feature completely.

Example: Getting ready for field access

Let’s assume that your current source code is in C# 11 or earlier, and has some occurrences of field as an instance field, which are being referenced as simple names from inside property accessor bodies:

public class Entity
{
    private string field;
    public string Field 
    { 
        get => field; // changes meaning in C# 12
        set => field = value.Trim(); // changes meaning in C# 12
    }

    public Entity(string field)
    {
        this.field = field.Trim();
    }
}

Now let’s say that C# 12 is released, and offers the field access feature with the breaking design. Without mitigation, upgrading your project to C# 12 would break the code above.

The default fix for this breaking change would be to turn field inside property accessors into a member access; i.e. this.field for instance members and MyType.field for static members. This fix would change the above property declaration to:

    public string Field 
    { 
        get => this.field; // resilient to C# 12
        set => this.field = value.Trim(); // resilient to C# 12
    }

Of course you might want to fix the break differently, e.g. by renaming the field variable.
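The instance case has a static counterpart. As a minimal sketch (the `Counter` and `RenamedEntity` types are hypothetical), the default fix for a static member named field qualifies it with the type name, while the rename alternative sidesteps the identifier entirely. Neither uses field as a simple name inside an accessor, so neither would be affected by the proposed change.

```csharp
// Default fix for a static member: qualify `field` with the type name.
public static class Counter
{
    private static int field;

    public static int Count
    {
        get => Counter.field;          // was: get => field;
        set => Counter.field = value;  // was: set => field = value;
    }
}

// Alternative fix: rename the member so the accessors never mention `field`.
public class RenamedEntity
{
    private string name;               // was: private string field;

    public string Field
    {
        get => name;
        set => name = value.Trim();
    }

    public RenamedEntity(string name)
    {
        this.name = name.Trim();
    }
}
```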

How could we help you find and fix these breaks before you upgrade to C# 12? In the following, let’s sketch a user workflow for that in the context of an IDE and a CLI, using the field feature as a running example. Please note that this is just in order to paint a vision; the actual tooling experience will be designed by the appropriate teams, with the close involvement of the language designers. Afterwards we’ll go into more detail about how we could enable those workflows.

The IDE experience

In this example, remember that you’re starting out in C# 11, and that C# 11 is the latest version supported by the compiler your IDE currently uses.

Now a new version of the .NET SDK gets released that includes the C# 12 compiler, and you choose to upgrade Visual Studio or the SDK directly. Based on your project settings (in ways we’ll come back to later), the following now happens:

When you open the project it is still in C# 11, but you now get additional diagnostics (let’s say warnings) on lines of your C# 11 code that would be broken by C# 12. Specifically, whenever field is used in a property accessor, you get a warning along the lines of "In C# 12, field in an accessor will refer to a generated backing field. Consider changing to this.field."
For any given such warning, the IDE offers you a list of code fixes like the following:
- Change field to this.field: This is the default fix described above, and requires no further user input. You get the usual additional options to apply the code fix to all occurrences in the file or project, so if this works for you, you can be done very quickly!
- Rename field: If the field declaration can be edited in this project, you might choose to address the issue by renaming it to something else.
- One-click upgrade: Fix every remaining break with its default fix and automatically upgrade to C# 12. This is as easy as it gets; one click and you're safely in C# 12.
- Don't fix, and suppress either this or all such breaking change warnings going forward. This is mostly for when you don't care about newer versions of C#, and want to turn off warnings about them in your projects.
Of course you can also fix some or all of these issues manually, without making use of the suggested code fixes.
If and when there are no more compatibility issues, you are free to upgrade the language version and/or target framework in the usual manner, whenever you choose.

Separately from this, any IDE gesture that offers (directly or indirectly) to update to a new language version should be gated by the presence of breaking change warnings in your current code. It shouldn't be too easy to accidentally upgrade the language version without heeding the warnings.

CLI experience

Based on the same project settings as the IDE, a CLI build command would report the same warnings that are provided in the IDE. When you get these warnings, you can go and fix your code accordingly.

In addition, a CLI command (or .NET Tool delivered with the CLI that acts as though it is a CLI command) could be offered to apply the default fixes to remaining breaking code, whether across the whole project, or other granularities (directory?, solution?) that make sense.

You can also turn the warnings off if you have no interest in guarding against breaking changes right now.

Detecting breaking code

Detection of breaking code would be the responsibility of the compiler. It should be able to produce meaningful diagnostics on code that is compiled with a given language version "x", but will break in a higher language version "y" that the compiler supports.

It’s a long-standing practice that newer compilers continue to support older versions of the language in a way that is compatible with older compilers. Indeed this capability is used frequently, as most projects specify a fixed language version (often indirectly through the target framework) even while often being compiled by a newer compiler.

With this proposal, a newer compiler can add diagnostics to an existing language version. That’s not something we’ve done before! Needless to say, this behavior needs to be optional; something you can turn on or off. Depending on your viewpoint, such diagnostics are either a nuisance or exactly what you’re looking for. If you are not looking to upgrade your project to newer language versions, why should you care whether your code will work correctly in those versions? In such cases you would turn these diagnostics off. Conversely, if you do plan to upgrade the language version (or target framework) from time to time, then why wouldn’t you want newer compilers to tell you that your code is not forward compatible? In fact, since the compiler knows about the new language features (as it supports those newer language versions), it would seem distinctly unhelpful for it not to tell you what it knows about their impact on your code!

The upshot is that a given compiler version would use knowledge about newer language features to produce warnings in older versions of the language about code that would eventually be broken. If you heed those warnings, then your code will not break when you eventually upgrade to that language version.
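The version logic behind this can be sketched in a few lines. This is a simplified model, not the compiler's actual design: `BreakingChange` and `ForwardCompatibility` are hypothetical names, language versions are reduced to integers, and it only selects which known breaks are relevant. Finding the affected code sites would still be the compiler's job, e.g. by checking how each candidate identifier would bind under the newer version.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one breaking change the compiler knows about.
// Language versions are simplified to integers (11, 12, ...).
public sealed record BreakingChange(int IntroducedIn, string Description);

public static class ForwardCompatibility
{
    // A compiler invoked with `requestedVersion` that supports up to `latestSupported`
    // considers every known break introduced after the requested version,
    // up to and including the newest version it supports.
    public static IReadOnlyList<BreakingChange> ChangesToWarnAbout(
        IEnumerable<BreakingChange> knownChanges, int requestedVersion, int latestSupported)
    {
        return knownChanges
            .Where(c => c.IntroducedIn > requestedVersion && c.IntroducedIn <= latestSupported)
            .OrderBy(c => c.IntroducedIn)
            .ToList();
    }
}
```

When the requested version already equals the latest supported one, the result is empty, which matches the point that the warnings only appear when compiling at an older language version.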

Fixing breaking code

High quality code fixes representing the default fix of each breaking change will be implemented and offered as gestures in IDEs and commands in the CLI. While the fixes would not be part of the compiler itself, they should probably be implemented by the compiler team alongside the warnings. Their existence is an important part of the protective setup that would allow us to feel confident about new breaking changes, and building all the pieces at the same time by the same people is likely the most reliable approach.

Project settings

Perhaps the most hand-wavy phrase of the above example is “based on your project settings”. It would seem desirable that the breaking change handling adds or changes as little as possible about project settings, compared to what's there today.

The landscape is a bit complicated. For instance, in an SDK-style project, a project file without an explicit <LangVersion> will implicitly feed the compiler the language version associated with the specified <TargetFramework>. That is indeed the most common setup. But elsewhere (e.g., when directly calling the compiler), leaving the language version unspecified will cause the compiler to use the latest language version that it supports, something that can also be explicitly specified with a language version of latest.

The warning behavior proposed here relies crucially on the compiler being invoked with a lower language version than the latest one it supports. Only then do the breaking change warnings manifest! So for the large number of projects where a specific language version is provided to the compiler (even when it’s implicit in the project file) this will work quite well. When a new compiler comes out, it can start producing useful breaking change diagnostics relative to the latest version it now supports.

Opting in or out

Many projects will never want to upgrade, and should be able to opt out of warnings that are useless to them.
We suggest introducing a new compiler flag to guide version-related behavior, e.g.:

Opting out:

<UpgradeLanguageVersion>never</UpgradeLanguageVersion>

Opting in:

<UpgradeLanguageVersion>warn</UpgradeLanguageVersion>

We need to decide which of the two is the default behavior if nothing is specified. This flag will "clutter" the project files of whoever does not want the default behavior. That speaks for making the default "off", since that aligns with projects that already do not want to touch their project files. Those projects could be left as they are today, and would never have to worry about a breaking change mechanism they don't care about.

On the other hand, if the default is "off", will too many existing projects that do upgrade from time to time neglect to turn it on, and miss out on the safety provided by the warnings? Is it really so bad for existing projects that don't want to ever upgrade to get a one-time warning with a fix to explicitly say no to upgrade-related warnings going forward? This way every existing project will be prompted to make a choice: Stay on the language version you're on, or fix code that would break in future language versions.

We'll need to make a choice here, and we're not sure which is the right way to go. One thing to remember is that, while the compiler has a default for what happens if nothing is specified, the SDK may have its own defaults for what is sent to the compiler if nothing is specified in the project files. The SDK also decides what is put in the project files when new projects are created in the first place. Finally, there can be a difference between what is the right default long-term vs what causes the least friction for the projects that exist today.

Latest version

For what's currently the language version of latest, whether specified implicitly or explicitly to the compiler, there’s a conundrum. With the current behavior, latest causes the language version to automatically move up when the compiler is upgraded. But that behavior would side-step the very warnings that would make this upgrade safe!

It seems that you can’t have your cake and eat it too. You can't have a new compiler both warn you about what would happen if you upgrade and immediately upgrade before you see the warnings! What to do?

We think latest no longer makes sense as a "language version" in this scheme. We should find a non-jarring way to retire it, and e.g. lock the default language version to C# 11 when you call the compiler without specifying a language version.

Instead we can allow a stronger setting for the opt-in flag introduced above to express the intent to always be on the latest version:

<UpgradeLanguageVersion>latest</UpgradeLanguageVersion>

The additional effect in the compiler would simply be that if the language version is not the latest and there are no breaking change warnings in the source code, then a warning is issued that you aren't on the latest supported version of the language. An IDE can then hang a fix off of that warning to actually upgrade your project's language version (as well as one to shut the warning up if you don't actually want to be on latest after all).

A variant of latest today is preview, which means latest plus any features currently in preview. We should move that over to the same flag, as in:

<UpgradeLanguageVersion>preview</UpgradeLanguageVersion>

That would enable preview features when the project is already on the latest language version, and have the same effect as latest when it's not.
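Taken together, the four proposed settings can be summarized as a small decision function. This is a sketch of the behavior described above, not a specification: `UpgradeMode`, `UpgradeOutcome` and `UpgradeSettings.Evaluate` are hypothetical names, versions are simplified to integers, and no default mode is assumed, since that choice is still open.

```csharp
// Hypothetical values of the proposed <UpgradeLanguageVersion> setting.
public enum UpgradeMode { Never, Warn, Latest, Preview }

// What one compilation would report or enable under a given setting.
public sealed record UpgradeOutcome(
    bool ReportBreakingChanges, bool ReportNotOnLatest, bool EnablePreviewFeatures);

public static class UpgradeSettings
{
    // requestedVersion: language version passed to the compiler (simplified to an int).
    // latestSupported: newest version this compiler supports.
    // pendingBreaks: number of breaking-change warnings found in the source.
    public static UpgradeOutcome Evaluate(
        UpgradeMode mode, int requestedVersion, int latestSupported, int pendingBreaks)
    {
        bool behind = requestedVersion < latestSupported;

        // never: stay silent; every other mode reports upcoming breaks.
        bool reportBreaks = mode != UpgradeMode.Never && behind && pendingBreaks > 0;

        // latest/preview: once no breaks remain, nudge the project to upgrade.
        bool wantsLatest = mode is UpgradeMode.Latest or UpgradeMode.Preview;
        bool reportNotOnLatest = wantsLatest && behind && pendingBreaks == 0;

        // preview: preview features only apply once the project is on the latest version.
        bool enablePreview = mode == UpgradeMode.Preview && !behind;

        return new UpgradeOutcome(reportBreaks, reportNotOnLatest, enablePreview);
    }
}
```

Note that the "not on latest" warning is deliberately withheld while breaking-change warnings remain, so the project is never nudged to upgrade past unfixed breaks.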

Accumulated breaking changes

If there is more than one version between the language version passed to the compiler and the latest one that it supports, some things could get gnarly. For instance, the same code could be subject to multiple subsequent breaking changes (hopefully very rarely!), and competing default fixes may apply.

Perhaps the fixes – whether in an IDE experience or a CLI batch fixer – should be sequenced according to the language version that introduced the break, so that a later fix is only shown – or applied – when the earlier breaking change has been dealt with.
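One way such sequencing could work is sketched below, under the assumption that each outstanding fix is tagged with the language version whose break it addresses. `PendingFix` and `FixSequencer` are hypothetical names, and versions are simplified to integers. Only the fixes for the oldest outstanding break are offered; later ones surface after the code is analyzed again.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical outstanding fix, tagged with the language version whose break it addresses.
public sealed record PendingFix(int IntroducedIn, string Location, string Description);

public static class FixSequencer
{
    // Offer only the fixes for the oldest outstanding break; fixes for later versions
    // surface after these are applied and the code is analyzed again.
    public static IReadOnlyList<PendingFix> FixesToOffer(IEnumerable<PendingFix> pending)
    {
        var all = pending.ToList();
        if (all.Count == 0)
        {
            return all;
        }

        int earliest = all.Min(f => f.IntroducedIn);
        return all.Where(f => f.IntroducedIn == earliest).ToList();
    }
}
```

Re-analyzing between rounds, rather than computing all fixes up front, avoids applying a later fix to code that an earlier fix has already rewritten.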

Perhaps this is too much of a corner case to even worry about. We can learn along the way and adapt – it will not be an issue until the second time a language version introduces breaking changes!

Fixing past decisions

What about the concessions we've already made in the past in the name of back compat, such as discards, var and nameof? Could we fix those in future versions of C#?

We don't see why not. Making those work more consistently and elegantly could be regarded as new "features" with breaking changes. But we'd have to be just as judicious as with "real" new features: Are the breaking changes worth the hassle to developers, and are we adding too many in any given release?

In some cases, "breaking" use of such identifiers is already discouraged, even today. For instance, using var as the name of a real type is a really bad idea anywhere, except in compiler tests! In such cases we might weigh the cost of the break to users more lightly, because we'd assume most people are not writing code that would be affected.

Building the muscle

In summary, we propose to:

Allow ourselves to make limited and well-justified breaking changes in new versions of C#
Provide a mechanism for you to discover when upcoming breaking changes would affect your code, whether in an IDE or CLI context
Provide local, reliable automatic fixes for any such upcoming breaks

If we adopt a scheme like this, we should start small, e.g. with a single breaking feature (which should probably be field), and learn from the impact. Over time we can tweak both the warning experience and the evaluation criteria for new breaking features based on real-world usage, until we reach a steady state that represents a new and better balance between stability and evolution of C#.

Any thoughts and reactions will be extremely helpful!

LDM Discussions

https://github.com/dotnet/csharplang/blob/main/meetings/2023/LDM-2023-05-15.md#breaking-change-warnings