Tracking issue: Additional checks, both semver and non-semver

obi1kenobi opened this issue 2 years ago

This is a list of all not-yet-implemented checks that would be useful to have. Some of these require new schema and adapter implementations as well, tracked in #241.

In addition to checking for semver violations, there are certain changes that are not breaking and don't even require a minor version, but can still be frustrating in downstream crates without a minor or major version bump. Crates should be able to opt into such warnings on an individual basis.

For example, based on this poll (with small sample size: ~40 respondents), ~40% of users expect that upgrading to a new patch version of a crate should not generate new lints or compiler warnings. The split between expecting a new minor version and a new major version was approximately 3-to-1.

Major version required:

Minor version required:

Project-defined whether major / minor / patch version required (for example, because they are technically breaking but commonly ignored):

Raising the Minimum Supported Rust Version (MSRV) for the crate
Changing the size of a type
- Example courtesy of cargo-breaking README file

More checks here:

Opt-in warnings:

Depending on a #[non_exhaustive] type from another crate to remain a 1-ZST usable in #[repr(transparent)]
- This is a semver hazard from the user's side. The other crate's type correctly declared that field additions (including sized ones) are non-breaking.
- Related issue: rust-lang/rust#78586
Crate does not enforce some of the recommended allow-by-default lints that are built into Rust and/or clippy: https://doc.rust-lang.org/rustc/lints/listing/allowed-by-default.html
- For example, one's own public types should implement Debug, i.e., the Rust missing_debug_implementations lint: https://twitter.com/Lucretiel/status/1558287048892637184
Types with a new() -> Self method should also implement Default: https://rust-lang.github.io/api-guidelines/interoperability.html
Don't require FusedIterator in generic bounds; instead, accept any Iterator and call Iterator::fuse(): https://doc.rust-lang.org/std/iter/trait.FusedIterator.html
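A minimal sketch of the last two API-guideline items, using a hypothetical `Config` type and a hypothetical `count_and_confirm_end` helper. `Default` simply delegates to `new()` (Clippy's `new_without_default` lint flags the missing impl), and the helper accepts any `Iterator` and fuses it internally instead of demanding `I: FusedIterator` from its callers.

```rust
/// Hypothetical type with a `new() -> Self` constructor.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub retries: u32,
}

impl Config {
    pub fn new() -> Self {
        Config { retries: 3 }
    }
}

// Delegating to `new()` keeps the two constructors from drifting apart.
impl Default for Config {
    fn default() -> Self {
        Self::new()
    }
}

/// Hypothetical helper: counts the items, then polls once more.
/// The bound is plain `Iterator`; `.fuse()` supplies the guarantee
/// that `next()` keeps returning `None` after the first `None`.
pub fn count_and_confirm_end<I: Iterator>(iter: I) -> (usize, bool) {
    let mut it = iter.fuse();
    let mut n = 0;
    while it.next().is_some() {
        n += 1;
    }
    let still_done = it.next().is_none();
    (n, still_done)
}
```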

Opt-in warnings for difficult-to-reverse changes:

Removing #[non_exhaustive] from an item
- Per semver, removing #[non_exhaustive] can be done in a patch release, but adding it back would then require a new major version.
Adding an enum variant in a #[non_exhaustive] enum
- Per semver, adding variants to a non-exhaustive enum can be done in a patch release, but removing them again afterward would require a new major version.
Removing the last non-pub field in an exhaustive public struct
- Structs that are not #[non_exhaustive] and have only public fields can be constructed with a struct literal. Removing the ability to construct a struct with a struct literal is a breaking change and requires a new major version.
Making an item importable in more than one way
- If an item is in a pub mod and is also exported with pub use, it can become importable in multiple ways. This is easy to miss. Removing an import path is breaking, so perhaps we should warn that this is happening. Related to #35.
Making a trait object-safe if it previously was not
- Object safety then becomes part of the API contract, and breaking object safety is semver-major.
A 1-ZST (1-byte-aligned zero-sized type) no longer being a 1-ZST
- This is "possibly breaking" and whether it's breaking or not depends on the intent of the type, and can't be determined programmatically. A #[enforce_1zst] attribute could signal that the type should remain a 1-ZST and that deviations from that are breaking.
- This can break downstream even if the type is #[non_exhaustive], until rust-lang/rust#78586 is resolved and prevents this.
Leaking or re-exporting another crate's type in one's own API
- for example, having a function that returns a value of a type from another crate's API
- this can cause coupling to the other crate's version, and can be a pain
- there are legitimate reasons to do this sometimes, but it should be an intentional decision and probably worth flagging in review
Making a type implement Send/Sync/Sized/Unpin or other auto traits when it previously didn't
- this is possible to do indirectly, e.g. by removing the last field that prevented the type from (auto-)implementing those traits
- reverting this is a breaking change
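Several items in this list are easiest to see from the downstream side. A minimal sketch, with every type and function name hypothetical, of the state after each change: code that downstream crates can start relying on, and that would break if the change were later reverted. The const assertion is only a stand-in for the proposed (not existing) `#[enforce_1zst]` attribute.

```rust
use std::mem::{align_of, size_of};

// Struct literal: with only `pub` fields and no `#[non_exhaustive]`,
// downstream code can write `Point { x, y }`. Re-adding a private field
// later would break every such literal.
pub struct Point {
    pub x: i32,
    pub y: i32,
}

pub fn origin() -> Point {
    Point { x: 0, y: 0 }
}

// Multiple import paths: `shapes::Circle` and the root re-export `Circle`
// name the same type, and downstream may depend on either path.
pub mod shapes {
    pub struct Circle {
        pub r: f64,
    }
}
pub use shapes::Circle;

// Object safety: because `Draw` is object-safe, downstream may use
// `dyn Draw`. Adding, say, a generic method without a `where Self: Sized`
// bound would make it non-object-safe and break that code.
pub trait Draw {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);

impl Draw for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

pub fn total_area(items: &[Box<dyn Draw>]) -> f64 {
    items.iter().map(|d| d.area()).sum()
}

// 1-ZST: `Marker` is zero-sized with alignment 1, so it may sit next to a
// non-zero-sized field in a `#[repr(transparent)]` wrapper.
pub struct Marker;
const _: () = assert!(size_of::<Marker>() == 0 && align_of::<Marker>() == 1);

#[repr(transparent)]
pub struct Wrapper(pub u64, pub Marker);

// Auto traits: suppose `Handle` previously also held a
// `PhantomData<*const ()>` field, which made it neither `Send` nor `Sync`.
// With only a `u32` left it is now both, and downstream may rely on that.
pub struct Handle {
    pub id: u32,
}

// Compile-time check that a type implements `Send` and `Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}
```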

Christopher Durham · Answer 1 · Wed Aug 24 2022 01:53:04 GMT+0800 (China Standard Time)

Own public types should be Debug:

That's already available in the compiler as #[warn(missing_debug_implementations)], isn't it?

Predrag Gruevski · Answer 2 · Wed Aug 24 2022 03:19:35 GMT+0800 (China Standard Time)

That's already available in the compiler as #[warn(missing_debug_implementations)], isn't it?

Oh, neat, TIL. It appears to be allowed by default and has to be enforced by manually enabling the check. In that case, perhaps the wish-listed query should be checking that #![deny(missing_debug_implementations)] is set instead.
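A minimal sketch of what opting in looks like, assuming a hypothetical `api` module and `Token` type. In a library this would normally be the crate-level `#![deny(missing_debug_implementations)]` at the top of `lib.rs`; the module-level form is used here only so the snippet stays self-contained.

```rust
// Lint attribute scoped to one module; `#![deny(...)]` at the crate root
// would apply it crate-wide.
#[deny(missing_debug_implementations)]
pub mod api {
    // For a publicly reachable type in a library crate, leaving out this
    // derive would now be a compile error instead of being silently allowed.
    #[derive(Debug)]
    pub struct Token {
        pub id: u64,
    }
}
```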

Ed Page · Answer 3 · Wed Aug 24 2022 03:45:15 GMT+0800 (China Standard Time)

This also gets into a conversation that I think we only had over Zulip, so it's good to summarize here.

Especially if we want this in cargo some day, I think we should clearly define the scope.

cargo clippy is meant for linting an API as it exists

cargo semver-checks would be meant for linting changes in an API

missing_debug_implementations is an example of something that imo doesn't belong in cargo semver-checks
Linting that a lint is enabled is both getting a bit meta and again something that should be out of scope

Misc notes

Making it easier to add lints to clippy is a conversation with the clippy folks and they are interested in solving it
User-generated lints in either type of tool shipped with rustup would likely be marked as unstable initially. A path to stabilization depends on how comfortable people are with stabilizing the query language and the data model, which is a large surface area.
In the meantime, there could be room for a linter that handles user-defined lints.

Predrag Gruevski · Answer 4 · Wed Aug 24 2022 04:00:26 GMT+0800 (China Standard Time)

One possible way forward would be something like:

Extract the data model components (the Trustfall schema and adapter) into a library crate (essentially #67).
Make cargo-semver-checks be just a set of semver queries + a binary that wraps that library crate to execute those queries.
Make one or more other tools for the other use cases: any queries that don't fit within the current cargo-semver-checks / clippy domains, custom user-specified queries, etc.

That way, we could easily experiment with querying for more things without bloating the scope of cargo-semver-checks and without making the integration into cargo messy.

I think extracting the data model into a library crate is pretty straightforward and I would be happy to do it if that's what we decide is the best path forward.

Alona Enraght-Moony · Answer 5 · Wed Aug 24 2022 20:59:49 GMT+0800 (China Standard Time)

Auto trait impls for impl Trait in return type.
- Requires doing pub fn changed return type

Predrag Gruevski · Answer 6 · Wed Aug 24 2022 23:49:39 GMT+0800 (China Standard Time)

Auto trait impls for impl Trait in return type.

Requires doing pub fn changed return type

Thanks, added to the list! If you'd like to try your hand at it, this lint is probably easier than pub fn changed return type since the actual check is less complex, and I'd be happy to mentor.

oskgo · Answer 7 · Tue Aug 30 2022 19:11:29 GMT+0800 (China Standard Time)

I think "trait added method" might be a bit more complicated.

The way I see it, adding default methods is fine, and even adding non-default methods is fine, so long as the trait cannot be implemented by an external user. This can be the case, for example, when using a private supertrait or blanket impls. Sealed traits in particular are a common pattern in Rust.
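A minimal sketch of the sealed-trait pattern mentioned here, with all names hypothetical: the supertrait lives in a private module, so code outside the crate can name and call `Shape` but cannot implement it. This only rules out downstream implementers; other hazards of new trait items are a separate question.

```rust
mod private {
    // `pub` inside a private module: usable in this crate's bounds,
    // but not nameable by downstream crates.
    pub trait Sealed {}
}

/// Public trait that only this crate can implement.
pub trait Shape: private::Sealed {
    fn area(&self) -> f64;
    // Because no downstream impls can exist, a non-defaulted method added
    // here in a later release cannot break an external implementer.
}

pub struct Square(pub f64);

impl private::Sealed for Square {}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}
```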

Ed Page · Answer 8 · Tue Aug 30 2022 20:01:27 GMT+0800 (China Standard Time)

even adding non-default methods is fine,

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator, which is causing them to write a new feature to support this due to the pervasiveness of itertools.

Predrag Gruevski · Answer 9 · Wed Aug 31 2022 01:43:56 GMT+0800 (China Standard Time)

Non-defaulted items of any kind in a trait that is implementable outside its crate are semver-major, because any implementers must add the new items: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default

Defaulted items in a trait are trickier. They are definitely at least minor, but could be major as well; some such circumstances are described in the semver reference which shows this as a "possibly-breaking" change: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-default-item
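A minimal sketch of why the default body matters for implementers, using a hypothetical `Named` trait: a downstream impl written against the older release keeps compiling because the new method comes with a default. It does not cover the "possibly-breaking" name-conflict cases from the linked reference entry.

```rust
/// Hypothetical upstream trait in its newer release.
pub trait Named {
    fn name(&self) -> String;

    // New item, added *with* a default body. Had it been added without one,
    // every downstream `impl Named for ...` would stop compiling (semver-major).
    fn describe(&self) -> String {
        format!("<{}>", self.name())
    }
}

// Downstream impl written against the older release, which lacked `describe`.
pub struct User;

impl Named for User {
    fn name(&self) -> String {
        "user".to_string()
    }
}
```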

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator, which is causing them to write a new feature to support this due to the pervasiveness of itertools.

I believe this might be due to the introduced ambiguity between the built-in Iterator trait and its itertools analogue, which is captured in the breaking example of the possibly-breaking entry I linked above.
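The ambiguity can be reproduced in miniature with two hypothetical traits, `ExtA` and `ExtB`, standing in for an extension trait and a trait that later gains a method with the same name. With both in scope and both implemented for the same type, method-call syntax is rejected with error E0034 (multiple applicable items in scope); fully qualified syntax still compiles.

```rust
pub trait ExtA {
    fn label(&self) -> &'static str {
        "ExtA"
    }
}

pub trait ExtB {
    fn label(&self) -> &'static str {
        "ExtB"
    }
}

pub struct Data;
impl ExtA for Data {}
impl ExtB for Data {}

// `Data.label()` would not compile here: both traits are in scope and
// both provide `label`. Naming the trait resolves the ambiguity.
pub fn labels() -> (&'static str, &'static str) {
    (ExtA::label(&Data), ExtB::label(&Data))
}
```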

Jonas Platte · Answer 10 · Wed Aug 31 2022 02:53:15 GMT+0800 (China Standard Time)

it's possible to go from e.g. taking &str to taking S: Into<String> without breaking

This is not true: changing an argument type from a concrete type to a generic one will break calls like the_function(foo.into()), which only work for non-generic functions because the parameter type guides type inference. There are cases where changing a parameter type or a return type is non-breaking, though:

Removing trait bounds on parameter types, e.g. x: impl Foo + Bar to x: impl Foo
Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send
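A minimal sketch of both halves of this point, with hypothetical function names: a concrete `String` parameter lets `.into()` infer its target, while the two listed changes only widen what callers may pass or rely on. The "before" signatures appear only in comments.

```rust
use std::fmt::Debug;

// Concrete parameter: `greet("x".into())` works because the parameter type
// tells inference what `into()` should produce. If this became
// `fn greet<S: Into<String>>(name: S)`, that call would fail with
// "type annotations needed".
pub fn greet(name: String) -> String {
    format!("hello, {name}")
}

// Removing a parameter bound. Before: `fn show(x: impl Debug + Clone)`.
// Every argument accepted before is still accepted.
pub fn show(x: impl Debug) -> String {
    format!("{x:?}")
}

// Adding a bound to an `impl Trait` return type.
// Before: `-> impl Iterator<Item = u32>`. Callers gain `Clone`; nothing is lost.
pub fn evens() -> impl Iterator<Item = u32> + Clone {
    (0..10).filter(|n| *n % 2 == 0)
}
```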

Christopher Durham · Answer 11 · Wed Aug 31 2022 03:05:28 GMT+0800 (China Standard Time)

I want to note that while Iterator/Itertools is a good example of the issue, it's a symptom of the wider semver-minor upgrade hazard of adding any new items.

This happens because of how name lookup works in Rust, since Rust allows arbitrary namespace mixins.

Adding an item to a trait is name resolution inference breaking, as it could conflict with another trait item where both traits are implemented for the same type.^[Preventable if the trait is sealed and implemented only for types you control.^[If implemented on upstream types, still potentially breaking, if upstream updates; generally we blame downstream if the inference breakage does not happen without downstream, even if it is triggered by updating upstream.]]
Adding a (public) item to a struct/enum is name resolution inference breaking, as it could conflict with a trait item implemented for the type, changing the name from referring to the trait associated item to referring to the struct/enum associated item.
Adding a (public) item to a module is name resolution inference breaking, as downstream could be glob importing your module's contents and another module's contents which defines the same name, causing the name to be ambiguous between your and the other module.
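The second bullet can be demonstrated directly, with hypothetical names: inherent methods take precedence over trait methods during method-call resolution, so adding an inherent `size` silently redirects an existing `w.size()` call instead of producing a compile error. For illustration both sides live in one crate; in practice the inherent impl would be upstream and the trait downstream.

```rust
/// Hypothetical downstream extension trait providing `size()`.
pub trait SizeExt {
    fn size(&self) -> usize {
        1
    }
}

pub struct Widget;

impl SizeExt for Widget {}

// Added later by the type's owner. `w.size()` now resolves here, not to
// `SizeExt::size`, with no error or warning at the call site.
impl Widget {
    pub fn size(&self) -> usize {
        42
    }
}

// Returns (method-call result, explicitly-qualified trait result).
pub fn call_size(w: &Widget) -> (usize, usize) {
    (w.size(), SizeExt::size(w))
}
```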

In other words, in a pedantic mode, semver-checks would be justified in requiring a minor bump for any new public item. Even weakening generic requirements might cause inference issues, so e.g. --strict-pedantic should probably require a minor bump for any change to the public API's types; IIUC this matches the intent of semver-minor's "new feature" trigger as well, since the API by construction has new API surface.

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for. It's a subjective evaluation of how likely both that a name conflict is possible and that some downstream would have both names in scope simultaneously; in most cases this is reasonably rare because of the convention to avoid glob imports^[If you want a version of the lint which can fire without firing on every API change, consider linting only for new trait associated items reachable through a module called prelude, since that's likely designed for glob importing.], and there's not really a good analytical way to determine the risk of a non-globbed name conflict to provide a lint cutoff better than yes/no.

Iterator is an especially interesting case because it's a language item trait in the prelude. User types don't have this exacerbating factor (being implicitly available everywhere) for this concern.

Christopher Durham · Answer 12 · Wed Aug 31 2022 03:12:31 GMT+0800 (China Standard Time)

Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send

Note that RPIT already "leaks" auto traits (Send/Sync/Unpin), so that isn't actually a return type refinement.

Actually refining the return type is not inference-breaking, though you still run the risk of being name-resolution-breaking (e.g. refining to a concrete type or even adding a new guaranteed trait could cause a name conflict with newly applicable extension traits).
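A minimal sketch of the auto-trait leakage, with hypothetical function names: the written signature promises only `Iterator`, yet downstream code can rely on the hidden type being `Send`. This is also why changing the concrete type behind an unchanged `impl Trait` signature can still be breaking.

```rust
// The signature promises only `Iterator`, but the hidden type
// (`std::vec::IntoIter<u8>`) is `Send`, and auto traits leak through
// return-position `impl Trait`.
pub fn numbers() -> impl Iterator<Item = u8> {
    vec![1, 2, 3].into_iter()
}

// Compile-time check that a value's type is `Send`.
fn assert_send<T: Send>(_: &T) {}

// Downstream-style code that depends on the leaked `Send`. If `numbers` later
// returned an iterator holding an `Rc` (which is not `Send`), this function
// would stop compiling even though the written signature never changed.
pub fn downstream_sum() -> u8 {
    let it = numbers();
    assert_send(&it);
    it.sum()
}
```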

Ed Page · Answer 13 · Wed Aug 31 2022 03:21:50 GMT+0800 (China Standard Time)

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for

The fact that there is a lot of nuance to semver and some parts that are contextual is why I feel like #58 is going to be important.

Jacob Pratt · Answer 14 · Thu Feb 09 2023 17:38:30 GMT+0800 (China Standard Time)

Here's one not listed: adding a generic (type or const) to a function is a breaking change.

Predrag Gruevski · Answer 15 · Thu Feb 09 2023 23:36:18 GMT+0800 (China Standard Time)

That one is very interesting, and I'm not exactly sure what to make of it.

It's definitely breaking, no doubts there. But as I've written before, some Rust breaking changes don't require a major version — the API evolution RFC says so: https://rust-lang.github.io/rfcs/1105-api-evolution.html

Is adding a generic to an already-generic function covered under that exception?

Is adding a generic to a function that previously wasn't generic covered?

Would love to get your thoughts @jhpratt! These kinds of existential questions are things we run into a lot here 😅

Ed Page · Answer 16 · Fri Feb 10 2023 00:36:37 GMT+0800 (China Standard Time)

Yes, adding a new generic to a function is a major breaking change. It will break in any case where the type parameters are explicitly provided, like if inference didn't work out. This is true for other API items as well, with one exception: if a generic is added to a type or trait but is defaulted, then it's a minor breaking change. The explicit type parameters are unchanged, but there are cases where inference won't work and it will fail to compile, and the API Evolution RFC decided to brush that under the rug and ignore it (I've been bitten by it...)
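A minimal sketch of both cases, with hypothetical names: an explicit turbofish pins the number of generic arguments a function takes, while a defaulted parameter on a type leaves existing explicit spellings valid. The breaking variants appear only in comments, since they would not compile.

```rust
use std::marker::PhantomData;
use std::str::FromStr;

// Callers may spell the generic explicitly: `parse_as::<u8>("7")`.
// Adding a second type parameter, e.g. `fn parse_as<T, E>(...)`, would break
// that call, because the turbofish would supply too few generic arguments.
pub fn parse_as<T: FromStr>(s: &str) -> Option<T> {
    s.parse().ok()
}

// A type that gains a *defaulted* parameter: the existing spelling
// `Tagged<i32>` now means `Tagged<i32, ()>`, so explicit uses keep compiling.
// (Inference-only uses can still fail in some cases, as noted in the reply.)
pub struct Tagged<T, Tag = ()> {
    pub value: T,
    _tag: PhantomData<Tag>,
}

impl<T> Tagged<T> {
    pub fn new(value: T) -> Self {
        Tagged { value, _tag: PhantomData }
    }
}
```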

Also, just because the RFC says something is a minor semver breakage, that doesn't mean the user shouldn't be told, as that RFC was written specifically for the stdlib and not as guidance for the ecosystem, and even the stdlib sometimes goes to extra lengths to avoid minor breaking changes if the impact is large enough. From what I've heard, they are designing a whole new language feature to allow migrating trait methods from itertools to Iterator without breaking people. Granted, #58 will be important for controlling this.