obi1kenobi / cargo-semver-checks

Scan your Rust crate for semver violations.

Tracking issue: Additional checks, both semver and non-semver

obi1kenobi opened this issue · comments

This is a list of all not-yet-implemented checks that would be useful to have. Some of these require new schema and adapter implementations as well, tracked in #241.

In addition to checking for semver violations, there are certain changes that are not breaking and don't even require a minor version, but can still be frustrating in downstream crates without a minor or major version bump. Crates should be able to opt into such warnings on an individual basis.

For example, based on this poll (with small sample size: ~40 respondents), ~40% of users expect that upgrading to a new patch version of a crate should not generate new lints or compiler warnings. The split between expecting a new minor version and a new major version was approximately 3-to-1.

Major version required:

  • exhaustive enum becomes #[non_exhaustive]: #143
  • repr(C) plain struct has fields reordered
  • tuple struct has fields reordered
  • tuple enum variant has fields reordered
  • pub struct pub field removed
  • pub struct constructible with struct literal adds pub field (#233)
  • pub struct constructible with struct literal adds non-pub field, and cannot be constructed with a literal from outside its own crate anymore (#233)
  • pub struct pub field changes type: #148, blocked on #149
  • pub enum variant field changes type: blocked on #149
  • pub enum tuple variant adds field
  • pub enum tuple variant removes field
  • pub enum struct variant adds field: #238
  • pub enum struct variant removes field: #153
  • pub enum variant discriminant removed
  • pub enum variant discriminant changed value
  • removed direct re-export of an enum variant: #291
  • struct with public fields changes to another kind (more details: rust-lang/cargo#10871 (comment))
    • unit struct to plain struct is breaking since the implicit constructor disappears (Rust for Rustaceans, Chapter 3, "Type Modifications", page 51)
    • #242
  • pub fn moved, deleted, or renamed (#22, #23, #24)
  • pub fn changed return type: blocked on #149
  • pub fn added argument
  • pub fn removed argument
  • pub fn changed arguments in a backward-incompatible way
    • This one is hard: it's possible to go from e.g. taking &str to taking S: Into<String> without breaking
    • when it's not breaking, it requires a minor version
  • #190
  • #191
  • pub fn changed ABI, e.g. from extern "C" to extern "C-unwind" or extern "system"
  • pub method moved into a trait
    • even if the trait is pub, it needs to be imported in the scope to have its methods be available
    • Make sure to test for both trait-provided (default impl) methods and explicitly implemented methods for the trait. See the test cases added in #24 for example.
  • repr(C) removed from struct or enum (#25)
  • repr(transparent) removed from struct or enum (#26, #28)
  • repr(u*) and repr(i*) changed/removed from enum (#29, #30)
  • #74
  • type is no longer Sized / Send / Sync / Unpin / UnwindSafe / RefUnwindSafe (auto traits) (#31)
  • type made Copy (appears to be breaking because of rust-lang/rust#100905)
  • #73
  • non-sealed trait added method
  • #294
  • non-sealed trait added associated type
  • trait newly became sealed
  • trait removed/renamed associated type
  • #232
  • #231
  • #250
  • #368
  • type no longer implements pub trait
  • implementing an existing pub trait for an existing type
    • breaking because the trait's methods or associated types on that type may be ambiguous relative to those from traits implemented for that type in another crate (Rust for Rustaceans, chapter 3, pg. 52, "Trait Implementations")
  • implementing a new pub trait for an existing type, if the new pub trait is in a prelude module that gets imported with a wildcard
    • similar to above, same source (Rust for Rustaceans, chapter 3, pg. 52, "Trait Implementations")
    • normally the trait has to be in scope for its methods to be available, but the wildcard import will bring it in scope here
  • blanket impl added for an existing trait
    • the blanket impl can conflict with a downstream crate that also implements the trait for one of its own types, if that type is covered by the blanket impl (source: Rust for Rustaceans, chapter 2, pg. 30, "Blanket Implementations")
  • blanket impl added over a fundamental type (&T, Box etc.)
    • similar reasoning as above -- source: Rust for Rustaceans, chapter 2, pg. 30, "Fundamental Types"
  • added new implementation of existing trait that does not contain at least one new local type (and that type satisfies the exemption from the orphan rule)
    • source: Rust for Rustaceans, chapter 2, pg 31. "Covered Implementations"
  • upgrading to new major version of dependency while exporting a type that implements a trait from the dependency (new major version -> "it's not the same trait as before"): libp2p/rust-libp2p#3170 (comment)
  • trait is no longer object safe
  • removing a bound on trait impl: #142
  • Auto trait impls for impl Trait in return type.
    • Requires a superset of the required schema additions as pub fn changed return type
  • #338
  • pub type typedef changes the order of generic arguments (regular, lifetime, or const generics) relative to the underlying type
  • pub type typedef adds a new generic parameter
  • pub type typedef removes a generic parameter
  • pub type typedef removes a default value for a generic parameter
  • pub type typedef changes a default value for a generic parameter
  • variance of type lifetime parameters changed
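
As an illustration of the "pub method moved into a trait" item in the list above, here is a minimal sketch. The two module names (v1, v2) stand in for two releases of a hypothetical crate, and Greeter / Greet are invented names, not anything from cargo-semver-checks itself.

```rust
// Two releases of a hypothetical crate, modeled as modules so both compile side by side.
mod v1 {
    pub struct Greeter;

    impl Greeter {
        // Inherent method: callable wherever `Greeter` itself can be named.
        pub fn hello(&self) -> &'static str {
            "hello"
        }
    }
}

mod v2 {
    pub struct Greeter;

    // The method now lives in a public trait instead of an inherent impl.
    pub trait Greet {
        fn hello(&self) -> &'static str;
    }

    impl Greet for Greeter {
        fn hello(&self) -> &'static str {
            "hello"
        }
    }
}

fn call_v1() -> &'static str {
    // No import needed for an inherent method.
    v1::Greeter.hello()
}

fn call_v2() -> &'static str {
    // Without this import the call below does not compile,
    // because trait methods are only available when the trait is in scope.
    use v2::Greet;
    v2::Greeter.hello()
}

fn main() {
    println!("{} {}", call_v1(), call_v2());
}
```

Downstream code written like call_v1 would stop compiling against the second release until the trait import is added, which is why the move requires a major version.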

Minor version required:

  • #57
  • #159
  • new pub struct added
  • pub fields added on pub struct
    • don't report this if the entire struct is new
  • new pub enum added
  • new pub enum variant added
    • don't report this if the entire enum is new
  • pub enum variant discriminant added
  • new pub inherent method added
    • don't report this if the entire type is new
  • new pub union added
  • pub type typedef adds a default value for a generic parameter

Project-defined whether major / minor / patch version required (for example, because they are technically breaking but commonly ignored):

  • Raising the Minimum Supported Rust Version (MSRV) for the crate
  • Changing the size of a type

More checks here:

Opt-in warnings:

Opt-in warnings for difficult-to-reverse changes:

  • Removing #[non_exhaustive] from an item
    • Per semver, removing #[non_exhaustive] can be done in a patch release, but adding it back would then require a new major version.
  • Adding an enum variant in a #[non_exhaustive] enum
    • Per semver, adding variants to a non-exhaustive enum can be done in a patch release, but removing them again afterward would require a new major version.
  • Removing the last non-pub field in an exhaustive public struct
    • Structs that are not #[non_exhaustive] and have only public fields can be constructed with a struct literal. Removing the ability to construct a struct with a struct literal is a breaking change and requires a new major version.
  • Making an item importable in more than one way
    • If an item is in a pub mod and is also exported with pub use, it can become importable in multiple ways. This is easy to miss. Removing an import path is breaking, so perhaps we should warn that this is happening. Related to #35.
  • Making a trait object-safe if it previously was not
    • Object safety then becomes part of the API contract, and breaking object safety is semver-major.
  • A 1-ZST (1-byte-aligned zero-sized-type) type no longer being a 1-ZST
    • This is "possibly breaking" and whether it's breaking or not depends on the intent of the type, and can't be determined programmatically. A #[enforce_1zst] attribute could signal that the type should remain a 1-ZST and that deviations from that are breaking.
    • This can break downstream even if the type is #[non_exhaustive], until rust-lang/rust#78586 is resolved and prevents this.
  • Leaking or re-exporting another crate's type in one's own API
    • for example, having a function that returns a value of a type defined in another crate
    • this can cause coupling to the other crate's version, and can be a pain
    • there are legitimate reasons to do this sometimes, but it should be an intentional decision and probably worth flagging in review
  • Making a type implement Send/Sync/Sized/Unpin or other auto traits when it previously did not.
    • this is possible to do indirectly, e.g. by removing the last field that prevented the type from (auto-)implementing those traits
    • reverting this is a breaking change
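
The last item in the list above (a type silently gaining an auto trait) can be sketched as follows. This is only an illustration under assumptions: v1 and v2 model two releases of a hypothetical crate, and Handle, require_send and v2_id_across_thread are invented names.

```rust
mod v1 {
    use std::rc::Rc;

    // The private `Rc` field makes this type neither Send nor Sync.
    pub struct Handle {
        pub id: u32,
        _cache: Rc<u32>,
    }

    impl Handle {
        pub fn new(id: u32) -> Self {
            Handle { id, _cache: Rc::new(id) }
        }
    }
}

mod v2 {
    // The private field was removed, so the type is now automatically Send and Sync.
    pub struct Handle {
        pub id: u32,
    }

    impl Handle {
        pub fn new(id: u32) -> Self {
            Handle { id }
        }
    }
}

// Helper that only accepts Send values.
fn require_send<T: Send>(value: T) -> T {
    value
}

fn v2_id_across_thread() -> u32 {
    // Compiles only because `v2::Handle` is Send.
    let handle = require_send(v2::Handle::new(7));
    std::thread::spawn(move || handle.id).join().unwrap()
}

fn main() {
    let old = v1::Handle::new(7);
    // `require_send(old)` would be rejected: `Rc<u32>` cannot be sent between threads.
    println!("{} {}", old.id, v2_id_across_thread());
}
```

Once downstream code like v2_id_across_thread exists, putting a non-Send field back into the struct breaks it, even though no public item was changed or removed.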

Own public types should be Debug:

That's already available in the compiler as #[warn(missing_debug_implementations)], isn't it?

Oh, neat, TIL. It appears to be allowed by default and has to be enforced by manually enabling the check. In that case, perhaps the wish-listed query should be checking that #![deny(missing_debug_implementations)] is set instead.

This also gets into a conversation that I think we only had over Zulip, so it's good to summarize here.

Especially if we want this in cargo some day, I think we should clearly define the scope.

cargo clippy is meant for linting an API as it exists

cargo semver-checks would be meant for linting changes in an API

  • missing_debug_implementations is an example of something that imo doesn't belong in cargo semver-checks
  • Linting that a lint is enabled is both getting a bit meta and again something that should be out of scope

Misc notes

  • Making it easier to add lints to clippy is a conversation with the clippy folks and they are interested in solving it
  • User-generated lints in either type of tool shipped with rustup would likely be marked as unstable initially. A path to stability depends on how comfortable people are with stabilizing the query language and the data model, which is a large surface area
  • In the meantime, there could be room for a linter that handles user-defined lints.

One possible way forward would be something like:

  • Extract the data model components (the Trustfall schema and adapter) into a library crate (essentially #67).
  • Make cargo-semver-checks be just a set of semver queries + a binary that wraps that library crate to execute those queries.
  • Make one or more other tools for the other use cases: any queries that don't fit within the current cargo-semver-checks / clippy domains, custom user-specified queries, etc.

That way, we could easily experiment with querying for more things without bloating the scope of cargo-semver-checks and without making the integration into cargo messy.

I think extracting the data model into a library crate is pretty straightforward and I would be happy to do it if that's what we decide is the best path forward.

Thanks, added to the list! If you'd like to try your hand at it, this lint is probably easier than pub fn changed return type since the actual check is less complex, and I'd be happy to mentor.

I think "trait added method" might be a bit more complicated.

The way I see it, adding default methods is fine, and even adding non-default methods is fine, so long as the trait cannot be implemented by an external user. This can be the case, for example, when using a private supertrait or blanket impls. Sealed traits in particular are a common pattern in Rust.
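
For readers unfamiliar with the sealed-trait pattern mentioned here, a minimal sketch follows. It assumes the common "public trait in a private module" form of sealing; upstream, Encoder, Hex and demo are hypothetical names chosen for the example.

```rust
mod upstream {
    // Private module: downstream crates cannot name `private::Sealed`,
    // so they cannot implement it, and therefore cannot implement `Encoder`.
    mod private {
        pub trait Sealed {}
    }

    pub trait Encoder: private::Sealed {
        fn encode(&self) -> String;

        // Imagine this method was added in a later release without a default body.
        // Only impls inside `upstream` exist, so no external impl can be missing it.
        fn encoded_len(&self) -> usize;
    }

    pub struct Hex(pub u8);

    impl private::Sealed for Hex {}

    impl Encoder for Hex {
        fn encode(&self) -> String {
            format!("{:02x}", self.0)
        }

        fn encoded_len(&self) -> usize {
            2
        }
    }
}

fn demo() -> (String, usize) {
    // Downstream code can still call the trait's methods.
    use upstream::Encoder;
    let value = upstream::Hex(255);
    (value.encode(), value.encoded_len())
}

fn main() {
    let (text, len) = demo();
    println!("{text} ({len} chars)");
}
```

Because every implementation lives next to the trait, adding a required method cannot leave an external implementer behind. Callers may still be affected by name conflicts, which later comments in this thread discuss.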

even adding non-default methods is fine,

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator which is causing them to write a new feature to support this due to the pervasiveness of itertools.

Non-defaulted items of any kind in a trait that is implementable outside its crate are semver-major, because any implementers must add the new items: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default

Defaulted items in a trait are trickier. They are definitely at least minor, but could be major as well; some such circumstances are described in the semver reference which shows this as a "possibly-breaking" change: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-default-item
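
A rough sketch of the difference between the two cases, with the trait versions modeled as modules of a single file. All names (Shape, Square, describe_square and the module names) are invented for illustration.

```rust
mod v1 {
    pub trait Shape {
        fn area(&self) -> u32;
    }
}

// Next release, option A: the new item has a default body.
mod v2_defaulted {
    pub trait Shape {
        fn area(&self) -> u32;

        fn describe(&self) -> String {
            format!("shape with area {}", self.area())
        }
    }
}

// Next release, option B: the new item has no default body.
#[allow(dead_code)]
mod v2_required {
    pub trait Shape {
        fn area(&self) -> u32;
        fn describe(&self) -> String;
    }
}

// A downstream type written against v1.
struct Square(u32);

impl v1::Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

// The same impl body still satisfies option A unchanged.
impl v2_defaulted::Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

// The same impl body would be rejected for option B,
// because the required item `describe` is missing:
// impl v2_required::Shape for Square { fn area(&self) -> u32 { self.0 * self.0 } }

fn describe_square() -> String {
    // Fully qualified call, since two traits on `Square` both provide `area`.
    <Square as v2_defaulted::Shape>::describe(&Square(2))
}

fn main() {
    println!("{}", <Square as v1::Shape>::area(&Square(2)));
    println!("{}", describe_square());
}
```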

I believe trait added methods are a minor compatibility break. The standard library team is running into this problem with moving functions from the extension trait in itertools to Iterator which is causing them to write a new feature to support this due to the pervasiveness of itertools.

I believe this might be due to the introduced ambiguity between the built-in Iterator trait and its itertools analogue, which is captured in the breaking example of the possibly-breaking entry I linked above.

it's possible to go from e.g. taking &str to taking S: Into<String> without breaking

This is not true: changing an argument type from a concrete type to a generic will break calls like the_function(foo.into()), which only work for non-generic functions because the parameter type guides type inference. There are cases where changing a parameter type, as well as a return type, is non-breaking though:

  • Removing trait bounds on parameter types, e.g. x: impl Foo + Bar to x: impl Foo
  • Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send
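
The inference breakage described above can be reproduced with a small sketch. The function shout and the modules v1 / v2 are hypothetical stand-ins for two releases of the same crate.

```rust
mod v1 {
    // Concrete parameter type: the expected type `&str` guides inference at call sites.
    pub fn shout(s: &str) -> String {
        s.to_uppercase()
    }
}

mod v2 {
    // Generic parameter: many types satisfy `Into<String>`, so nothing pins down `S`.
    pub fn shout<S: Into<String>>(s: S) -> String {
        s.into().to_uppercase()
    }
}

fn call_v1() -> String {
    let name = "ferris";
    // Compiles: the target of `.into()` is inferred as `&str` from the signature.
    v1::shout(name.into())
}

fn call_v2() -> String {
    let name = "ferris";
    // The direct call still works with the generic version...
    v2::shout(name)
    // ...but `v2::shout(name.into())` is rejected with a "type annotations needed"
    // error, because the compiler cannot choose a target type for `.into()`.
}

fn main() {
    println!("{} {}", call_v1(), call_v2());
}
```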

I want to note that while Iterator/Itertools is a good example of the issue, it's a symptom of the wider semver-minor upgrade hazard of adding any new items.

This happens because of how name lookup works in Rust, since Rust allows arbitrary namespace mixins.

  • Adding an item to a trait is name resolution inference breaking, as it could conflict with another trait item where both traits are implemented for the same type.^[Preventable if the trait is sealed and implemented only for types you control.^[If implemented on upstream types, still potentially breaking, if upstream updates; generally we blame downstream if the inference breakage does not happen without downstream, even if it is triggered by updating upstream.]]
  • Adding a (public) item to a struct/enum is name resolution inference breaking, as it could conflict with a trait item implemented for the type, changing the name from referring to the trait associated item to referring to the struct/enum associated item.
  • Adding a (public) item to a module is name resolution inference breaking, as downstream could be glob importing your module's contents and another module's contents which defines the same name, causing the name to be ambiguous between your and the other module.
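
The second bullet above (a new inherent item taking over a name that used to resolve to a trait item) can be sketched like this. Widget, Describe and the upstream_v1 / upstream_v2 modules are invented names that model two releases of an upstream crate plus a downstream extension trait.

```rust
mod upstream_v1 {
    pub struct Widget;
}

mod upstream_v2 {
    pub struct Widget;

    impl Widget {
        // New inherent method added in a later release.
        pub fn describe(&self) -> &'static str {
            "upstream"
        }
    }
}

// Downstream extension trait that predates the upstream method.
trait Describe {
    fn describe(&self) -> &'static str;
}

impl Describe for upstream_v1::Widget {
    fn describe(&self) -> &'static str {
        "downstream"
    }
}

impl Describe for upstream_v2::Widget {
    fn describe(&self) -> &'static str {
        "downstream"
    }
}

fn with_v1() -> &'static str {
    // Only the trait method exists, so it is the one called.
    upstream_v1::Widget.describe()
}

fn with_v2() -> &'static str {
    // The inherent method takes priority over the trait method,
    // so the same source line now calls different code.
    upstream_v2::Widget.describe()
}

fn main() {
    println!("{} {}", with_v1(), with_v2());
}
```

In this variant nothing fails to compile: the call silently changes meaning. If the new inherent method had a different signature, the downstream call would instead stop compiling.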

In other words, in a pedantic mode, semver-checks would be justified in requiring a minor bump for any new public item. Even weakening generic requirements might cause inference issues, so e.g. --strict-pedantic should probably require a minor bump for any change to the public API's types; IIUC this matches the intent of semver-minor's "new feature" trigger as well, since the API by construction has new API surface.

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for. It's a subjective evaluation of how likely both that a name conflict is possible and that some downstream would have both names in scope simultaneously; in most cases this is reasonably rare because of the convention to avoid glob imports^[If you want a version of the lint which can fire without firing on every API change, consider linting only for new trait associated items reachable through a module called prelude, since that's likely designed for glob importing.], and there's not really a good analytical way to determine the risk of a non-globbed name conflict to provide a lint cutoff better than yes/no.

Iterator is an especially interesting case because it's a language item trait in the prelude. User types don't have this exacerbating factor (being implicitly available everywhere) for this concern.

Adding trait bounds to an existential return type, e.g. -> impl Foo to -> impl Foo + Send

Note that RPIT already "leaks" autotraits (Send/Sync/Unpin), so that isn't actually a return type refinement.
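
A small sketch of what "leaking" means here; evens and sum_on_thread are hypothetical names. The signature never mentions Send, yet callers can rely on it.

```rust
// The signature promises only `Iterator<Item = u32>`.
fn evens() -> impl Iterator<Item = u32> {
    (0..10).filter(|n| n % 2 == 0)
}

fn sum_on_thread() -> u32 {
    let iter = evens();
    // `thread::spawn` requires its closure to be Send. This compiles because the
    // hidden concrete type behind `impl Iterator` happens to be Send, and auto
    // traits of the hidden type are visible to callers.
    std::thread::spawn(move || iter.sum::<u32>()).join().unwrap()
}

fn main() {
    println!("{}", sum_on_thread());
}
```

If the body of evens later captured something like an Rc, the signature would stay identical but sum_on_thread would stop compiling.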

Actually refining the return type is not-inference-breaking, though you still run the risk of being name-resolution-breaking (e.g. refining to a concrete type or even adding a new guaranteed trait could cause a name conflict with newly applicable extension traits).

In practice, API evolution in this manner is necessarily considered perfectly acceptable, and is imho very rarely worth warning for

The fact that there is a lot of nuance to semver and some parts that are contextual is why I feel like #58 is going to be important.

Here's one not listed: adding a generic (type or const) to a function is a breaking change.

That one is very interesting, and I'm not exactly sure what to make of it.

It's definitely breaking, no doubts there. But as I've written before, some Rust breaking changes don't require a major version — the API evolution RFC says so: https://rust-lang.github.io/rfcs/1105-api-evolution.html

Is adding a generic to an already-generic function covered under that exception?

Is adding a generic to a function that previously wasn't generic covered?

Would love to get your thoughts @jhpratt! These kinds of existential questions are things we run into a lot here 😅

Yes, adding a new generic to a function is a major breaking change. It breaks any call site where the type parameters are provided explicitly, for example because inference didn't work out. This is true for other API items as well, with one exception: if a generic is added to a type or trait but is defaulted, then it's a minor breaking change. The explicit type parameters are unchanged, but there are cases where inference won't work and the code will fail to compile. The API Evolution RFC decided to brush that under the rug and ignore it (I've been bitten by it...)
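
To make the explicit-type-parameter breakage concrete, here is a minimal sketch. The function convert and the modules v1 / v2 are hypothetical and model two releases of one crate.

```rust
mod v1 {
    // One generic parameter.
    pub fn convert<T: From<u8>>(x: u8) -> T {
        T::from(x)
    }
}

mod v2 {
    // A second generic parameter was added.
    pub fn convert<T: From<U>, U>(x: U) -> T {
        T::from(x)
    }
}

fn call_v1() -> u32 {
    // Call site that spells out the single type parameter.
    v1::convert::<u32>(7)
}

fn call_v2() -> u32 {
    // `v2::convert::<u32>(7u8)` is rejected: the function now takes two generic
    // arguments but only one was supplied. Every such call site must be edited.
    v2::convert::<u32, _>(7u8)
}

fn main() {
    println!("{} {}", call_v1(), call_v2());
}
```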

Also, just because the RFC says something is a minor semver breakage doesn't mean the user shouldn't be told. That RFC was written specifically for the stdlib, not as guidance for the ecosystem, and even the stdlib sometimes goes to extra lengths to avoid minor breaking changes if the impact is large enough. From what I've heard, they are designing a whole new language feature to allow migrating trait methods from itertools to Iterator without breaking people. Granted, #58 will be important for controlling this.