typelevel / cats

Lightweight, modular, and extensible library for functional programming.

Home Page: https://typelevel.org/cats/

Document version compatibility goals

ceedubs opened this issue

I think that we should document what our goals are with respect to binary/source compatibility as we reach 1.0.

I'd be inclined to propose the following:

  • A change to only the patch version will always be binary compatible. For example 1.0.2 is binary compatible with 1.0.0 and 1.0.1.
  • A change to the minor version number may not be binary or source compatible. For example, 1.1.0 is not necessarily binary compatible with 1.0.0.
  • For incompatible changes introduced in a new version, we will try to make the transition as painless as possible. Before methods or types are removed, they will go through a deprecation cycle. If a change may silently lead to different behavior, we will try to highlight it in the release notes.
  • The cats-kernel module will remain binary compatible across minor versions. Binary-incompatible changes to this module should only occur alongside a major version bump (which should be few and far between).
  • We plan to support minor versions for at least ...some amount of time. Personally I'd be inclined to try to support minor versions for the greater of (2 years after their release, 1 year after the release of the next minor version). By support I mean backporting and publishing bugfixes and significant performance improvements.

What do people think about the ones I have? Are there others that we should add?

One thing that I don't really know how to quantify is source-compatibility. In general I want us to strive for source-compatible changes, but as long as they aren't too intrusive, I think it's less of a concern than binary compatibility.

cc @johnynek and @mpilquist who I think are both interested in compatibility.

Just my 2 cents.
Shall we also state in the vision that we will try to maintain binary compatibility between minor versions as much as possible? Or be specific about a particular minor release's binary compatibility goal on a case-by-case basis?
For such a foundational library, binary compatibility is really important, and I am not sure a guarantee of patch-version-only binary compatibility is strong enough. If that is the only guarantee we give, then either users are stuck on a minor version, or contributors are stuck releasing new (binary compatible) features only in patch releases.

In my view, version numbers are cheap but compatibility promises are valuable. I would go for something closer to semantic versioning:

After 1.0.0, changes mean:
patch level change: zero API or semantic change, only method-local changes for bug fixes or performance fixes.
minor level change: any API addition that is source and binary compatible (at least binary compatible, since we have a tool to check that).
major level change: any change that fails MiMa against the previous version.

Usually, you can avoid breaking binary compatibility by making some slightly different choices, and since we hope for wide adoption, each binary break is a huge downstream cost we often don't see. This is especially true as cats is adopted by other libraries, so that it often ends up as a diamond dependency. I'm a huge advocate of making some compromises to keep binary compatibility.

I think this comes down to goals: my goal is adoption and impact (there are other goals, such as demonstration, teaching, etc.). Adoption means people will want to use N libraries, which will all want slightly different versions of cats, and most build tools will give them the largest version number. We should work very hard to make that almost always compatible.
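
For concreteness, here is a minimal sketch of how that check could be wired into an sbt build with sbt-mima-plugin; the plugin and artifact versions below are illustrative, not a statement of what cats actually uses:

```scala
// project/plugins.sbt
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt
// Compare the current sources against the last released artifact; the MiMa
// report task fails when a binary-incompatible change is detected.
mimaPreviousArtifacts := Set("org.typelevel" %% "cats-core" % "1.0.0")
```

Running sbt mimaReportBinaryIssues then lists every change that would break binary compatibility with the listed artifact.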

@johnynek are there examples of other prominent libraries in the scala ecosystem using that sort of versioning scheme? Out of all of the libraries whose version compatibility I've had to worry about in recent years (scalaz, shapeless, akka, scodec, scala itself), minor versions track binary incompatibilities and major versions track significant changes to the library (rewrites, major overhauls, etc). I don't have any super strong opinions on versioning, but I also don't want to go against the grain and confuse users.

related to #608

There are two popular versioning schemes here:

  • the java/scala style of epoch.major.minor (proposed by @ceedubs in this thread)

  • semantic versioning of major.minor.patch. (proposed by @johnynek )

Both only break binary compatibility at the major version. IMO, it's basically a trade-off between the ability to release an epoch or a patch; neither is very frequent for us, though patch is maybe a little more relevant.

Most other scala libraries I know use the java/scala style. But Akka switched to semantic versioning as of 2.4, possibly because, like cats, it sits near the bottom of the dependency chain and treats binary compatibility as a high priority.

I am not sure if one is strictly more suitable to cats than the other. I propose we just hold a vote and move forward with the decision.

semver

When talking of epoch.major.minor, the distinction between epoch and major is arbitrary.

How many changes does it take for us to consider that an epoch happened?
Is 10 functions or classes changed / removed enough? How about 100?

My problem with epoch.major.minor is that it lies: it gets preferred because we get too attached to version numbers and because we don't really want to tell users that their software will break. In regards to binary compatibility, whether you have a single incompatible change or 100 doesn't matter that much, because in both situations the user gets screwed if they're not careful.

So I vote for "semver". It's awful, but it's the standard way to communicate breakage.

I lean toward @johnynek's proposal because it makes use of the full namespace and is defined in a way that could be automated, at least in principle. It makes it harder to indicate rewrite-scale breakages, but these are so infrequent in practice that we can indicate them in some other way, like jumping up to the next round hundred or something.

Semantic™ or otherwise, a version number is a String that eventually gets selected by dependency resolution engines like Ivy and Maven. Their algorithms will not respect whether the graph brings in 0.9.0, 1.x, or 2.x.

A third option you might want to consider is using organization (or groupId in Maven) and the package name for major versioning. For instance:

  • Cats 1.0 uses org.typelevel.cats as the organization, and cats as the package name.
  • Cats 2.0 uses org.typelevel.cats2 and cats2.

This allows cats.Functor to be binary compatible forever (up to scalaVersion), and safely allows Cats 1.x and Cats 2.x to co-exist in a large system.
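
To illustrate the idea (these coordinates are hypothetical, following the proposal above rather than any published artifacts), both major versions could then sit on the same classpath:

```scala
// build.sbt sketch with hypothetical coordinates
libraryDependencies ++= Seq(
  // Cats 1.x: organization org.typelevel.cats, root package cats
  "org.typelevel.cats"  %% "cats-core"  % "1.0.0",
  // Cats 2.x: organization org.typelevel.cats2, root package cats2
  "org.typelevel.cats2" %% "cats2-core" % "2.0.0"
)
// Because the group IDs (and root packages) differ, Ivy/Maven treat these as
// unrelated artifacts: neither evicts the other, and cats.Functor and
// cats2.Functor can coexist in the same application.
```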

In my opinion, it's best to follow pure semver: major.minor.patch. I think it should be a standard in the Scala community to use the following connotations for each part of the version:

  • major: bumped up when a release is binary incompatible with the previous one.
  • minor: bumped up when a release is source incompatible with the previous one (breaking source compatibility happens especially in the presence of implicits).
  • patch: bumped up when a release is both binary and source compatible.

I think this versioning scheme covers all the cases and helps represent the various ways in which the developers of a library can break users' code. For everything that remains undefined, the semver defaults apply.

@johnynek can you say why binary compatible additions are excluded from your patch level?

As @ceedubs mentioned above, shapeless's scheme is epoch.major.minor where minor level makes binary compatibility guarantees; major level makes hand-wavey design consistency promises and at epoch level all bets are off.

Here is the definition of minor version Y (x.Y.z | x > 0) in semver
(http://semver.org/#spec-item-7): it MUST be incremented if new, backwards compatible functionality is introduced to the public API.

@eed3si9n I vote for using the org to differentiate binary-compatibility-breaking versions as well. We don't have to decide that right now; it can be delayed until we plan the first binary-breaking 2.0 release, right? What I am suggesting is that for 1.0 we could continue to use org.typelevel as the org name, even if we plan to change the org in cats 2.0, right?

What I am suggesting is that for 1.0 we could continue to use org.typelevel as the org name, even if we plan to change the org in cats 2.0, right?

The actual organization name is a minor variance (though I think org.typelevel.cats is more consistent), but the important aspect is "document what our goals are with respect to binary/source compatibility as we reach 1.0."

As @johnynek said:

version numbers are cheap but compatibility promises are valuable

With the organization/package rename system, the stability expectation is different from Semantic Versioning or 2.major.minor, because with it cats.Functor will never break (at least not intentionally).

I'm very uncomfortable with the idea of using the organization id to distinguish versions ... this will play badly with tools (eg. publishing to Sonatype under a new organization id would need the new organization id to be registered and maintainers granted permissions all over again) and policies (eg. I can imagine that changing an organization id might mean that an artefact would have to go through corporate approval processes again in a way that wouldn't be necessary for a simple version bump).

Oh, and it's also "odd" ... I'm not against doing things differently per se, but I think that this stuff has been around for long enough that we ought to be able to point to a successful precedent.

It (changing the org) definitely has a higher one-time cost than simply bumping up the major version number. I am still in favor of it because

  1. cats breaking compatibility should be a very rare event, since it would force the whole ecosystem to upgrade.
  2. in such a rare event, being able to support the coexistence of multiple major versions of cats might be the only way. It is somewhat similar to cross building for multiple scala versions, in which case it is the artifact name that gets differentiated (see the sketch after this list).
  3. the cost isn't forbiddingly high, e.g. publishing to Sonatype under a new org id takes one JIRA issue and maybe a couple of maintainers to comment on it (not all current cats maintainers have, or even want to have, publishing rights).
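
For comparison, this is how cross building already differentiates artifacts today: sbt's %% operator appends the Scala binary version to the artifact name, so a binary-breaking cats 2.x could be distinguished in the coordinates in a similar spirit (a minimal sketch; the versions are illustrative):

```scala
// A single dependency declaration...
libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.0"
// ...resolves to a Scala-version-suffixed artifact; on Scala 2.12 it is
// equivalent to:
libraryDependencies += "org.typelevel" % "cats-core_2.12" % "1.0.0"
```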

I don't think using the org rename system should change the stability expectation, though. The org rename only helps if the cats dependency remains internal to a library, i.e. cats type classes and data structures do not appear in the library's public API. But the opposite is probably more common.
In that sense we should still try to avoid breaking compatibility as much as we can post 1.0. cats.Functor never breaks simply because we never break it, at least per major Scala version.

New binary-breaking features can be introduced in a new module, or at a new major Scala version. For example, we should be able to introduce 2.13-only, binary-breaking changes in a Scala 2.13-specific source folder, as sketched below.
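
A rough sketch of what that layout could look like (recent sbt versions pick up src/main/scala-<Scala binary version> directories by default when cross building; the module name is illustrative):

```
core/
  src/main/scala/          shared sources, kept binary compatible
  src/main/scala-2.12/     2.12-specific sources
  src/main/scala-2.13/     2.13-specific sources, free to use 2.13-only APIs
```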

I don't know if organization/groupId versioning has a catchy name, but it's a technique adopted by some Java libraries at the bottom layers of the stack.

See for example Java Interoperability Policy for Major Version Updates:

  1. Rename the Java package to include the version number.
  2. Include the library name as part of the group ID in the Maven coordinates.
  3. Rename the group ID in the Maven coordinates to include the version number.

Edit:

Another related data point I came across is Google's Cloud APIs Versioning.

  • For version 1 (v1) of an API, its major version number should be encoded in the proto package name, for example google.pubsub.v1. The major version may be omitted from the proto package if the package contains stable types and interfaces that don't expect to have breaking changes, for example google.protobuf and google.longrunning.
  • For all versions of the API other than v1, the major version number must be encoded in the proto package name. For example, google.pubsub.v2.

....

For some period of time, different versions of the same API must be able to work at the same time within a single client application. This is to help the client smoothly transition from the older version to the newer version of the API.

One good presentation arguing for breaking backwards compatibility only with a namespace rename is the Spec-ulation keynote by Rich Hickey.

@alexandru Thanks for sharing this. Really enjoyed watching it.

I've just learned of this tool, by the way:

https://github.com/lvc/japi-tracker

Looks really interesting. Instead of a yes/no compatibility answer, it can give a score (what fraction of methods have incompatible changes). This can be interesting to consider too: too often, when we have an incompatibility for a good reason, we go overboard and make many incompatible changes, which is pain multiplied by the number of adopters. Even when we must break compatibility, we should strive for small incompatibilities.

Here's an example of the output:
twitter/algebird#638 (comment)

Looks like the vast majority voted for semver. There is still some debate over whether to use the org name to mark breaking versions. I think we can delay that decision until we reach the breaking point - hopefully that will be a while. If there are no objections, I am going to submit a PR to document the semver binary compatibility goals in the README.

This might be pedantic, but #1897 only addresses binary compatibility, while the original description of this issue also talks about source compatibility. (And I wasn't able to determine from the discussion what's the current consensus about that.)

Binary compatibility implies source compatibility, but not vice versa.

Source compatibility is less important in the context of Scala, since the compiler can help pinpoint the changes that need to be made and you can also have automatic rewrite rules defined with Scalafix. Migrating a codebase to a new version that breaks source compatibility may be costly, but at least it's a deterministic process, since you've got the compiler and your tests guarding against regressions.

But binary compatibility is where it hurts: your tests don't exercise your transitive dependencies' binary interfaces, and when the JVM links at runtime against the JARs on the classpath, the compiler is not involved at all.

Note that for dynamic languages (that get distributed as source, instead of compiled binary JARs), binary compatibility == source compatibility.

Binary compatibility implies source compatibility

Not really. Here are a few examples of binary compatible but source incompatible changes.
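
One such pattern, as a schematic before/after sketch with hypothetical names: adding a concrete member to a non-final class is binary compatible, yet it can break compilation of user code that extends the class.

```scala
// Library version 1.0
class Widget {
  def size: Int = 0
}

// User code, compiled against 1.0
class MyWidget extends Widget {
  def label: String = "mine"
}

// Library version 1.1 adds a concrete member. Already-compiled user classes
// still link against it, so the change is binary compatible...
class Widget {
  def size: Int = 0
  def label: String = ""   // new in 1.1
}

// ...but recompiling MyWidget against 1.1 fails with
// "overriding method label ... needs the override modifier":
// source compatibility is broken even though binary compatibility holds.
```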

Just created #1907
I was aware of these source-breaking but binary compatible changes (thanks to @dwijnand) when drafting the documentation. But to me those are, in some sense, "accidentally" binary compatible.
@dwijnand will MiMa actually catch them as binary breaking?
I agree with @alexandru that source compatibility isn't that much of an issue (such changes are rare and easy to fix) compared to the binary compatibility issue in diamond dependency trees.

will MiMa actually catch them as binary breaking?

To the best of my knowledge, no, it does not.

I'm not sure I agree that these are "accidental". For example, the protected[scope] one seems like a nice trick to remove deprecated stuff from the public (source) API while still preserving binary compatibility. What's more, forbidding the addition of public methods would severely restrict future evolution of the library.
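
As a sketch of that trick (hypothetical names; it relies on the fact that Scala's qualifier-scoped access modifiers are emitted as public methods in the bytecode):

```scala
// Version N: a public method we want to retire
package cats.data
class Thing {
  def oldMethod: Int = 1
}

// Version N+1: the method disappears from the public *source* API
package cats.data
class Thing {
  @deprecated("use newMethod instead", "N+1")
  protected[cats] def oldMethod: Int = 1  // still a public JVM method, so
                                          // previously compiled callers keep linking
  def newMethod: Int = 1
}
// New source code outside the cats package (other than subclasses) can no longer
// call oldMethod, yet binary compatibility with version N is preserved; as noted
// above, MiMa does not report this as a breaking change.
```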

BTW, I agree that in practice source compatibility is not a big concern, so I'd be OK with simply adding a sentence that "We don't really care about source compatibility" (or similar :-). However, I think that the documentation (as it currently stands) is ambiguous (regarding binary/source compat); that's why I raised this question.

(Maybe we should continue this discussion at #1907?)

Agree we should continue the discussion at #1907