google / trillian

A transparent, highly scalable and cryptographically verifiable data store.

Split Merkle tree code to a separate repository

pphaneuf opened this issue

Some of the Merkle tree code at the core of Trillian (most notably, the compact range code) has proven extremely useful outside of Trillian, both in Trillian clients and in other separate code working with personalities, but being bundled up with the far larger and more complex Trillian project means that semver compatibility cannot be relied on (despite this Merkle tree code being fairly stable), and test code pulls in many unnecessary dependencies.

Having it in a separate repository would allow some refining/polishing of the API as an initial v0 to organise it (taking advantage of it moving to provide a migration path), and then a stable v1 API that can be reliably depended on for all sorts of tamper-evident applications. This repository should be self-contained, with very few dependencies beyond the Go standard library (if any at all), and stable testing (since it's really all just maths!).

Creating a separate repo isn't necessary; for a start, it can be a separate Go module within this repo.

Wouldn't that still leave the situation with dependencies and semver rather delicate, and bind the two modules together?

If I understand correctly, the version number of a Go module comes from what it's tagged as, right? So, assuming we dutifully followed semver for both modules, if there was a breaking change or an API added on one side, the other side would have to "follow"?

There's also the issue of go test ./... not passing on purpose for this repo, which isn't confidence-inspiring.

I'm not 100% sure it's impossible to version two modules in the same repo independently. I suspect there is some combination of path/to/module, tags/branches, and "replace" directives in go.mod that can do it (e.g., see how etcd tries to do some magic to have many versions in the same repo).
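As a hedged aside on the tagging question: the Go modules reference describes a convention where a module defined in a subdirectory of a repository is versioned by tags prefixed with that subdirectory (its own example is a gopls/v0.4.0 tag for the gopls module in golang.org/x/tools). A minimal sketch of that naming rule follows; tagFor is a hypothetical helper written for illustration, and the "merkle" directory is an assumed layout, not something this repo has.

```go
package main

import (
	"fmt"
	"path"
)

// tagFor returns the Git tag name that would select the given version of a
// module living in subdirectory dir of a repository. An empty dir (or ".")
// means the module sits at the repository root, so the tag is the bare
// version. Hypothetical helper, for illustration only.
func tagFor(dir, version string) string {
	if dir == "" || dir == "." {
		return version
	}
	// Nested modules use "<subdir>/<version>" as the tag name.
	return path.Clean(dir) + "/" + version
}

func main() {
	fmt.Println(tagFor("", "v1.3.12"))      // tag for the root module
	fmt.Println(tagFor("merkle", "v0.1.0")) // tag for an assumed nested module
}
```

Under that convention the two modules would carry separate tag series, though whether this resolves the dependency concerns raised here is a separate question.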

But even if it's impossible, depending on who you are targeting, this might or might not be a problem. If the goal is allowing "import merkle" without having to fetch all Trillian dependencies, then a separate module solves it to a decent degree. If a sudden switch to v2 happens, maybe it's not a big deal if the client has to upgrade "trillian/merkle" to "trillian/merkle/v2".
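To make the "trillian/merkle" to "trillian/merkle/v2" switch concrete, here is a small sketch of Go's semantic import versioning rule: major versions 0 and 1 use the bare module path, and major version 2 or higher appends a /vN suffix. The importPath helper and the module path github.com/google/trillian/merkle are assumptions for illustration (no such module is confirmed in this thread), and the sketch ignores special cases such as gopkg.in paths.

```go
package main

import (
	"fmt"
	"strconv"
)

// importPath returns the module path a client would import for the given
// major version of a module whose v0/v1 path is base.
// Hypothetical helper, for illustration only.
func importPath(base string, major int) string {
	if major < 2 {
		// v0 and v1 share the unsuffixed path.
		return base
	}
	// v2 and later carry the major version as a path suffix.
	return base + "/v" + strconv.Itoa(major)
}

func main() {
	const base = "github.com/google/trillian/merkle" // assumed module path
	for _, major := range []int{0, 1, 2} {
		fmt.Println(importPath(base, major))
	}
}
```

A client upgrading across a major version would change its import statements to the suffixed path, which is the upgrade cost being weighed above.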

Found some issue with a similar problem: golang/go#27056 (comment)

This issue seems to refer to using two modules from the same repo, but tagging versions applies to everything in the repo.

You could also bring in branches, where development for each module would happen on the relevant branch, but that complicates things, especially if the version numbers get close again?

For example, if trillian is at v1.3.12, but you tag trillian/merkle at v1.3.13, you've now also made a v1.3.13 of trillian. And if you're developing trillian/merkle in a separate branch (because it has different semver restrictions), then the rest of the repo on that branch might be in an arbitrary state that's completely different from what we'd want trillian v1.3.13 to be.

This approach might be better for something that moves together, like a server and its client library, for example.

Another benefit of a separate repository is the overall simplicity: you would be able to see at a glance what is happening with the Merkle tree code, and easily review every commit.

The old code might need to be deleted from this repo.