smallrye / jandex

Java Annotation Indexer


Reconsider reproducible serialization

Ladicek opened this issue

There were attempts to make Jandex reproducible before:

So far, this has been dismissed, but we may want to reconsider. One reason would be to allow incremental builds. If a Maven module carries a Jandex index, rebuilding always changes the JAR, even though the classes perhaps didn't change at all -- the only thing that changed was the index. (Immediate question: would it be possible to avoid rebuilding the index if the classes weren't rebuilt?)

This is not a 3.0 topic, but may be a 3.x topic. This issue is mainly for collecting feedback.

An additional note: in a multi-module Quarkus app that uses the Jib container extension to create container images, the layering that the Jib extension sets up becomes less effective, because the lib layer gets updated on every build.

Note to self: this might be more complex than it seems, because Jandex sometimes depends on serializing things in a topological order.

I realized we could easily use hash trees to avoid rebuilding an index if the underlying classes didn't change.

During indexing, we'd compute a hash of each input file (a non-cryptographic checksum such as CRC32 should be enough) and remember it in an ordered map<fully qualified class name, hash>. When the index is being completed, we'd compute a hash of the remembered hashes (in key order, for reproducibility) and store the resulting hash in the index.
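A minimal sketch of what this could look like, using only `java.util.zip.CRC32` and a `TreeMap` for the key ordering. The class and method names (`IndexInputHash`, `record`, `combined`) are hypothetical and not part of Jandex; how the combined value would actually be encoded in the index format is left open here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper, not part of the Jandex API.
final class IndexInputHash {
    // TreeMap iterates in key order, so the combined hash does not depend
    // on the order in which classes happened to be indexed.
    private final TreeMap<String, Long> hashes = new TreeMap<>();

    // Remember the CRC32 of one input class file under its fully qualified name.
    void record(String className, byte[] classBytes) {
        CRC32 crc = new CRC32();
        crc.update(classBytes);
        hashes.put(className, crc.getValue());
    }

    // Hash of the remembered (name, hash) pairs, visited in key order.
    long combined() {
        CRC32 crc = new CRC32();
        for (Map.Entry<String, Long> e : hashes.entrySet()) {
            crc.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            crc.update(ByteBuffer.allocate(Long.BYTES).putLong(e.getValue()).array());
        }
        return crc.getValue();
    }
}
```

Including the class name in the combined hash means that a renamed class with identical bytes would still count as a change.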

During subsequent indexing, we'd probably just do everything as usual, but at the very end, we'd compare the newly computed hash with the hash stored in the existing index file. If they are the same, we'd skip writing the index.
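The compare-and-skip step could be sketched roughly as follows. The file layout here (an 8-byte hash followed by the index body) is purely an assumption for illustration and is not the real Jandex index format; `SkipUnchangedIndex` and `writeIfChanged` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Jandex API.
final class SkipUnchangedIndex {
    // Writes the index only if the stored input hash differs from the new one.
    // Returns true if the file was written, false if the write was skipped.
    static boolean writeIfChanged(Path indexFile, long inputHash, byte[] indexBody)
            throws IOException {
        if (Files.isRegularFile(indexFile) && Files.size(indexFile) >= Long.BYTES) {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(indexFile))) {
                // Assumed layout: the first 8 bytes hold the hash of the inputs.
                if (in.readLong() == inputHash) {
                    return false; // inputs unchanged, leave the existing file alone
                }
            }
        }
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(indexFile))) {
            out.writeLong(inputHash);
            out.write(indexBody);
        }
        return true;
    }
}
```

Because the existing file is left untouched when the hashes match, its bytes and modification time stay the same, which is what the incremental-build and container-layering cases above would need.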