eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java

Home Page: https://rdf4j.org/

Use a single repository for the entire rdf4j code base (merging storage, tools, and testsuite back into rdf4j)

abrokenjester opened this issue

Motivation

Back in 2017 (see https://www.eclipse.org/lists/rdf4j-dev/msg00410.html ) we made a decision to split the rdf4j project over multiple repositories. The main motivation for this was that a full build + verification of the project was taking too long, and this encouraged contributors to take shortcuts. The theory was that by splitting the project, we could get verification time down.

However, that expected speed gain has not really materialized. It's true that individual repo builds are quicker, but when we make a change in, for example, the rdf4j repo, we still need to run verification in rdf4j-storage and rdf4j-tools as well, and those builds can still turn out to break.

A further downside is that compliance tests in the rdf4j repo often use code from rdf4j-storage (e.g. a Sail implementation) or from rdf4j-tools (e.g. to spin up an rdf4j server). However, because of the dependency order, those modules have not been built yet when the rdf4j repo runs its verification. That's no big deal while we're on a develop branch and everything just uses "the latest SNAPSHOT", but it seriously messes things up when release time rolls around and we have to set fixed versions: suddenly the rdf4j repo build fails because its tests can't find rdf4j-sail-memory 3.0.0, which does not exist yet at that point. I have been attempting to mitigate this by moving tests around in the project and by setting fixed (older) versions for these test dependencies, but it's not ideal.

Finally, the Jenkins pipeline we currently use to build, verify, and deploy all of this is incredibly convoluted. We now have 24(!) separate Jenkins jobs to coordinate it all, and it's honestly painful to maintain.

Proposed change

We move back to a mono-repo for the entire codebase, which will live in the rdf4j GitHub repository. rdf4j-doc will remain a separate repository, where the project website and documentation are maintained.
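In a single Maven reactor, modules are sorted so that each one is built after the modules it depends on, which is what removes the ordering problem described under Motivation. The Python sketch below illustrates that ordering with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The module names and dependency edges are a simplified, hypothetical graph, not rdf4j's real module layout, and this is not how Maven itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified module graph: module -> modules it depends on.
# Names are illustrative only; they do not mirror the real rdf4j layout.
MODULES = {
    "rdf4j-model": set(),
    "rdf4j-sail-api": {"rdf4j-model"},
    "rdf4j-sail-memory": {"rdf4j-sail-api"},
    "rdf4j-http-server": {"rdf4j-sail-api"},
    "rdf4j-compliance-tests": {"rdf4j-sail-memory", "rdf4j-http-server"},
}


def build_order(graph):
    """Return the modules in an order where each comes after all of its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors, which is
    # exactly "module -> modules it depends on"; it raises CycleError on cycles.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for module in build_order(MODULES):
        print(module)
```

Within one repository, the compliance tests module always lands after rdf4j-sail-memory in this order, so its tests can use the version being built in the same run instead of a previously released one.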

To make sure we get decent build and verification times, we will make the following improvements:

  • culling and cleaning up our compliance and integration tests (quite a few of them are very slow, redundant, or both).
  • better unit testing with mocking and stubbing instead of cramming all our verification into massive compliance/integration test suites that spin up full servers every time.

Tasks

  • merge rdf4j-storage (with tags and history) into rdf4j
  • merge rdf4j-tools (with tags and history) into rdf4j
  • merge rdf4j-testsuites (with tags and history) into rdf4j
  • reconfigure maven to handle the single-repo build
  • reconfigure Jenkins to handle the simplified build pipeline
  • mark old repositories as no longer in use (possibly make them read-only)

(See also discussion at https://www.eclipse.org/lists/rdf4j-dev/msg01147.html )

Scheduled to be done immediately after the 3.0 release.

Merging the testsuite repo with its full history is giving me a lot of headaches, because so many files were moved and deleted when things were first split; it's nearly impossible to reconcile. So instead I'll just manually copy over the benchmarks (the only part of the testsuites repo that is still relevant).

It turns out that the idea of skipping the compliance tests unless the -Pcompliance profile is activated has problems. For now I'll just make sure the compliance modules are not actually deployed. We'll look into how to handle PR verification versus full compliance testing later.

After a few tries, the Jenkins configuration now looks to be set up well:

  1. the PR verification job skips integration tests by means of the -DskipITs flag.
  2. verification of the master branch has been configured as an incremental build, so (in theory) it should only build/test those modules that have had changes applied.
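The incremental build in item 2 comes down to working out which modules a change touches. Below is a minimal sketch of that selection step. It assumes the change is given as a list of file paths, and that every module depending (directly or transitively) on a changed module must be rebuilt too. The module directories and dependency edges are hypothetical, and this is not the logic Jenkins itself uses.

```python
from collections import deque
from pathlib import PurePosixPath

# Hypothetical module layout: module directory -> directories of modules it depends on.
DEPENDS_ON = {
    "core/model": set(),
    "core/sail/api": {"core/model"},
    "core/sail/memory": {"core/sail/api"},
    "tools/server": {"core/sail/api"},
    "compliance/sparql": {"core/sail/memory", "tools/server"},
}


def owning_module(path, modules):
    """Map a changed file to the deepest module directory containing it, or None."""
    parents = {str(p) for p in PurePosixPath(path).parents}
    candidates = [m for m in modules if m in parents]
    return max(candidates, key=lambda m: m.count("/"), default=None)


def modules_to_build(changed_files, depends_on):
    """Return the changed modules plus every module that transitively depends on them."""
    # Invert the graph so we can walk from a module to the modules that use it.
    dependents = {m: set() for m in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)
    # Files outside any module (e.g. a root README) map to None and are ignored.
    changed = {owning_module(f, depends_on) for f in changed_files} - {None}
    selected, queue = set(changed), deque(changed)
    # Breadth-first walk over dependents collects everything affected by the change.
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected
```

For example, a change under `core/sail/memory/` selects that module plus `compliance/sparql`, while a change to a root-level file such as `README.md` selects nothing.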

On a separate note: the develop branch has not yet been set up correctly.

Happy with the setup for now.