zaninime / sbt-derivation

Nix library for building Scala sbt projects

Support for project not in root of source directory

tpwrules opened this issue

The recent refactoring kind of broke projects which had a build.sbt outside the root source directory. I had logic to move the .nix folder around, but that doesn't work anymore with the symlinks.

I think the best option is to bring back the “custom sbt” and environment variables to set the various cache directories. This would let sbt be used anywhere and any number of times without having to think about it.

Alternatively, we can add an attribute to the derivation that specifies the project root so the dependency archiving logic can find it. But then the user has to care about this.

What is your preference? I can submit a PR for either.

Actually, we can replace the "custom sbt" by defining SBT_OPTS. I'm working on a PR.
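A minimal sketch of the SBT_OPTS idea, assuming bash and the standard sbt system properties sbt.global.base, sbt.boot.directory and sbt.ivy.home. The COURSIER_CACHE line assumes the Coursier-based resolver honours that variable, and SBT_CACHE_ROOT is a hypothetical name used only for this illustration; this is not the code of the PR.

```shell
#!/usr/bin/env bash
# Sketch: make every sbt invocation share one set of cache directories by
# exporting SBT_OPTS, instead of wrapping sbt in a "custom sbt" script.
# SBT_CACHE_ROOT is a hypothetical name chosen for this illustration.
set -eu

# Absolute path, so the caches are the same no matter which directory sbt starts in.
SBT_CACHE_ROOT="${TMPDIR:-/tmp}/sbt-caches"

# The sbt launcher passes -D options from SBT_OPTS to the JVM.
export SBT_OPTS="-Dsbt.global.base=$SBT_CACHE_ROOT/.sbtboot -Dsbt.boot.directory=$SBT_CACHE_ROOT/.boot -Dsbt.ivy.home=$SBT_CACHE_ROOT/.ivy"

# Assumption: the Coursier-based resolver reads its cache location from COURSIER_CACHE.
export COURSIER_CACHE="$SBT_CACHE_ROOT/.coursier"
```

With these variables in the environment, an sbt compile in one subdirectory and a later sbt runMain from another directory would resolve against the same caches.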

> projects which had a build.sbt not in the root source directory

The minimum sbt project you could have is a directory with a single file named build.sbt.

If you look at the Rust or Go build systems, or even stdenv, the expectation is that build files are in the source root directory, too.

So I am not sure I understand the problem here. Can you please point me to a minimal example repo where this is a problem?

This would arise if you have a project with an sbt project as a subdirectory, or conceivably if you have one repo with multiple independent sbt projects you want to compile at the same time.

In my use case, we have an sbt project that generates code for the first project. It's not a minimal example, but the use of the old version of sbt-derivation is shown here. The repo it operates on is here. The comments should be pretty clear, please ask for clarification if you need it. Ultimately make -C pythondata_cpu_vexriscv/verilog invokes sbt runMain a bunch of times in that directory to generate a bunch of different Verilog files.

Moving the dependency directories out of the project directory would allow these less common cases to just work. After all, that's how sbt works by default, with the dependency directories in the user's home. The --no-share flag doesn't have any special logic behind it; it just redefines the three non-Coursier cache directories to be project-relative.
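For illustration, a sketch of what --no-share amounts to, based on our reading of the sbt launcher script (the exact option list may differ between sbt versions). no_share_cache_dirs is a hypothetical helper showing that the relative paths resolve against whichever directory sbt is started in.

```shell
#!/usr/bin/env bash
# Sketch: the JVM options `sbt --no-share` adds, as we understand the sbt
# launcher script; the exact list may differ between sbt versions.
# The paths are relative, so they resolve against the directory sbt starts in.
NO_SHARE_OPTS="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=project/.boot -Dsbt.ivy.home=project/.ivy"

# Hypothetical helper: print where each cache ends up when sbt is started in "$1".
no_share_cache_dirs() {
  local start_dir=$1 opt
  for opt in $NO_SHARE_OPTS; do
    # "${opt#*=}" drops the "-Dproperty=" prefix, leaving the relative path.
    printf '%s/%s\n' "$start_dir" "${opt#*=}"
  done
}
```

For example, no_share_cache_dirs pythondata_cpu_vexriscv/verilog prints three paths under pythondata_cpu_vexriscv/verilog/project/, which is why caches unpacked at the source root are not found when sbt starts in that subdirectory.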

A repo with multiple independent sbt projects would still have a root build.sbt if those projects were meant to work together (i.e. are interdependent). In that case sbt-derivation will just work as usual.

In your case, I think the derivation there is a bit over-complicated; you could just redefine the src attribute to point to the sbt project root instead:

src = let repo = builtins.fetchGit {
    url = "https://github.com/${pkgMeta.github_user}/${pkgMeta.github_repo}";
    rev = pkgMeta.git_revision;
    submodules = true;
  };
in "${repo}/pythondata_cpu_vexriscv/verilog";

That will save all the cd / pushd machinery and the moving of the .nix dir.

This would throw away all the Python bits which encapsulate the Scala-built Verilog files. The Scala bits in the derivation I linked previously could be split out into another derivation to deal with this, but that is no less complicated. Moreover, there is another similar derivation which relies on files outside the directory containing build.sbt to drive sbt, so that strategy would break down there.

It's probably true that neither I nor the LiteX people whose work I am trying to package understand sbt and its ecosystem anywhere near as well as we should, and that we aren't really using it properly. Yet my brief research also suggests that Scala projects generally have build.sbt in the root of the source, so this PR would essentially fix my problems and no one else's.

I can still use a similar file-moving strategy with v2 (except that link mode does not work), so the linked PR is not essential. Nevertheless, it's not a complex change, and it cleans up the archival logic, which no longer has to care about there being four separate folders.

It is not about the complexity, but merging your PR breaks another use case for me: being able to use the dependencies derivation independently of sbt-derivation. One example is packages whose tests require network access in order to run.

In the current state, I can just call the project.dependencies.extractor script, followed by sbt --no-share test.
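The workflow above can be sketched as a small bash function. EXTRACTOR is a hypothetical variable holding the path of the project.dependencies.extractor script, and the sketch assumes that extracting into . recreates the project/ cache directories that are used together with sbt --no-share.

```shell
#!/usr/bin/env bash
# Sketch: run network-dependent tests outside the sandbox, reusing the
# prefetched dependencies. EXTRACTOR is a hypothetical variable holding the
# path of the project.dependencies.extractor script.
run_tests_with_prefetched_deps() {
  # Unpack the archived caches relative to the current directory;
  # stop early if extraction fails.
  "$EXTRACTOR" . || return
  # --no-share points sbt at the project-relative cache directories.
  sbt --no-share test
}
```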

With your PR I would have to figure out:

  1. Where the dependencies should live, so that I can call the extractor accordingly
  2. The right SBT_OPTS (I can't extract them from the builder, as they are inlined via export ...)
  3. Even if I could extract SBT_OPTS, the options would reference absolute paths created in the context of the sandbox, which are no longer valid once the build has finished

I see now how that would break things; I had been wondering why you made the extractor its own separate script.

The dependencies in the archive are independent of their extraction location both before and after this PR. It looks like the friction point is that the cache directories sbt --no-share specifies aren't compatible with the ones in my PR, not that they will be extracted to the wrong place, as the script specifying that hasn't changed. What if I didn't change those in my PR, i.e. kept them as .sbtboot, .boot, .ivy, and .coursier instead of base, boot, ivy, and coursier-cache, respectively?

This should let you run your extractor script with project/ as the argument instead of . (and pass --no-share to sbt), and everything will work for you as it did before. I could even make project/ the default argument, or add a project sub-directory to $SBT_BUILD_DEPS_DIR so that you can continue to pass . as the argument.
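A hypothetical helper makes the name mapping above concrete: it moves caches from the names used in the PR to the dot-prefixed names used with --no-share. It is only an illustration; keeping the original names in the PR, as proposed, would make such a step unnecessary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move caches from the PR's directory names
# (base, boot, ivy, coursier-cache) in "$1" to the dot-prefixed names
# used with sbt --no-share (.sbtboot, .boot, .ivy, .coursier) in "$2".
to_no_share_layout() {
  local from=$1 to=$2
  mkdir -p "$to" || return
  # Same content, new names; stop at the first failed move.
  mv "$from/base" "$to/.sbtboot" &&
    mv "$from/boot" "$to/.boot" &&
    mv "$from/ivy" "$to/.ivy" &&
    mv "$from/coursier-cache" "$to/.coursier"
}
```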

I think you correctly outlined one of the friction points, but the absolute paths also bother me. Looking at nixpkgs, there are only very few usages of NIX_BUILD_TOP that correspond to the case in your PR, and they all seem related to packages with unusual expectations.

The expectation of sbt projects is to have a build.sbt at their root, and sbt-derivation covers this case without problems. The repo you posted looks like a mix of Python and Scala together, and the build seems organized in a way which is a bit unconventional (e.g. you could build and package only the Verilog files with sbt, then extract them in the source while building the Python package).

Now I think there's space in sbt-derivation to support packages with unusual expectations, but the change should be minimal (it should not become harder to read the whole project because of this class of special cases) and I'd appreciate having proper tests for it.

I suggest we look at what it would mean in concrete terms for you to switch to the current master:

  1. What doesn't work and there's no workaround possible?
  2. What doesn't work but there's a workaround?

I would then do a more thorough evaluation with these data points in mind. I don't have the time to clone the repo you provided and do the switch myself. It also lacks instructions for testing, etc., so I won't even attempt it.

I chose SBT_DERIVATION_DEPS_DIR=$NIX_BUILD_TOP/.sbt-derivation-deps somewhat arbitrarily, but it could be replaced with something like SBT_DERIVATION_DEPS_DIR=$(mktemp -d)/project if you think that better conveys the intent and makes your testing easier. NIX_BUILD_TOP equals TMPDIR inside a build environment.
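A sketch of the second option, assuming bash: honour a caller-supplied SBT_DERIVATION_DEPS_DIR (for example from a test harness) and otherwise fall back to a fresh $(mktemp -d)/project. This illustrates the proposal only; it is not existing sbt-derivation behaviour.

```shell
#!/usr/bin/env bash
# Sketch: pick the directory that holds sbt's caches during the build.
# SBT_DERIVATION_DEPS_DIR is the variable name from the PR; the mktemp
# fallback is one of the options discussed, not current sbt-derivation behaviour.
set -eu

if [ -z "${SBT_DERIVATION_DEPS_DIR:-}" ]; then
  # Fresh, unique scratch directory. mktemp places it under TMPDIR, which
  # inside a Nix build is the same directory NIX_BUILD_TOP points to.
  SBT_DERIVATION_DEPS_DIR="$(mktemp -d)/project"
fi
mkdir -p "$SBT_DERIVATION_DEPS_DIR"
export SBT_DERIVATION_DEPS_DIR
```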

The repo I pointed you to (pythondata-cpu-vexriscv) is indeed packaged unconventionally and isn't possible to test without understanding and installing the LiteX ecosystem. It's not really meant for human consumption. I didn't intend or expect for you to go through that much effort yourself.

Unfortunately, a related repo (pythondata-cpu-vexriscv_smp) is packaged even more unconventionally. A Python script in that repo imports a Python library packaged in another repo which then invokes sbt a bunch of times to generate the Verilog files. Therefore, the solution of running sbt in its own derivation with build.sbt at the root does not work as the information needed to run sbt for the project is "above" build.sbt.

This problem of running sbt in a non-root directory can be worked around for me; there's nothing I need to do that is impossible. The link archival strategy is broken with this workaround, though, for reasons I haven't bothered to debug. If I have to sacrifice that, that's fine.

However, the workaround is verbose and depends a lot on the current specifics of sbt and sbt-derivation, which might be difficult to keep maintaining. The workaround is illustrated below, eliding optimal dependency generation and runHook, etc.:

mkSbtDerivation rec {
  src = {}; # placeholder: the real source is elided here

  # directory that contains the build.sbt file, relative to initial src dir
  BUILD_SBT_DIR = "pythondata_cpu_vexriscv_smp/verilog/ext/VexRiscv";

  depsWarmupCommand = ''
    pushd ${BUILD_SBT_DIR}
    sbt compile
    popd

    mkdir -p project
    mv ${BUILD_SBT_DIR}/project/{.sbtboot,.boot,.ivy,.coursier} project/
  '';

  depsSha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  depsArchivalStrategy = "copy";

  preConfigure = ''
    mkdir -p project
  '';

  buildPhase = ''
    mkdir -p ${BUILD_SBT_DIR}/project
    mv project/{.sbtboot,.boot,.ivy,.coursier} ${BUILD_SBT_DIR}/project

    # run whichever command causes sbt to be run
    ./generate.py
  '';

  installPhase = ''
    rm -rf ${BUILD_SBT_DIR}/project/{.sbtboot,.boot,.ivy,.coursier}

    # install according to package particulars
  '';
}

I think it would be best to modify sbt-derivation so that sbt referenced "global" directories and as such it always "just worked" in the sbt-derivation environment in any directory with any sequence of invocations, like it would in normal operation outside Nix (assuming the user doesn't pass --no-share). It doesn't really matter to me exactly how that is accomplished. I had hoped that my PR, which does this similarly to how it used to be done, would be a good way, but if you think it's significantly harder to read, we'll have to come up with something different.

Compare this to the derivation that would, in theory, be required if that were the case:

mkSbtDerivation rec {
  src = {}; # placeholder: the real source is elided here

  depsSha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  depsArchivalStrategy = "copy";

  buildPhase = ''
    # run whichever command causes sbt to be run
    ./generate.py
  '';

  installPhase = ''
    # install according to package particulars
  '';
}

> Unfortunately, a related repo (pythondata-cpu-vexriscv_smp) is packaged even more unconventionally. [...]

This is what sbt-derivation should not and will not cover out of the box. This is a deficiency / (very) bad practice in the upstream repo and should be fixed there. If that is not possible, then a workaround in your Nix layer is more meaningful than in the generic sbt-derivation that is used by hundreds of other packages that are packaged following decent practices.

> [...] so that sbt referenced "global" directories and as such it always "just worked" in the sbt-derivation environment in any directory with any sequence of invocations, like it would in normal operation outside Nix

This on the other side is a valid point that can make sbt-derivation more usable for everyone. I'll think about a solution for it.

> This is what sbt-derivation should not and will not cover out of the box. [...]

Yes, I agree here. It's a longer term goal to work with upstream to clean up warts like this, but getting stuff stable on the Nix end is an important step on that journey. To be clear, I don't need anything from sbt-derivation here except the ability to customize the sbt invocation, which I already can do satisfactorily with a custom depsWarmupCommand and buildPhase. I brought this up just to explain why the solution of factoring out the use of sbt into its own Nix derivation is not always feasible.

> [...] so that sbt referenced "global" directories and as such it always "just worked" [...]

> This on the other side is a valid point that can make sbt-derivation more usable for everyone. I'll think about a solution for it.

This "global" ability is all I need from you to support the previously mentioned project in a satisfactory manner. If you think it is helpful to everyone that is great news and I look forward to your solution. Thank you for your continued effort and support with this project.

I updated my PR to implement the "global" ability, taking into account the advice and criticism you've provided here. Please consider using it to inform your solution, or accepting it to solve the problem.