ocurrent / ocaml-ci

A CI for OCaml projects

Home Page: https://ocaml.ci.dev

debian-12 x86_32 workers fail due to old capnproto version

shonfeder opened this issue · comments

An example of a failing build can be seen at
https://ocaml.ci.dev/github/ocurrent/docker-base-images/commit/d91f3800a8897fd11753fd312f5ea11f240363bd/variant/debian-12-4.14_x86_32_opam-2.1/-/refs/pull/274/head

The relevant error is

# *** Uncaught exception ***
# kj/filesystem-disk-unix.c++:305: failed: ::fstat(fd, &stats): Value too large for defined data type

@benmandrew noted that this is the same problem discussed at
mirage/capnp-rpc#273

@mtelvers pointed out that the problem here is simply that the package repo for
Debian 12 has not been updated to the fixed capnproto version (e.g., 1.0.2):
https://tracker.debian.org/pkg/capnproto

I'm trying to get that update moved along, but in the meantime I believe we can
fix this by installing the latest capnproto release in our images before running
CI on them. IIUC, #894 would have fixed the problem while waiting for the
upstream updates, but it was closed because

it is preferable to not add patched system libraries in the CI

as per #894 (comment)

However, at this point a proper release is out, and in my view it is preferable to special-case the install of a package rather than to leave builds red, so I will prep a PR following that example but installing the latest release.

Thanks to the debian IRC, TIL:

package versions are frozen for every debian stable release, so unless the fix can be backported non-destructively to 0.9.2, bookworm will not get it

So if we want to continue supporting this build target, our (non-exclusive) options are:

  1. Install the latest version in our CI pipeline
  2. Work towards, and wait for, a backport to 0.9.2

However, we have now been letting this build fail for months, iiuc, and it seems like everything is moving along ok. So maybe the best option is:

  1. Just remove the tests for debian-12 on x86_32.

WDYT, @mtelvers?

These tests only fail for projects which use capnproto as that is the outdated package. Therefore, I do not think we should disable the test as it will work on most projects.

Thanks, @mtelvers! Your comment here and the discussion in slack helped me realize I mistook the scope of the CIs affected by this error (I thought it was a problem with service infra running, rather than just a problem with the CI for our infra -- didn't look at the context of the failure enough, obviously :)).

After discussion with Mark, I think the next place to look would be the depext package: we can either install a workable version of capnproto (if it's permissible to bypass the package manager) or put in an OS/arch conflict.

I've checked with the opam repo maintainers to be sure, and Kate confirmed there is no way to bypass the distro package manager in depext. So I think our only options are figuring out how to ignore this target in our CI or how to get this dependency installed on the target.

Once deployed, #932 should remove the spurious build failures, but the underlying problem persists. So I'm leaving this issue open to note the problem, even though we don't have a clear way forward.

Not sure why it took me so long to hit on this solution, but for any project where this is an issue, just mark it as unavailable on debian-12 on x86_32 as per ocurrent/ocurrent-deployer@b60799f

I think we wouldn't want this in the capnp package itself, because users can use the lib on that target with no problems so long as they install a more recent version of capnp first.
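
For anyone landing here, a minimal sketch of what marking a project as unavailable on this target could look like in the project's own opam file. It assumes the standard opam variables `os-distribution`, `os-version`, and `arch` report `debian`, `12`, and `x86_32` on the affected workers; it is an illustration of the approach and not a copy of the linked commit, so check the values with `opam var` on the target first.

```opam
# Hypothetical fragment of a project's .opam file (not taken from the linked commit).
# Marks the package as unavailable only on Debian 12 running on 32-bit x86;
# every other platform is left untouched.
available: !(os-distribution = "debian" & os-version = "12" & arch = "x86_32")
```

Keeping the filter in the affected project, rather than in the capnp package, matches the reasoning above: the library itself still works on that target once a newer capnproto is installed.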