RFC: Integration testing

daniel-noland opened this issue 3 years ago

I really appreciate this library but there is one thing which concerns me: integration testing.

I'm looking for feedback on the desired approach here (or alternatives if you think of any).

To be clear, I don't really feel that unit testing is especially problematic in this case: bindgen takes care of
validating the bindings and upstream rdma-core is already well tested in its own right. That said, there are a number of really handy convenience methods and data structures which I would like to help test.

I iterated through several options for integration testing here and I have basically concluded "it is going to be hard."
Still, that doesn't mean I shouldn't try.

Options

1. Vagrant virtual machines + SoftRoCE: workable but a bit heavy.
   This will require significant tooling work and is likely to be really problematic if you want to continue to use a
   tool like TravisCI.

2. Docker container + SoftRoCE: workable, not super portable, challenging for CI.
   In the name of experimentation, I cooked up a debian-bullseye container which builds the project, sets up a
   SoftRoCE interface, and runs a few integration tests. The problem is that RDMA is, at its heart, a hardware offload.
   Even with SoftRoCE you are just emulating a hardware offload. As long as you are using the host's kernel (as would
   be the case for most container technologies), the container is really only buying you a consistent development
   environment, not a consistent deployment environment. For example, if the CI system hasn't loaded the SoftRoCE
   module (rxe) then the container won't work anyway. Packaging the SoftRoCE kernel module with the container is
   useless due to ABI compatibility problems with differing kernel versions. You might be able to somehow overlay mount
   /lib/modules/ from the host and use DKMS to build and load the SoftRoCE module on a per-kernel basis, but this seems
   like a huge pain. At the very least this will be very difficult to make reliable and portable.

   In my experience the best way to ensure tests never get written is to make them a huge pain to write and maintain.

   Oh, one more problem: RDMA devices are not (by default) network namespaced. Even when I enable RDMA namespacing, I
   can't seem to get SoftRoCE devices to move namespaces. Thus my dream of setting up something like mininet for RoCE
   doesn't seem viable currently.

3. Bash script sets up a SoftRoCE device, assuming the developer has taken care of the kernel config: not ideal, but I
   think it is the best available choice at the moment.

   I have already composed a simple bash script which sets up a single SoftRoCE device. This isn't really portable
   either, and it requires that the developer already have rdma-core installed (it also requires iproute2 and the rdma
   command). I don't know how CI would react to this configuration.
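
Whichever option sets up the device, the test suite itself could make the "developer has taken care of the kernel config" assumption explicit by checking for an RDMA device before running and skipping otherwise. Below is a minimal sketch of such a guard, not something that exists in this library. It assumes that RDMA devices (including SoftRoCE ones) are listed under /sys/class/infiniband on Linux, and the function names `rdma_devices` and `first_rdma_device` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Lists the names of the entries found under `sysfs_dir`, sorted.
/// Assumption: on Linux, RDMA devices appear under /sys/class/infiniband.
/// A missing directory is treated as "no devices" (for example, when no
/// RDMA kernel modules are loaded).
fn rdma_devices(sysfs_dir: &Path) -> io::Result<Vec<String>> {
    let entries = match fs::read_dir(sysfs_dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(e) => return Err(e),
    };
    let mut names = Vec::new();
    for entry in entries {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}

/// Hypothetical guard for integration tests: returns the first device name,
/// or None so that the caller can skip instead of failing.
fn first_rdma_device() -> Option<String> {
    rdma_devices(Path::new("/sys/class/infiniband"))
        .ok()?
        .into_iter()
        .next()
}

fn main() {
    match first_rdma_device() {
        Some(dev) => println!("running integration tests against {}", dev),
        None => eprintln!("no RDMA device found; skipping integration tests"),
    }
}
```

An integration test could call `first_rdma_device()` at the top and return early when it yields `None`, so the suite still passes on machines and CI runners where no SoftRoCE device has been configured.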

Thoughts?

Jon Gjengset · Answer 1 · Mon Jun 07 2021 06:29:40 GMT+0800 (China Standard Time)

I like the approach you're taking in #13 if we can make it work! This is definitely a tricky challenge without dedicated testing hardware.

Maksym Planeta · Answer 2 · Sun Aug 22 2021 05:56:38 GMT+0800 (China Standard Time)

Hi, I would really recommend testing in a VM, because SoftRoCE is not very stable and can easily cause a kernel panic.