mrphys / tensorflow-nufft

Fast, Native Non-Uniform Fast Fourier Transform for TensorFlow

Home Page: https://mrphys.github.io/tensorflow-nufft/


Compiling without CUDA?

apt-get-nat opened this issue

I'm working in a DaskHub environment without CUDA, and when I try to import the library I get the error NotFoundError: libcudart.so.11.0: cannot open shared object file: No such file or directory. I'd rather not have to install libcudart in this environment. Is there a way to perform the

_nufft_ops = tf.load_op_library(
    tf.compat.v1.resource_loader.get_path_to_datafile('_nufft_ops.so'))

operation without attempting to import the GPU version of the code, i.e., with just a CPU implementation?

Hi @apt-get-nat, yes, I am aware of this issue. We can certainly fix it, but I haven't had the chance yet. Ideally we should try to load CUDA but fall back to a CPU-only implementation if it isn't found. Core TensorFlow has some kind of mechanism to do that; I'll see if I can find out how it works and whether we can do something similar.
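For illustration, such a fallback might look something like the sketch below. This is only an assumption about how it could be structured, not the package's actual loading code, and the _nufft_ops_cpu.so name is hypothetical:

import tensorflow as tf

resource_loader = tf.compat.v1.resource_loader

try:
    # Try the full build first; this raises NotFoundError at load time
    # if the CUDA runtime is not present on the system.
    _nufft_ops = tf.load_op_library(
        resource_loader.get_path_to_datafile('_nufft_ops.so'))
except tf.errors.NotFoundError:
    # Fall back to a hypothetical CPU-only build of the ops.
    _nufft_ops = tf.load_op_library(
        resource_loader.get_path_to_datafile('_nufft_ops_cpu.so'))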

Another approach might be to compile two versions of the library and release a separate CPU-only version, but the former option seems the better one to me. Either way, I'll keep this in mind for the next release.

As a temporary solution, you could try building from source after removing the relevant parts from the Makefile, or maybe try a hack like this. If you do attempt to compile without the CUDA bits, I'm happy to answer any questions.

@apt-get-nat just adding that you could also consider using Docker. The standard TensorFlow images come with CUDA preinstalled, so you wouldn't need to install anything on your system (other than Docker itself, of course). TensorFlow NUFFT has been tested with these images and is known to work.

FYI, for a more permanent solution I believe we can fix this by linking statically to the CUDA runtime library. I think this is what TensorFlow Addons does.

I'll keep this thread updated with any progress.

Thanks! Long term, trying to load CUDA and falling back on CPU routines seems like a great solution. For now I'm going to try mucking around in the Makefile and see if I can get it to compile myself.

I've run into a bit of a strange problem doing this, which I guess I will ask about here even though it's not directly related to the CUDA issue. When I try to build the package via pip from local files, instead of letting pip download them, I get a file-not-found error for tensorflow_nufft/python/ops/_nufft_ops.so when I then try to import the resulting Python package.

It's worth noting that this is before any changes to remove CUDA dependencies; it's just a different behavior when building from local files vs. the pip repository. I notice that building from local files does not produce a python3.10/site-packages/tensorflow_nufft.libs directory, while installing from the pip repository does. Could that be the reason?

edit: I didn't think it would work, but I tried commenting out the rm -f $(TARGET_LIB) line from the Makefile for my own peace of mind. I was right, that didn't work.

The _nufft_ops.so binary is what you need to build. The pip installation includes this precompiled, but the cloned repo doesn't.
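If you want to quickly check whether an installation actually contains the compiled library, an illustrative snippet like this (not part of the package) will tell you:

import importlib.util
import os

# Locate the installed tensorflow_nufft package and look for the
# compiled op library that the pip wheels ship precompiled.
spec = importlib.util.find_spec('tensorflow_nufft')
pkg_dir = os.path.dirname(spec.origin)
so_path = os.path.join(pkg_dir, 'python', 'ops', '_nufft_ops.so')
print(so_path, '->', 'found' if os.path.exists(so_path) else 'missing')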

The easiest way to build (I'll add this to a guide at some point) is as follows:

  1. Clone the repository from GitHub (do not use pip installed files).

  2. Open the repository folder in VS Code.

  3. Install the Remote-Containers extension if you do not have it already. This also requires Docker to be installed.

  4. In VS Code, open the command palette (Ctrl + Shift + P) and select "Remote-Containers: Reopen in Container". Wait while VS Code downloads and builds the container.

  5. Open a new terminal if not already open and type make lib. This will build the dynamic library you're missing.

  6. You're now ready to try TensorFlow NUFFT and/or make any necessary changes. If you want to be able to import from a directory other than the root directory, you can install in editable mode (pip install -e . from the root repo folder). A quick smoke test you can run afterwards is sketched below.
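For reference, a minimal smoke test might look like this. It assumes the top-level nufft op with its defaults (a type-2 transform evaluated at the given points); check the package documentation for the exact signature:

import math

import tensorflow as tf
import tensorflow_nufft as tfft

# Evaluate a 2D type-2 NUFFT of a random 64x64 grid at 200
# nonuniform points in [-pi, pi).
source = tf.complex(tf.random.normal([64, 64]),
                    tf.random.normal([64, 64]))
points = tf.random.uniform([200, 2], minval=-math.pi, maxval=math.pi)
result = tfft.nufft(source, points)
print(result.shape)  # expect (200,)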

Goodness, of course. I don't know why I expected pip to do the make itself as well. Brain was just broken yesterday I guess. Thank you for your patience.

I noticed your Makefile has a CUDA boolean; I was hoping that all the CUDA lines would be nested in conditionals so that compiling a CPU version would be relatively straightforward. Unfortunately, the nvcc calls themselves are not conditioned on the CUDA variable.

Do you think it would be feasible for me to compile those files with g++ and the CXXFLAGS instead of the CUFLAGS?

Actually, my first suggestion would be to replace the following line:

LDFLAGS += -lcudart -lnvToolsExt

with:

LDFLAGS += -l:libcudart_static.a

The -lnvToolsExt flag shouldn't be necessary. This is a potential fix I was planning to try, but haven't had the chance to verify it yet. I might have some time tomorrow though.

The idea is to replace the dynamic link to the CUDA runtime library with a static one, so that the application can run on systems without a CUDA installation.
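Once linked that way, the import that failed at the top of this thread is itself the test; on a machine without any CUDA installation:

# Run on a system without CUDA. With the dynamic link, this raised
# NotFoundError: libcudart.so.11.0: cannot open shared object file.
# With the static link, it should succeed.
import tensorflow_nufft as tfft
print('op library loaded without a system CUDA runtime')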

The use of NVCC during compilation shouldn't be a problem: with this method you do need a CUDA installation with NVCC on the system where you compile, but not on the system where you run. If you use the devcontainer as I suggested before, it takes care of everything you need to compile successfully (and does not install anything on your system except Docker). Note that you do not need a GPU to do this.

Naturally, if you're going to compile with this method on the same system where you plan to run it, the whole thing is pointless: once you've installed Docker you might as well run the existing TF NUFFT package, as I also suggested before. But if you're going to be using a different computer, this might be an option.

It should also be possible to do as you suggest and avoid compiling any CUDA files or using NVCC, but I'm not sure it's worth going down that route.

Hi @apt-get-nat, I have some good news and some bad news.

The good news is I think I fixed the issue by linking to the static CUDA libraries.

The bad news is that linking to the static CUDA libraries results in wheels above 100 MB, the size limit on PyPI, so I can't release. cuFFT seems to be the main culprit.

My plan is to try to reduce the size of the binaries. Failing that, I'll request a size-limit increase from PyPI. Failing that, we'll need to find a different solution.

In the meantime, you can check out the develop branch and compile as I described earlier.

Hi,

Oh, that is good news! Again, I really appreciate that this seems to be high on your priority list, and I hope getting it onto PyPI works out. Unfortunately, I can't compile on my local machine as described to try it myself, because the docker build command that VS Code attempts to execute yields the error

#5 15.26 W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
#5 15.26 E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is no longer signed.
------
executor failed running [/bin/sh -c apt-get update &&     apt-get install -y libcairo2-dev libgirepository1.0-dev libgtk-3-dev]: exit code: 100

I'm familiar with adding public keys for apt in Linux distros, but I have no idea how to go about that for a Docker container, or whether that's even the problem here.

That should not have happened. The problem was in apt-get update. The error might be related to NVIDIA/nvidia-container-toolkit#257.

Will update here once I have more info.

The GPG error happened because NVIDIA recently updated the signing keys used by apt (more info here). Anyway, this has been fixed in the latest release, so you might want to give the compilation another go! To avoid the CUDA errors, it should be enough to uncomment the following line:

# LDFLAGS += -lcufft_static_nocallback

You may also need to change the link order by moving the following line (with static libraries, the linker resolves symbols left to right, so the relative order matters):

LDFLAGS += $(TF_LDFLAGS)

below this block:

ifeq ($(CUDA), 1)
LDFLAGS += -L$(CUDA_LIBDIR)
# We do not currently link against cuFFT because it increases the size of
# the shared library by 200 MB (and above the 100 MB limit for PyPI). As a
# result TensorFlow NUFFT cannot currently be loaded in environments without
# CUDA. However this issue should be fixed once the following has been
# addressed: https://github.com/mrphys/tensorflow-nufft/issues/24
# LDFLAGS += -lcufft_static_nocallback
LDFLAGS += -lcudart_static -lculibos
endif

As for a more permanent fix, I don't think it's a good idea to increase the binary size by 200 MB for something no one is going to use anyway (it is unused by the CPU implementation, and already available in the CUDA installation for GPU users). I think that would be a hard sell to the PyPI maintainers. I'm also not a fan of adding a CPU-only package, as that means another package to maintain.

Most importantly, I think all of this might become unnecessary once #24 has been addressed. Using the stream executor for the FFT, we should be able to remove the explicit cuFFT dependency (as it will be called implicitly by the TensorFlow framework).

That makes sense. The Docker environment was able to get a little further before getting stuck again, this time on

docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /tmp/.X11-unix.

It seems like I might have to just wait for the issue 24 fix, and hope I can install that version on the dev machine.

You're almost there. The Docker image was built successfully, but it failed to run the container because the path /tmp/.X11-unix does not exist on your system. Fortunately, you don't need that to compile!

You can simply remove or comment out the following line in devcontainer.json (it only bind-mounts the host's X11 socket for GUI applications, which is not needed for building):

"type=bind,source=/tmp/.X11-unix,target=/tmp/.X11-unix"

Hopefully, the container will then be able to start!

Just checking in to confirm that v0.8.0 did work great for me; thank you!

I wasn't expecting that, but great! v0.8.0 may still not work properly in some configurations (v0.8.1 should), but I'm glad to know it already works for you!

This has been fully addressed starting with v0.8.1.