apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Home Page: https://mxnet.apache.org

Unable to build with CUDA support

adamjstewart opened this issue

Description

I'm trying to build MXNet 1.9.1 from source with CUDA 11.8.0 using the Spack package manager. CMake seems unable to locate CUDA, even though CUDA is installed and other packages (TensorFlow, PyTorch) build fine with CUDA support.

Error Message

/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status

Full build log.

To Reproduce

Steps to reproduce

I'm using the Spack package manager, but it should be possible to reproduce this from source given the same environment. Steps for Spack (using a branch where I'm trying to update our MXNet recipe):

$ git clone https://github.com/adamjstewart/spack.git
$ cd spack
$ git checkout packages/mxnet
$ . share/spack/setup-env.sh
$ spack install mxnet +cuda cuda_arch=80  # may need to change arch for your system

What have you tried to solve it?

I've tried setting CMAKE_CUDA_COMPILER and MXNET_CUDA_ARCH. Let me know if there are any other flags I can set to help CMake find CUDA.
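
For readers reproducing this outside Spack, the flags mentioned above could be passed to CMake roughly as sketched here. This is only an illustration: the helper name `mxnet_cuda_cmake_args`, the nvcc path, and the architecture value `8.0` are assumptions, and `USE_CUDA` is assumed to be the MXNet CMake option that enables the CUDA build. These flags alone did not resolve the linker error reported in this issue.

```shell
# Hypothetical helper: print CMake cache options for a CUDA-enabled MXNet build.
#   $1 = path to nvcc (assumed location, adjust for your system)
#   $2 = target GPU architecture, e.g. 8.0 (assumed value format)
mxnet_cuda_cmake_args() {
    nvcc_path="$1"
    cuda_arch="$2"
    # One option per line so the caller can inspect or split them.
    printf '%s\n' \
        "-DUSE_CUDA=ON" \
        "-DCMAKE_CUDA_COMPILER=$nvcc_path" \
        "-DMXNET_CUDA_ARCH=$cuda_arch"
}

# Example (paths are placeholders; unquoted expansion breaks on paths with spaces):
#   cmake $(mxnet_cuda_cmake_args /path/to/cuda/bin/nvcc 8.0) /path/to/mxnet
```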

Environment

Build failure is in CI, so it's hard to collect this info, but here are some of the things you might be looking for:

Environment Information
----------Python Info----------
Version      : 3.10.10
Compiler     : GCC 7.3.1
Build        : ('main', 'Mar 31 2023 13:44:49')
Arch         : ('64bit', '')
------------Pip Info-----------
No corresponding pip install for current python.
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform     : linux-amazonlinux2
system       : Linux
node         : CI
release      : ?
version      : ?
----------Hardware Info----------
machine      : x86_64_v3
processor    : x86_64
hw.features.allows_security_research: ?
machdep.cpu.brand_string: ?
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/mxnet, DNS: 0.0209 sec, LOAD: 0.5341 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, HTTP Error 404: Not Found, DNS finished in 0.33913588523864746 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:997)>, DNS finished in 0.18706297874450684 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0251 sec, LOAD: 0.1275 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0145 sec, LOAD: 0.5218 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.10099601745605469 sec.
----------Environment----------

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

Fixed this by using -L/path/to/cuda/lib/stubs instead of -L/path/to/cuda/lib.
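
Some background on why this works: libcuda.so normally ships with the NVIDIA driver rather than the CUDA toolkit, so a machine without a driver (such as a CI builder) only has the link-time stub that the toolkit places in a stubs directory. A minimal sketch of locating that directory follows. The helper name `find_cuda_stub_dir`, the `CUDA_HOME` variable, and the candidate `lib64/stubs` and `lib/stubs` subdirectories are assumptions about the layout, not something taken from the Spack recipe.

```shell
# Hypothetical helper: print the first stubs directory under a CUDA root
# that contains libcuda.so. Returns non-zero if none is found.
#   $1 = CUDA toolkit root (assumed layout: lib64/stubs or lib/stubs)
find_cuda_stub_dir() {
    cuda_root="$1"
    for dir in "$cuda_root/lib64/stubs" "$cuda_root/lib/stubs"; do
        if [ -e "$dir/libcuda.so" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (CUDA_HOME is an assumed variable pointing at the toolkit root):
#   stub_dir=$(find_cuda_stub_dir "$CUDA_HOME") \
#       && export LIBRARY_PATH="$stub_dir${LIBRARY_PATH:+:$LIBRARY_PATH}"
```

LIBRARY_PATH only affects GCC's link-time search. The stubs directory should generally stay out of LD_LIBRARY_PATH, since the stub is not a working driver library at run time.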