bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Help me, I'm dying soon: error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1, error: subprocess-exited-with-error

listwebit opened this issue

I used the following installation method to build apex, but it fails with an error that I have not been able to resolve for several days:

git clone https://github.com/NVIDIA/apex
cd apex
pip install --global-option="--cpp_ext" --global-option="--cuda_ext" --no-cache -v --disable-pip-version-check . 2>&1 | tee build.log

The environment is as follows:

nvidia-smi: CUDA Version: 10.2
/usr/local/cuda/bin/nvcc -V: Cuda compilation tools, release 10.2, V10.2.89

pip --default-timeout=10000 install torch==1.12.0+cu102 torchvision==0.13.0+cu102 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu102
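To rule out a toolkit mismatch, the two CUDA versions above can be compared programmatically. This is only a rough sketch: the helper names (`parse_nvcc_release`, `parse_torch_cuda`, `toolkits_match`) are made up for illustration and are not part of apex or PyTorch. The only real API it relies on is `torch.version.cuda`, shown in the note after the block.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc -V` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(torch_cuda):
    """Return (major, minor) from a string such as torch.version.cuda ('10.2')."""
    if not torch_cuda:  # CPU-only builds of torch report None here
        return None
    parts = torch_cuda.split(".")
    return int(parts[0]), int(parts[1])


def toolkits_match(nvcc_output, torch_cuda):
    """True only when both versions are known and agree on major.minor."""
    nvcc = parse_nvcc_release(nvcc_output)
    built_with = parse_torch_cuda(torch_cuda)
    return nvcc is not None and nvcc == built_with
```

On the build machine this could be called as `toolkits_match(subprocess.run(["/usr/local/cuda/bin/nvcc", "-V"], capture_output=True, text=True).stdout, torch.version.cuda)`. With the versions listed above (nvcc 10.2 and a `+cu102` wheel) it should report a match, which would point away from a CUDA version mismatch as the cause.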

I also upgraded gcc:
yum install centos-release-scl
yum install devtoolset-7*
# Activate the corresponding devtoolset. You can install several devtoolset versions side by side and use the command below to switch to the one you need.
scl enable devtoolset-8 bash
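Note that the commands above install devtoolset-7 but enable devtoolset-8, while the error below is raised by `/opt/rh/devtoolset-7/root/usr/bin/gcc`, so it may be worth confirming which compiler the build shell actually picks up. A minimal sketch, assuming gcc is resolved from `PATH`; the helper names are hypothetical:

```python
import re
import shutil


def devtoolset_from_path(gcc_path):
    """Return e.g. 'devtoolset-7' for a gcc under /opt/rh/devtoolset-N, else None."""
    match = re.search(r"/opt/rh/(devtoolset-\d+)/", gcc_path)
    return match.group(1) if match else None


def parse_gcc_version(version_output):
    """Return (major, minor, patch) from the output of `gcc --version`, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None


def active_gcc():
    """Path of the gcc found first on PATH in the current environment, or None."""
    return shutil.which("gcc")
```

Running `devtoolset_from_path(active_gcc() or "")` in the same shell that runs `pip install` shows which devtoolset, if any, is active. For the path in the error message it returns `'devtoolset-7'`.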

The error is as follows:

/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
inline std::string toString(const FunctionSchema& schema) {
^~~~~~~~
/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
/home/liulei/miniconda3/envs/pretrain/lib/python3.9/site-packages/torch/include/ATen/core/function_schema.h:522:20: note: ‘c10::toString’
error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1
error: subprocess-exited-with-error

× Running setup.py install for apex did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /home/liulei/miniconda3/envs/pretrain/bin/python -u -c '
exec(compile('"'"''"'"''"'"'

import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute setup.py since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = ""
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/home/liulei/liulei2/apex-master/setup.py'"'"',), "", "exec"))' --cpp_ext --cuda_ext install --record /tmp/pip-record-bugw92kn/install-record.txt --single-version-externally-managed --compile --install-headers /home/liulei/miniconda3/envs/pretrain/include/python3.9/apex
cwd: /home/liulei/liulei2/apex-master/
Running setup.py install for apex: finished with status 'error'
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
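The excerpt above contains only `note:` lines and pip's final summary. The compiler diagnostic that actually caused the failure should be further up in `build.log` (the file written by the `tee` in the install command). A small sketch for pulling out the first real errors, assuming the usual gcc (`file:line:col: error:`) and nvcc (`file(line): error:`) message formats; `first_compiler_errors` is a hypothetical helper:

```python
import re

# Matches "file:10:5: error:", "file:10: fatal error:" and "file(10): error:".
# Deliberately skips "note:" lines and the bare "error: command ... failed" summary.
ERROR_LINE = re.compile(r"(:\d+(:\d+)?|\(\d+\)): (fatal )?error:")


def first_compiler_errors(log_text, limit=5):
    """Return up to `limit` (line_number, text) pairs for compiler error lines."""
    errors = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_LINE.search(line):
            errors.append((number, line.strip()))
            if len(errors) == limit:
                break
    return errors
```

For example, `first_compiler_errors(open("build.log", errors="replace").read())` returns the earliest matching lines, which are usually more informative than the trailing notes.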