liangwq / Chatglm_lora_multi-gpu

Multi-GPU ChatGLM with deepspeed and

Error during initialization

aiaiyueq11 opened this issue

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
In file included from /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/Device.h:4:0,
from /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
from /usr/local/lib/python3.8/dist-packages/torch/include/torch/extension.h:6,
from /usr/local/lib/python3.8/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp:8:
/usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
#include <Python.h>
^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "multi_gpu_fintune_belle.py", line 339, in <module>
main()
File "multi_gpu_fintune_belle.py", line 270, in main
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1090, in prepare
result = self._prepare_deepspeed(*args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1368, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/__init__.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 340, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/engine.py", line 1547, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 165, in __init__
util_ops = UtilsBuilder().load()
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'

Could someone tell me what is causing this?
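
The first failure in the log is the compiler's "fatal error: Python.h: No such file or directory", raised while DeepSpeed JIT-compiles its 'utils' extension; the later RuntimeError only reports that the build failed. The build was given -isystem /usr/include/python3.8, so the CPython development headers are probably missing from that directory. A minimal sketch for checking this with the same interpreter that runs the training script is below. The helper name find_python_header is made up for illustration, and it is only an assumption that the interpreter's include directory matches the one the compiler was given.

```python
import os
import sysconfig


def find_python_header():
    """Return (include_dir, header_exists) for the running interpreter.

    Hypothetical helper for illustration; not part of DeepSpeed or PyTorch.
    """
    # Directory where this interpreter expects its C headers to live.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, ok = find_python_header()
    print(f"include dir: {include_dir}")
    print("Python.h found" if ok else "Python.h missing")
```

If the header is reported missing, installing the Python development headers for that interpreter and rerunning the script should let the extension build. On Debian/Ubuntu-style systems that package is usually python3.8-dev, but the issue does not say which distribution is in use, so treat that name as an assumption.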