Ascend / pytorch

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch

Home Page: https://ascend.github.io/docs/

Multi-threaded concurrent execution is not supported; from what I can see, the CUDA version supports it

junior-zsy opened this issue · comments

Code:
from transformers import AutoTokenizer, AutoModel
import time
import threading

def chat_in_thread(tokenizer, model, i):
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    end_time = time.time()
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {end_time - start_time} seconds")

# Start time before loading the model
start_time = time.time()

tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
model = AutoModel.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True).half().npu()

# Time taken to load the model
model_load_time = time.time() - start_time

# Create several threads that each run the chat function
threads = []
num_threads = 4

for i in range(num_threads):
    thread = threading.Thread(target=chat_in_thread, args=(tokenizer, model, i))
    threads.append(thread)

# Start the threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")

Error message:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/threading.py", line 950, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/threading.py", line 888, in run
self._target(*self._args, **self._kwargs)
File "/home/jovyan/fast-data/mul_python.py", line 7, in chat_in_thread
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1034, in chat
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: allocate:/usr1/02/workspace/j_ywhtRpPk/pytorch/torch_npu/csrc/core/npu/NPUCachingAllocator.cpp:1406 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Try updating the driver.

Hi, has your issue been resolved?

Here is some information about the card:
$npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2 Version: 23.0.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 1 910PremiumA | OK | 84.3 40 0 / 0 |
| 0 | 0000:81:00.0 | 0 1901 / 15137 1 / 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
I tried using torch_npu directly:
$ python
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 04:25:25)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch, torch_npu
>>> torch.rand(1).to('npu')*2
Traceback (most recent call last):
File "", line 1, in
RuntimeError: getDevice:/usr1/02/workspace/j_vqN6BFvg/pytorch/torch_npu/csrc/core/npu/impl/NPUGuardImpl.h:42 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Updating to the latest driver doesn't help either. @sunchuhan-930 can you confirm whether this is supported? The model above is the open-source chatglm3-6b-32k and the code is posted; you can update the driver and try it yourself to see whether it is supported.

Before calling any rt interface, the thread needs to call rtSetDevice first.
You can check the CANN API documentation in the Ascend community.

When starting a new thread, you need to set the NPU device that thread will use, for example:

import torch
import torch_npu
import threading

# Bind the main thread to device 0
torch_npu.npu.set_device(0)

def _worker(i: int) -> None:
    # Every new thread must also bind to its NPU device before any NPU operation
    torch_npu.npu.set_device(0)
    a = torch.randn(2).npu()
    print(a)

threads = [threading.Thread(target=_worker, args=(i,)) for i in range(3)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Note: at present the NPU does not support using a different NPU device in a worker thread than in the main thread; this will be supported in the next release.
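
Applying the same fix to the original reproduction, a minimal sketch of the worker (assuming the model sits on a single card, device 0, as in the script above) would be:

import time
import torch_npu

def chat_in_thread(tokenizer, model, i):
    # Bind this thread to the card that holds the model before any NPU
    # operation; device 0 here is an assumption matching the setup above.
    torch_npu.npu.set_device(0)
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {time.time() - start_time} seconds")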

@ChuBoning This works for a single card. If a model is loaded across 2 cards, do I need to call torch_npu.npu.set_device("0,1")? Thanks.

For multiple cards, please use DDP, for example:

model_DDP = torch.nn.parallel.DistributedDataParallel(
    model_DDP, device_ids=npu_subset, gradient_as_bucket_view=bucket_view
)
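
For context, a self-contained sketch of how that wrapping is typically set up with torch_npu, assuming one process per NPU and the hccl backend; setup_ddp, the master address/port defaults, and using the rank as the device index are illustrative choices, not taken from this thread:

import os
import torch
import torch_npu
import torch.distributed as dist

def setup_ddp(rank: int, world_size: int, model: torch.nn.Module):
    # One process per NPU: each process binds to its own card first.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch_npu.npu.set_device(rank)
    dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
    model = model.npu()
    # device_ids pins the replica to its card, as in the snippet above.
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])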

@ChuBoning I'm loading the model with the transformers library, because later I want to use the inference methods that transformers provides.

Roughly, transformers is used like this:

# With device_map="auto" the model is spread evenly across the cards,
# and inference automatically runs on all of them
model = AutoModelForCausalLM.from_pretrained(
    "chatglm3-6b-32k",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
model.chat(tokenizer, "你是谁开发的", history=[])

For a single card, adding torch_npu.npu.set_device around model.chat does work. For multiple cards you suggested torch.nn.parallel.DistributedDataParallel, but I don't know how to combine that with transformers so I can still use transformers' inference. Or, if I use torch.nn.parallel.DistributedDataParallel, do I have to implement the transformers-style inference myself? Thanks.

@ChuBoning Could you take a look at the question above? Thanks.

@Tyx-main Please take a look at this issue.

Multi-card inference is not supported at the moment. Support for multi-card inference with device_map="auto" is coming soon.
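
In the meantime, single-card loading can be pinned explicitly, as in the original script; a minimal sketch (MODEL_PATH is a stand-in for the local checkpoint directory):

import torch_npu
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "/home/jovyan/fast-data/chatglm3-6b-32k"

torch_npu.npu.set_device(0)  # bind the main thread to card 0
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().npu()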

@junior-zsy

By patching accelerate, multi-card inference now works, but multi-card multi-threaded execution is still unsupported; single-threaded use works fine. (A queue-based workaround is sketched after the error output below.)

Code:

import time
import threading

import torch
import torch_npu
from transformers import AutoTokenizer, AutoModelForCausalLM

def chat_in_thread(tokenizer, model, i):
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    end_time = time.time()
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {end_time - start_time} seconds")

# Start time before loading the model
start_time = time.time()

tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", device_map="auto", trust_remote_code=True)

print(model)
# A single-threaded call works fine
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
print(response)

# Time taken to load the model
model_load_time = time.time() - start_time

# Create several threads that each run the chat function
threads = []
num_threads = 4

for i in range(num_threads):
    thread = threading.Thread(target=chat_in_thread, args=(tokenizer, model, i))
    threads.append(thread)

# Start the threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")

Error output:
I was developed on top of GLM3-6B, a language model jointly trained by Tsinghua University's KEG Lab and Zhipu AI in 2022. My job is to provide appropriate answers and support for users' questions and requests.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
Exception in thread Thread-3:
Traceback (most recent call last):
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-4:
Traceback (most recent call last):
return func(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

response, history = model.chat(tokenizer, "你是谁开发的", history=[])

File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

self.data = {k: v.to(device=device) for k, v in self.data.items()}

File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Model loading time: 18.54421830177307 seconds
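
One possible workaround while multi-card multi-threading is unsupported (my own sketch, not confirmed by the maintainers) is to keep all inference on the single thread that loaded the model and have the other threads hand their prompts over through a queue:

import queue
import threading

request_q = queue.Queue()

def serve_forever(tokenizer, model):
    # Runs on the thread that loaded the model (the only configuration
    # reported to work here); worker threads never touch the NPU directly.
    while True:
        prompt, reply_box, done = request_q.get()
        reply_box.append(model.chat(tokenizer, prompt, history=[])[0])
        done.set()

def chat(prompt: str) -> str:
    # Callable from any thread: enqueue the prompt and wait for the reply.
    reply_box, done = [], threading.Event()
    request_q.put((prompt, reply_box, done))
    done.wait()
    return reply_box[0]

After loading the model, call serve_forever(tokenizer, model) on the main thread; request threads then call chat("...") and block until their response is ready. This serializes inference, so it trades throughput for working concurrent callers.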