Ascend / pytorch

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch

Home Page: https://ascend.github.io/docs/

Multi-threaded concurrent execution is not supported; from what I can see, the CUDA version supports it

junior-zsy opened this issue · comments

Code:
from transformers import AutoTokenizer, AutoModel
import time
import threading

def chat_in_thread(tokenizer, model, i):
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    end_time = time.time()
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {end_time - start_time} seconds")

# Start time before loading the model
start_time = time.time()

tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
model = AutoModel.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True).half().npu()

# Time taken to load the model
model_load_time = time.time() - start_time

# Create several threads that each run the chat function
threads = []
num_threads = 4

for i in range(num_threads):
    thread = threading.Thread(target=chat_in_thread, args=(tokenizer, model, i))
    threads.append(thread)

# Start the threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")

Error message:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/threading.py", line 950, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/threading.py", line 888, in run
self._target(*self._args, **self._kwargs)
File "/home/jovyan/fast-data/mul_python.py", line 7, in chat_in_thread
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1034, in chat
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/conda/envs/py/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: allocate:/usr1/02/workspace/j_ywhtRpPk/pytorch/torch_npu/csrc/core/npu/NPUCachingAllocator.cpp:1406 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Try updating the driver.

Hi, has your issue been resolved?

Here is some information about the card:
$npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2 Version: 23.0.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 1 910PremiumA | OK | 84.3 40 0 / 0 |
| 0 | 0000:81:00.0 | 0 1901 / 15137 1 / 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
I tried using torch_npu directly:
$ python
Python 3.9.18 | packaged by conda-forge | (main, Aug 30 2023, 04:25:25)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch, torch_npu
>>> torch.rand(1).to('npu')*2
Traceback (most recent call last):
File "", line 1, in
RuntimeError: getDevice:/usr1/02/workspace/j_vqN6BFvg/pytorch/torch_npu/csrc/core/npu/impl/NPUGuardImpl.h:42 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Updating to the latest driver doesn't help either. @sunchuhan-930 can you confirm whether this is supported? The model above is the open-source chatglm3-6b-32k and the code is posted; you can update the driver and try it yourself to see whether it is supported.

Before calling any rt interface, the thread needs to call rtSetDevice first.
You can check the CANN API documentation in the Ascend community.

When starting a new thread, you need to set the NPU device that thread will use, for example:

import torch
import torch_npu
import threading

# Bind the main thread to device 0
torch_npu.npu.set_device(0)

def _worker(i: int) -> None:
    # Every new thread must also bind to its NPU device before any NPU operation
    torch_npu.npu.set_device(0)
    a = torch.randn(2).npu()
    print(a)

threads = [threading.Thread(target=_worker, args=(i,)) for i in range(3)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Note: at present the NPU does not support using a different NPU device in a worker thread than in the main thread; this will be supported in the next release.
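
Applying the same fix to the original reproduction, a minimal sketch of the worker (assuming the model sits on a single card, device 0, as in the script above) would be:

import time
import torch_npu

def chat_in_thread(tokenizer, model, i):
    # Bind this thread to the card that holds the model before any NPU
    # operation; device 0 here is an assumption matching the setup above.
    torch_npu.npu.set_device(0)
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {time.time() - start_time} seconds")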

@ChuBoning This works for a single card. If a model is loaded across 2 cards, do I need to call torch_npu.npu.set_device("0,1")? Thanks.

For multiple cards, please use DDP, for example:

model_DDP = torch.nn.parallel.DistributedDataParallel(
    model_DDP, device_ids=npu_subset, gradient_as_bucket_view=bucket_view
)
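
For context, a self-contained sketch of how that wrapping is typically set up with torch_npu, assuming one process per NPU and the hccl backend; setup_ddp, the master address/port defaults, and using the rank as the device index are illustrative choices, not taken from this thread:

import os
import torch
import torch_npu
import torch.distributed as dist

def setup_ddp(rank: int, world_size: int, model: torch.nn.Module):
    # One process per NPU: each process binds to its own card first.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch_npu.npu.set_device(rank)
    dist.init_process_group(backend="hccl", rank=rank, world_size=world_size)
    model = model.npu()
    # device_ids pins the replica to its card, as in the snippet above.
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])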

@ChuBoning I'm loading the model with the transformers library, because later I want to use the inference methods that transformers provides.

Roughly, transformers is used like this:

# With device_map="auto" the model is spread evenly across the cards,
# and inference automatically runs on all of them
model = AutoModelForCausalLM.from_pretrained(
    "chatglm3-6b-32k",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
model.chat(tokenizer, "你是谁开发的", history=[])

For a single card, adding torch_npu.npu.set_device around model.chat does work. For multiple cards you suggested torch.nn.parallel.DistributedDataParallel, but I don't know how to combine that with transformers so I can still use transformers' inference. Or, if I use torch.nn.parallel.DistributedDataParallel, do I have to implement the transformers-style inference myself? Thanks.

@ChuBoning Could you take a look at the question above? Thanks.

@Tyx-main Please take a look at this issue.

Multi-card inference is not supported at the moment. Support for multi-card inference with device_map="auto" is coming soon.
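
In the meantime, single-card loading can be pinned explicitly, as in the original script; a minimal sketch (MODEL_PATH is a stand-in for the local checkpoint directory):

import torch_npu
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "/home/jovyan/fast-data/chatglm3-6b-32k"

torch_npu.npu.set_device(0)  # bind the main thread to card 0
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().npu()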

@junior-zsy

By patching accelerate, multi-card inference now works, but multi-card multi-threaded execution is still unsupported; single-threaded use works fine. (A queue-based workaround is sketched after the error output below.)

Code:

import time
import threading

import torch
import torch_npu
from transformers import AutoTokenizer, AutoModelForCausalLM

def chat_in_thread(tokenizer, model, i):
    start_time = time.time()
    response, history = model.chat(tokenizer, "你是谁开发的", history=[])
    end_time = time.time()
    print(f"Thread {i}: Response - {response}")
    print(f"Thread {i}: Execution time - {end_time - start_time} seconds")

# Start time before loading the model
start_time = time.time()

tokenizer = AutoTokenizer.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/home/jovyan/fast-data/chatglm3-6b-32k", device_map="auto", trust_remote_code=True)

print(model)
# A single-threaded call works fine
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
print(response)

# Time taken to load the model
model_load_time = time.time() - start_time

# Create several threads that each run the chat function
threads = []
num_threads = 4

for i in range(num_threads):
    thread = threading.Thread(target=chat_in_thread, args=(tokenizer, model, i))
    threads.append(thread)

# Start the threads
for thread in threads:
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("Model loading time:", model_load_time, "seconds")

Error output:
I was developed on top of GLM3-6B, a language model jointly trained by Tsinghua University's KEG Lab and Zhipu AI in 2022. My job is to provide appropriate answers and support for users' questions and requests.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
Exception in thread Thread-3:
Traceback (most recent call last):
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-4:
Traceback (most recent call last):
return func(*args, **kwargs)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
self.run()
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/threading.py", line 870, in run
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
response, history = model.chat(tokenizer, "你是谁开发的", history=[])
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
self._target(*self._args, **self._kwargs)
File "server_fb.py", line 10, in chat_in_thread
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

response, history = model.chat(tokenizer, "你是谁开发的", history=[])

File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
return func(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 1035, in chat
self.data = {k: v.to(device=device) for k, v in self.data.items()}
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
inputs = inputs.to(self.device)
File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in to
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

self.data = {k: v.to(device=device) for k, v in self.data.items()}

File "/home/jovyan/fast-data/mini/envs/huawei/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 772, in
self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: getDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:41 NPU error, error code is 107002
[Error]: The context is empty.
Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

Model loading time: 18.54421830177307 seconds
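
One possible workaround while multi-card multi-threading is unsupported (my own sketch, not confirmed by the maintainers) is to keep all inference on the single thread that loaded the model and have the other threads hand their prompts over through a queue:

import queue
import threading

request_q = queue.Queue()

def serve_forever(tokenizer, model):
    # Runs on the thread that loaded the model (the only configuration
    # reported to work here); worker threads never touch the NPU directly.
    while True:
        prompt, reply_box, done = request_q.get()
        reply_box.append(model.chat(tokenizer, prompt, history=[])[0])
        done.set()

def chat(prompt: str) -> str:
    # Callable from any thread: enqueue the prompt and wait for the reply.
    reply_box, done = [], threading.Event()
    request_q.put((prompt, reply_box, done))
    done.wait()
    return reply_box[0]

After loading the model, call serve_forever(tokenizer, model) on the main thread; request threads then call chat("...") and block until their response is ready. This serializes inference, so it trades throughput for working concurrent callers.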