StarCoder2-2b pipeline generation fails with `AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'`
xing-yiren opened this issue · comments
Describe the bug / 问题描述 (Mandatory / 必填)
Generating with a StarCoder2-2b pipeline raises `AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'`.
- Hardware Environment (Ascend/GPU/CPU) / 硬件环境: GPU
Software Environment / 软件环境 (Mandatory / 必填):
- MindSpore version: 2.2.14
- Python version: 3.9.19
- OS platform and distribution: Linux Ubuntu
- GCC/Compiler version (if compiled from source): -
Execution Mode / 执行模式 (Mandatory / 必填) (PyNative/Graph): PyNative
To Reproduce / 重现步骤 (Mandatory / 必填)
Steps to reproduce the behavior:
```python
from mindnlp.transformers import AutoTokenizer, AutoModelForCausalLM, PreTrainedTokenizer, PreTrainedModel, GenerationConfig
from mindnlp.transformers import Pipeline, pipeline
import mindspore


class GeneratorBase:
    def generate(self, query: str, parameters: dict) -> str:
        raise NotImplementedError

    def __call__(self, query: str, parameters: dict = None) -> str:
        return self.generate(query, parameters)


class StarCoder(GeneratorBase):
    def __init__(self, pretrained: str):
        self.pretrained: str = pretrained
        self.pipe: Pipeline = pipeline(
            "text-generation", model=pretrained)
        self.generation_config = GenerationConfig.from_pretrained(pretrained)
        self.generation_config.pad_token_id = self.pipe.tokenizer.eos_token_id

    def generate(self, query: str, parameters: dict) -> str:
        config: GenerationConfig = GenerationConfig.from_dict({
            **self.generation_config.to_dict(),
            **parameters
        })
        json_response: dict = self.pipe(query, generation_config=config)[0]
        generated_text: str = json_response['generated_text']
        return generated_text


if __name__ == '__main__':
    pretrained = 'bigcode/starcoder2-7b'
    g = StarCoder(pretrained)
    print(g('def fibonacci(n):', {'max_new_tokens': 10}))
```
Expected behavior / 预期结果 (Mandatory / 必填)
The code runs successfully without errors.
Screenshots/ 日志 / 截图 (Mandatory / 必填)
```
Traceback (most recent call last):
  File "/home/daiyuxin/xyr/mindnlp-projects/huggingface-vscode-endpoint-server/tests_ms.py", line 22, in test_starcoder
    g = StarCoder(pretrained)
  File "/home/daiyuxin/xyr/mindnlp-projects/huggingface-vscode-endpoint-server/generators_ms.py", line 38, in __init__
    self.pipe: Pipeline = pipeline(
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/pipelines/__init__.py", line 570, in pipeline
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/models/auto/tokenization_auto.py", line 775, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1723, in from_pretrained
    return cls._from_pretrained(
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1942, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/models/gpt2/tokenization_gpt2_fast.py", line 134, in __init__
    super().__init__(
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 154, in __init__
    tokens_to_add = [
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 157, in <listcomp>
    if token not in self.added_tokens_decoder
  File "/home/daiyuxin/miniconda3/envs/xyr_ms2.2.12/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 228, in added_tokens_decoder
    return self._tokenizer.get_added_tokens_decoder()
AttributeError: 'tokenizers.Tokenizer' object has no attribute 'get_added_tokens_decoder'
```
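The traceback shows that mindnlp's fast-tokenizer wrapper (`tokenization_utils_fast.py`) calls `self._tokenizer.get_added_tokens_decoder()` on the underlying `tokenizers.Tokenizer`, a method that older `tokenizers` builds do not expose. A minimal sketch of a duck-typing check to confirm whether the installed backend has the method (the helper name and the stand-in classes are hypothetical, for illustration only):

```python
def supports_added_tokens_decoder(backend) -> bool:
    """Return True if `backend` exposes the get_added_tokens_decoder()
    method that mindnlp's fast-tokenizer wrapper calls."""
    return callable(getattr(backend, "get_added_tokens_decoder", None))


# Stand-in objects so the check can be demonstrated without tokenizers installed.
class _OldBackend:
    # Mimics an old tokenizers.Tokenizer: the method is missing entirely.
    pass


class _NewBackend:
    # Mimics a newer tokenizers.Tokenizer that has the method.
    def get_added_tokens_decoder(self):
        return {}


print(supports_added_tokens_decoder(_OldBackend()))  # → False
print(supports_added_tokens_decoder(_NewBackend()))  # → True
```

On a real environment you would pass the wrapper's `_tokenizer` attribute instead of the stand-ins.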
Additional context / 备注 (Optional / 选填)
With 0.3.1 you can speed up downloads like this:
```python
pipeline("text-generation", model=pretrained, mirror='modelscope')
```
Your tokenizers version is too old; install a newer release.
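As a hedged sketch of that fix: `Tokenizer.get_added_tokens_decoder()` only exists in newer releases of the `tokenizers` package (assumption: it appeared around 0.14, with the Hugging Face added-tokens rework), so upgrading via `pip install --upgrade tokenizers` should make the attribute available. The version-gate helper below is hypothetical, illustrating a crude pre-flight check:

```python
# Suggested remedy in the failing environment:
#   pip install --upgrade tokenizers

def version_at_least(installed: str, minimum=(0, 14)) -> bool:
    """Crude major.minor comparison. `minimum` encodes the assumption
    about when get_added_tokens_decoder() first appeared."""
    parts = tuple(int(p) for p in installed.split(".")[:2])
    return parts >= minimum


print(version_at_least("0.13.3"))  # → False: would hit the AttributeError
print(version_at_least("0.15.2"))  # → True
```

In practice you would feed it `importlib.metadata.version("tokenizers")` from the affected conda environment.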