THUDM / GLM

GLM (General Language Model)


With the transformers package, AutoTokenizer cannot be loaded after downloading the files locally

PolarisRisingWar opened this issue

Because of some network problems, I chose to download the files to a local directory. pytorch_model.bin hasn't been downloaded yet, but the other files have (the folder shown is the cache left over from an earlier attempt to download directly through code). Everything was downloaded from https://huggingface.co/THUDM/glm-10b-chinese/tree/main :
[screenshot: files downloaded to the local model directory]
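As a quick sanity check of the local directory, this is a minimal sketch that verifies the custom-code files needed for trust_remote_code loading are actually present; the file names are taken from the Hugging Face repo page linked above (pytorch_model.bin is intentionally excluded), so adjust the list to whatever the repo actually lists:

import os

local_dir = "/data/wanghuijuan/pretrained_model/glm-10b-chinese"
# Files expected from the repo page, excluding the weights that haven't been downloaded yet.
expected = [
    "config.json",
    "configuration_glm.py",
    "modeling_glm.py",
    "tokenization_glm.py",
    "tokenizer_config.json",
    "cog-pretrain.model",
]
missing = [name for name in expected if not os.path.exists(os.path.join(local_dir, name))]
print("missing files:", missing)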

The Python code is:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("/data/wanghuijuan/pretrained_model/glm-10b-chinese", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("THUDM/glm-10b-chinese", trust_remote_code=True, cache_dir='/data/wanghuijuan/pretrained_model/glm-10b-chinese')
model = model.half().cuda()
model.eval()

# Inference
inputs = tokenizer("凯旋门位于意大利米兰市古城堡旁。1807年为纪念[MASK]而建,门高25米,顶上矗立两武士青铜古兵车铸像。", return_tensors="pt")
inputs = tokenizer.build_inputs_for_generation(inputs, max_gen_length=512)
inputs = {key: value.cuda() for key, value in inputs.items()}
outputs = model.generate(**inputs, max_length=512, eos_token_id=tokenizer.eop_token_id)
print(tokenizer.decode(outputs[0].tolist()))

I haven't written the code after the tokenizer step yet, but this step already fails. This is the error message:

Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "/home/wanghuijuan/whj_code1/project_xingzheng/adm_pre_project/codes/amount_prediction/code20230606/glm1.py", line 2, in <module>
    tokenizer = AutoTokenizer.from_pretrained("/data/wanghuijuan/pretrained_model/glm-10b-chinese",trust_remote_code=True)
  File "/data/wanghuijuan/anaconda/anaconda3/envs/envgraph1/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 730, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.glm-10b-chinese.configuration_glm.GLMConfig'> to build an AutoTokenizer.
Model type should be one of AlbertConfig, AlignConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, CamembertConfig, CanineConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, CodeGenConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, DPRConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroupViTConfig, HubertConfig, IBertConfig, JukeboxConfig, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MobileBertConfig, MPNetConfig, MT5Config, MvpConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, OwlViTConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, Pix2StructConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, RagConfig, RealmConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, ViltConfig, VisualBertConfig, Wav2Vec2Config, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.

Could you please tell me how to solve this problem?
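One workaround that might be worth trying (not verified; this is only a sketch, and it assumes the custom tokenizer class defined in the repo's tokenization_glm.py is named GLMChineseTokenizer) is to bypass AutoTokenizer and import the downloaded tokenizer code directly:

import sys

local_dir = "/data/wanghuijuan/pretrained_model/glm-10b-chinese"
sys.path.append(local_dir)  # make the downloaded tokenization_glm.py importable

# Class name assumed from the repo's custom code; check tokenization_glm.py for the actual name.
from tokenization_glm import GLMChineseTokenizer

tokenizer = GLMChineseTokenizer.from_pretrained(local_dir)

If this loads successfully, it would at least narrow the problem down to how AutoTokenizer resolves the auto_map entry in tokenizer_config.json for a local directory, rather than a problem with the downloaded files themselves.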