SpongebBob / Finetune-ChatGLM2-6B

Full-parameter fine-tuning of ChatGLM2-6B, with efficient fine-tuning support for multi-turn dialogue.


During model training, input_ids contains None-type entries

Fanshell2333 opened this issue · comments

[INFO|modeling_utils.py:2927] 2023-07-13 06:17:15,679 >> Generation config file not found, using a generation config created from the model config.
input_ids [64790, 64792, 790, 30951, 517, 30910, 30940, 30996, 13, 13, 54761, 31211, 37234, 31211, 50769, 32096, 34009, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31654, 50769, 54561, 32585, 31715, 30943, 32154, 31123, 31783, 54572, 54818, 32074, 54942, 32326, 55055, 31514, 13, 13, 55437, 31211, 30910, 36037, 31809, 32615, 31201, 52116, 31201, 36583, 32927, 31639, 31155, 34992, 31662, 40384, 31211, 32615, 57907, 52116, 59086, 31643, 53668, 31868, 31155, 13, 31659, 50769, 32096, 34009, 54942, 30943, 32154, 31123, 31672, 31804, 52116, 54541, 30943, 38807, 31155, 47322, 32096, 34009, 54552, 38372, 30939, 30940, 32074, 31643, 35220, 31715, 31123, 31814, 31804, 38903, 30939, 30940, 32074, 31123, 54996, 30978, 30940, 30940, 56315, 31155, 13, 31672, 50769, 54818, 32074, 39357, 32585, 54541, 30910, 30943, 32154, 1381, 30910, 30978, 30940, 30940, 56315, 542, 30910, 30940, 30930, 30940, 30940, 30966, 30966, 32154, 30967, 56315, 40663, 30910, 30966, 30930, 30966, 55055, 30967, 56315, 31155, 13, 33161, 31211, 50769, 54818, 32074, 54942, 30966, 30930, 30966, 55055, 31155, 13, 13, None, None, None, ...] (remaining entries omitted: the rest of the sequence is padded to max length entirely with None)
Traceback (most recent call last):
File "main.py", line 376, in
main()
File "main.py", line 207, in main
print_dataset_example(train_dataset[0])
File "main.py", line 186, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3509, in decode
return self._decode(
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 906, in convert_ids_to_tokens
index = int(index)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

Preliminary investigation shows that tokenizer.eos_token_id is None. Is there a fix for this?
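A minimal reproduction sketch of what appears to be happening (pure Python, no transformers required; the padding logic and max length here are illustrative assumptions, not the repo's actual code): if the tokenizer reports eos_token_id as None and that value is used to pad the sequence, the resulting input_ids list is filled with None, which later crashes tokenizer.decode at int(index).

```python
# Illustrative sketch: padding with a None eos_token_id fills input_ids
# with None instead of an integer token id.
eos_token_id = None          # what the ChatGLM2 tokenizer reports here
input_ids = [64790, 64792, 30910]
max_seq_length = 8           # illustrative value

# Pad the sequence up to max_seq_length with the (None) eos id.
input_ids = input_ids + [eos_token_id] * (max_seq_length - len(input_ids))
print(input_ids)  # → [64790, 64792, 30910, None, None, None, None, None]
```

Decoding such a list then fails exactly as in the traceback above, because convert_ids_to_tokens calls int(None).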

Changing it to pad_token_id fixed it.
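The suggested fix can be sketched as follows. FakeTokenizer is a hypothetical stand-in for ChatGLM2's tokenizer (whose eos_token_id is None but whose pad_token_id is a real integer; the id value 0 here is illustrative), and pad_ids is an assumed helper, not a function from the repo:

```python
# Sketch of the fix: pad with pad_token_id instead of the (None) eos_token_id.
class FakeTokenizer:
    eos_token_id = None   # mirrors the reported problem
    pad_token_id = 0      # illustrative; the real id comes from the tokenizer

def pad_ids(ids, tokenizer, max_seq_length):
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        # Defensive guard so a None pad id fails loudly instead of
        # silently producing None entries in input_ids.
        raise ValueError("tokenizer has no usable pad_token_id either")
    return ids + [pad_id] * (max_seq_length - len(ids))

print(pad_ids([64790, 64792, 30910], FakeTokenizer(), 8))
# → [64790, 64792, 30910, 0, 0, 0, 0, 0]
```

With integer padding, tokenizer.decode no longer hits the int(None) TypeError.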