cckuailong / SuperAdapters

Finetune ALL LLMs with ALL Adapters on ALL Platforms!

QWen doesn't have eos token

zhangfuwen opened this issue

command is

The program crashed with an error saying the pad token cannot be set to None.

I hacked the code to print some debug information:

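        # Debug print: the add_eos_token option the script was started with
        print("add_eos_token", self.add_eos_token)
        tokenizer = AutoTokenizer.from_pretrained(
            self.base_model,
            trust_remote_code=True,
            add_eos_token=True
        )  # default add_eos_token=False

        # Debug prints: tokenizer summary and its special-token table
        print("tokenizer", tokenizer)
        print("tokenizer.special_tokens", tokenizer.special_tokens)
        # Some models like Qwen do not have pad_token
        if tokenizer.pad_token is None:
            # Qwen's eos_token is None as well, so this passes pad_token=None,
            # which is why the pad token ends up set to None and the program crashes
            tokenizer.add_special_tokens({'pad_token': tokenizer.eos_token})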

add_eos_token False
tokenizer QWenTokenizer(name_or_path='Qwen/Qwen-7b-chat', vocab_size=151851, model_max_length=8192, is_fast=False, padding_side='right', truncation_side='right', special_tokens={}, clean_up_tokenization_spaces=True)
tokenizer.special_tokens {'<|endoftext|>': 151643, '<|im_start|>': 151644, '<|im_end|>': 151645, '<|extra_0|>': 151646, '<|extra_1|>': 151647, '<|extra_2|>': 151648, '<|extra_3|>': 151649, '<|extra_4|>': 151650, '<|extra_5|>': 151651, '<|extra_6|>': 151652, '<|extra_7|>': 151653, '<|extra_8|>': 151654, '<|extra_9|>': 151655, '<|extra_10|>': 151656, '<|extra_11|>': 151657, '<|extra_12|>': 151658, '<|extra_13|>': 151659, '<|extra_14|>': 151660, '<|extra_15|>': 151661, '<|extra_16|>': 151662, '<|extra_17|>': 151663, '<|extra_18|>': 151664, '<|extra_19|>': 151665, '<|extra_20|>': 151666, '<|extra_21|>': 151667, '<|extra_22|>': 151668, '<|extra_23|>': 151669, '<|extra_24|>': 151670, '<|extra_25|>': 151671, '<|extra_26|>': 151672, '<|extra_27|>': 151673, '<|extra_28|>': 151674, '<|extra_29|>': 151675, '<|extra_30|>': 151676, '<|extra_31|>': 151677, '<|extra_32|>': 151678, '<|extra_33|>': 151679, '<|extra_34|>': 151680, '<|extra_35|>': 151681, '<|extra_36|>': 151682, '<|extra_37|>': 151683, '<|extra_38|>': 151684, '<|extra_39|>': 151685, '<|extra_40|>': 151686, '<|extra_41|>': 151687, '<|extra_42|>': 151688, '<|extra_43|>': 151689, '<|extra_44|>': 151690, '<|extra_45|>': 151691, '<|extra_46|>': 151692, '<|extra_47|>': 151693, '<|extra_48|>': 151694, '<|extra_49|>': 151695, '<|extra_50|>': 151696, '<|extra_51|>': 151697, '<|extra_52|>': 151698, '<|extra_53|>': 151699, '<|extra_54|>': 151700, '<|extra_55|>': 151701, '<|extra_56|>': 151702, '<|extra_57|>': 151703, '<|extra_58|>': 151704, '<|extra_59|>': 151705, '<|extra_60|>': 151706, '<|extra_61|>': 151707, '<|extra_62|>': 151708, '<|extra_63|>': 151709, '<|extra_64|>': 151710, '<|extra_65|>': 151711, '<|extra_66|>': 151712, '<|extra_67|>': 151713, '<|extra_68|>': 151714, '<|extra_69|>': 151715, '<|extra_70|>': 151716, '<|extra_71|>': 151717, '<|extra_72|>': 151718, '<|extra_73|>': 151719, '<|extra_74|>': 151720, '<|extra_75|>': 151721, '<|extra_76|>': 151722, '<|extra_77|>': 151723, '<|extra_78|>': 151724, '<|extra_79|>': 151725, '<|extra_80|>': 151726, '<|extra_81|>': 151727, '<|extra_82|>': 151728, '<|extr

Oh, thanks. QWen's pad_token_id, bos_token_id, and eos_token_id are all None. I'll fix it soon.
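
For anyone hitting the same crash in the meantime, here is a minimal sketch of a fallback that never hands None to the tokenizer. It is not the SuperAdapters fix. The helper name `pick_pad_token` and its `candidates` parameter are hypothetical, and using `<|endoftext|>` as the padding token is an assumption based on the special-token table printed above. The function works on a plain dict, so it can be tried without downloading the model.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_pad_token(
    special_tokens: Mapping[str, int],
    pad_token: Optional[str] = None,
    eos_token: Optional[str] = None,
    candidates: Sequence[str] = ("<|endoftext|>",),  # assumed fallback for Qwen
) -> Tuple[str, int]:
    """Return a (token, id) pair to use for padding (hypothetical helper).

    Preference order: the existing pad token, then the eos token, then the
    first candidate, each only if it is present in special_tokens.
    Raises ValueError instead of ever returning None.
    """
    for token in (pad_token, eos_token, *candidates):
        if token is not None and token in special_tokens:
            return token, special_tokens[token]
    raise ValueError("no usable pad token found; pass one explicitly")


# Subset of the tokenizer.special_tokens mapping printed in the log above.
qwen_special_tokens = {
    "<|endoftext|>": 151643,
    "<|im_start|>": 151644,
    "<|im_end|>": 151645,
}

# Qwen reports both pad_token and eos_token as None, so the candidate list decides.
token, token_id = pick_pad_token(qwen_special_tokens, pad_token=None, eos_token=None)
print(token, token_id)
```

With a real tokenizer, the returned id would presumably be applied with something like `tokenizer.pad_token_id = token_id` in place of the `add_special_tokens` call. Whether QWenTokenizer accepts that assignment is not confirmed in this thread, so treat it as an assumption to verify.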