deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Repository from Github https://github.comdeepseek-ai/DeepSeek-V2Repository from Github https://github.comdeepseek-ai/DeepSeek-V2

Question about the design of bos and eos token

jojo23333 opened this issue · comments

Hi, Thanks for the great work. I'm just in general curious about whether there is a reason to use the Chinese version of '|' and '▁'instead of the '|' , ‘_’ which is standard ASCII characters in eos_token and bos_token. ('<|end▁of▁sentence|>' and '<|begin▁of▁sentence|>' ). Is this for distinguishing deep seek model from English only LLM's like Llamma?

image