Embedding / Chinese-Word-Vectors

100+ Chinese Word Vectors (over a hundred pretrained Chinese word vectors)

How to load the model

YiingWei opened this issue

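Hello author, when I try to load your Chinese word vector model with the code below:

# Load the Chinese and English word vector models
ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)

I get the following error. How can I fix it?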
Traceback (most recent call last):
  File "c:/Users/11323/Desktop/score_comment/socore_comments.py", line 127, in <module>
    ch_model = KeyedVectors.load_word2vec_format('./ch_model/merge_sgns_bigram_char300.txt', binary=True)
  File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1719, in load_word2vec_format
    return _load_word2vec_format(
  File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 2065, in _load_word2vec_format
    _word2vec_read_binary(
  File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1960, in _word2vec_read_binary
    processed_words, chunk = _add_bytes_to_kv(
  File "C:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\gensim\models\keyedvectors.py", line 1939, in _add_bytes_to_kv
    word = chunk[start:i_space].decode(encoding, errors=unicode_errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaf in position 0: invalid start byte

commented

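It should be binary=False, because the model is a .txt file that stores the vectors as plain decimal text. Use binary=True only when the file is a .bin.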
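
One way to guard against this kind of mismatch is to inspect the file before choosing the flag. The sketch below is only an illustration: `looks_like_text_word2vec` and `load_vectors` are hypothetical helper names, not part of gensim, and the check is a heuristic. It assumes the text format starts with a `<vocab_size> <dim>` header line followed by rows made of a token and `dim` decimal numbers. gensim is imported lazily and is only needed for the final load.

```python
def looks_like_text_word2vec(path, probe_bytes=1 << 16):
    """Heuristic check (hypothetical helper, not a gensim function).

    Returns True when the first two lines look like the word2vec *text*
    format: a "<vocab_size> <dim>" header, then a token followed by <dim>
    decimal numbers. Assumes the first row fits inside `probe_bytes`.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)

    lines = head.split(b"\n", 2)
    if len(lines) < 2:
        return False

    try:
        header = lines[0].decode("utf-8").split()
        row = lines[1].decode("utf-8").split()
    except UnicodeDecodeError:
        # Raw float bytes are rarely valid UTF-8, so this suggests a binary file.
        return False

    if len(header) != 2:
        return False
    try:
        int(header[0])          # vocabulary size
        dim = int(header[1])    # vector dimensionality
    except ValueError:
        return False

    if len(row) != dim + 1:
        return False
    try:
        for value in row[1:]:
            float(value)
    except ValueError:
        return False
    return True


def load_vectors(path):
    """Load word vectors, choosing `binary` from the heuristic above."""
    from gensim.models import KeyedVectors  # requires gensim to be installed

    is_text = looks_like_text_word2vec(path)
    return KeyedVectors.load_word2vec_format(path, binary=not is_text)
```

For the file in this thread, `load_vectors('./ch_model/merge_sgns_bigram_char300.txt')` would be expected to take the text branch, which amounts to passing binary=False directly as suggested above.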

How was the file merge_sgns_bigram_char300.txt generated? Can it be downloaded directly?