Embedding / Chinese-Word-Vectors

100+ Chinese Word Vectors (over a hundred pretrained Chinese word vectors)

MemoryError: Unable to allocate 1.47 TiB for an array with shape (635969, 635970) and data type float32

KevinYe553 opened this issue

from gensim.models import KeyedVectors

# model_file = r"fan_word2vec_binary.bin"
model_file = r"D:\code\python\MachineLearning\word2evc\test\ppmi.baidubaike.word" 
# Load the model
model = KeyedVectors.load_word2vec_format(model_file, binary=True)
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7920\2878082268.py in <module>
      4 model_file = r"D:\code\python\MachineLearning\word2evc\test\ppmi.baidubaike.word"
      5 # Load the model
----> 6 model = KeyedVectors.load_word2vec_format(model_file, binary=True)
D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header)
   1627 
   1628         """
-> 1629         return _load_word2vec_format(
   1630             cls, fname, fvocab=fvocab, binary=binary, encoding=encoding, unicode_errors=unicode_errors,
   1631             limit=limit, datatype=datatype, no_header=no_header,

D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in _load_word2vec_format(cls, fname, fvocab, binary, encoding, unicode_errors, limit, datatype, no_header, binary_chunk_size)
   1967         if limit:
   1968             vocab_size = min(vocab_size, limit)
-> 1969         kv = cls(vector_size, vocab_size, dtype=datatype)
   1970 
   1971         if binary:

D:\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in __init__(self, vector_size, count, dtype, mapfile_path)
    241         self.key_to_index = {}
    242 
--> 243         self.vectors = zeros((count, vector_size), dtype=dtype)  # formerly known as syn0
    244         self.norms = None
    245 

MemoryError: Unable to allocate 1.47 TiB for an array with shape (635969, 635970) and data type float32

Am I using it wrong?

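You don't have enough memory. The PPMI vectors are sparse, but load_word2vec_format allocates a dense count x vector_size array (the zeros(...) call in the traceback), so loading this file needs 1.47 TiB of memory, which is about 1,507 GiB.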
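
To make the numbers concrete, here is a rough sketch rather than a tested loader for this repository. The first function reproduces the 1.47 TiB figure from the shape in the error message. The second reads sparse vectors without building a dense array. It assumes a "rows cols" header line (which matches what gensim parsed above) followed by lines of the form "word index:value index:value ..."; that per-line layout is an assumption here, so check it against your file first. The names dense_size_tib and read_sparse_vectors are made up for this example.

```python
def dense_size_tib(rows, cols, itemsize=4):
    """Size in TiB of a dense rows x cols array (itemsize=4 means float32)."""
    return rows * cols * itemsize / 2**40


def read_sparse_vectors(lines):
    """Parse text lines into (shape, {word: {column_index: value}}).

    Assumed layout (not verified against the real file):
      first line:  "<rows> <cols>"
      other lines: "<word> <index>:<value> <index>:<value> ..."
    Only the non-zero entries are stored, so memory grows with the
    number of non-zeros instead of rows * cols.
    """
    lines = iter(lines)
    rows, cols = (int(n) for n in next(lines).split())
    vectors = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, pairs = parts[0], parts[1:]
        row = {}
        for pair in pairs:
            index, value = pair.rsplit(":", 1)
            row[int(index)] = float(value)
        vectors[word] = row
    return (rows, cols), vectors


# The shape reported in the MemoryError above.
print(round(dense_size_tib(635969, 635970), 2))  # 1.47

# Tiny made-up sample in the assumed layout.
sample = ["3 4", "cat 0:1.5 2:0.25", "dog 1:2.0", "fish 3:0.5"]
shape, vectors = read_sparse_vectors(sample)
print(shape, vectors["cat"])
```

For a real file you would pass an open file handle, for example open(model_file, encoding="utf-8"), instead of the sample list. A dict of dicts is still heavy for hundreds of thousands of words, so a sparse matrix type such as scipy.sparse.csr_matrix is probably a better container in practice.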