a pickle file problem
Hi, @yoonkim
I am a beginner in natural language processing and machine learning. Because the 'GoogleNews-vectors-negative300.bin' file is quite large, all of my attempts to make the pickle file ('mr.p') have failed. Could you give me some advice on making 'mr.p' with 16GB~32GB of RAM, if you don't mind?
I also wonder whether 'mr.p' needs to be processed in chunks to solve the memory problem. (I know little about pickle files.)
Thank you
Hi, @soohyunee, did you solve this problem? I am stuck on the same thing right now. Do you have any ideas?
Hi, @GaoZhongqin
I didn't solve the 'mr.p' problem, but this Kaggle kernel helped me build an embedding layer without trouble.
https://www.kaggle.com/ia1na09/cnn-keras-pretrained-word2vec-yoon-kim-model
If you don't need the 'mr.p' file, I suggest following the approach in this kernel. I hope it helps you as well :)
Thank you
Hello all,
If you are attempting this under Python 3 and are running into memory limits, the issue likely lies in the string processing. Python 2 and Python 3 handle binary files differently: in Python 3, reading a file opened in binary mode returns bytes rather than str, so any literal compared against that data must be a bytes literal, written with a lowercase b prefix, for the comparison to succeed.
Here is an example:
with open(fname, "rb") as f:
    for line in range(foo):
        ch = f.read(1)
        if ch == b' ':
            pass  # do something
Notice the space ' ' has a b before it: b' '
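Without this b, the comparison is always False in Python 3, even when the byte that was read is a space. A loop that reads bytes until it sees a space therefore never terminates, and if it accumulates what it reads, memory use grows without bound until the machine runs out of RAM.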
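To make the point concrete, here is a minimal sketch of a Python 3 reader for a word2vec-style binary file. It assumes the usual layout (a text header with the vocabulary size and vector dimension, then each word followed by a space and little-endian float32 values). The names load_bin_vec and make_demo_file are only illustrative, and the tiny in-memory file stands in for the real GoogleNews file. It keeps only the words in a given vocabulary, which also keeps memory use small.

```python
import io
import struct


def load_bin_vec(f, vocab):
    """Read vectors from a binary file object, keeping only words in vocab."""
    header = f.readline().decode("ascii")
    vocab_size, dim = map(int, header.split())
    binary_len = 4 * dim  # each float32 takes 4 bytes
    word_vecs = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = f.read(1)
            if ch == b"":  # end of file: stop instead of looping forever
                raise EOFError("unexpected end of file while reading a word")
            if ch == b" ":  # bytes literal, so this matches in Python 3
                break
            if ch != b"\n":
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        raw = f.read(binary_len)  # always consume the vector bytes
        if word in vocab:
            word_vecs[word] = struct.unpack("<%df" % dim, raw)
    return word_vecs


def make_demo_file():
    """Build a two-word file in memory (hypothetical data for illustration)."""
    buf = io.BytesIO()
    buf.write(b"2 3\n")
    buf.write(b"hello " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n")
    buf.write(b"world " + struct.pack("<3f", 4.0, 5.0, 6.0) + b"\n")
    buf.seek(0)
    return buf


vecs = load_bin_vec(make_demo_file(), vocab={"world"})
print(vecs)
print(b" " == " ")  # bytes never equal str in Python 3
```

With a real file you would pass open(fname, "rb") instead of the in-memory buffer. The explicit end-of-file check is an extra safeguard I added: even if a comparison is wrong, the loop raises an error instead of growing forever.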
Hope this helps.