RuidongZ / Deep_Matrix_Factorization_Models

Implementation of the paper "Deep Matrix Factorization Models for Recommender Systems"

How to handle a huge matrix (>2 GB)?

fjssharpsword opened this issue · comments

Problem:
When I run an experiment on the Pinterest-20 dataset, TensorFlow raises `ValueError: Cannot create a tensor proto whose content is larger than 2GB.`
The error occurs on the line `self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())`.

For now, I have worked around it by feeding the matrix into a `tf.Variable` through a placeholder.
See: https://blog.csdn.net/fjssharpsword/article/details/96431553

The changed code is:

```python
def add_embedding_matrix(self):
    # Feed the rating matrix in at initialization time instead of baking it
    # into the graph proto, which is limited to 2 GB.
    self.matrix_init = tf.placeholder(tf.float32, shape=(self.shape[0], self.shape[1]))
    matrix = tf.Variable(self.matrix_init)
    self.user_item_embedding = tf.convert_to_tensor(matrix)
    # self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())
    self.item_user_embedding = tf.transpose(self.user_item_embedding)

def init_sess(self):
    self.config = tf.ConfigProto()
    self.config.gpu_options.allow_growth = True
    self.config.allow_soft_placement = True
    self.sess = tf.Session(config=self.config)
    # self.sess.run(tf.global_variables_initializer())
    self.sess.run(tf.global_variables_initializer(),
                  feed_dict={self.matrix_init: self.dataSet.getEmbedding()})
```

Does it work? I'm not sure. Can you suggest other effective solutions? Thanks!

commented

Hello, have you solved this problem?

https://blog.csdn.net/fjssharpsword/article/details/96431553
This is my solution, but it runs slowly.

commented

Hello, I used your method to handle the oversized matrix and then trained on the MovieLens-10M dataset (User Num: 71567, Item Num: 65133). Training failed with the error below, and I don't know why. Any advice would be appreciated!

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[71567,65133] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
	 [[node Variable/Adam_1/Assign (defined at D:/PycharmProjects/Basic-DMF-Model/main.py:139) = Assign[T=DT_FLOAT, _class=["loc:@Variable/Assign"], _grappler_relax_allocator_constraints=true, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Variable/Adam_1, Variable/Adam/Initializer/zeros)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
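The OOM here is expected from the matrix size alone; a quick back-of-the-envelope check (using the user/item counts from the error above, and assuming Adam's usual two slot variables per trainable variable):

```python
# Rough memory estimate for a dense 71567 x 65133 float32 rating matrix.
rows, cols = 71567, 65133
bytes_per_float = 4
dense_bytes = rows * cols * bytes_per_float
print(f"dense matrix: {dense_bytes / 2**30:.1f} GiB")        # ≈ 17.4 GiB
# Adam keeps two extra slot variables (m and v) for each trainable variable,
# so storing the matrix as a tf.Variable roughly triples the footprint
# (note Variable/Adam_1/Assign in the traceback above).
print(f"with Adam slots: {3 * dense_bytes / 2**30:.1f} GiB")  # ≈ 52.1 GiB
```

So even before training starts, the variable plus its optimizer slots need far more RAM than a typical machine has, which matches the allocator failing on the Adam slot assignment.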

I ran into a similar problem. It feels like the author never considered large rating matrices at all. DSSM at least hashes its inputs, but this model feeds the raw one-hot vectors straight in; leaving aside whether it can run, the storage cost alone is more than an ordinary machine can bear.
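One way around storing the full matrix, in the spirit of the comments above, is to keep only the observed (user, item, rating) triples and materialize dense rows per batch. This is a minimal pure-Python sketch of the idea, not code from this repo; the names `SparseRatings` and `dense_row` are illustrative:

```python
# Sketch: store only observed ratings sparsely and build dense user rows
# on demand, instead of holding the whole num_users x num_items matrix.
class SparseRatings:
    def __init__(self, num_users, num_items):
        self.num_items = num_items
        # One small dict per user: {item_index: rating}.
        self.rows = [dict() for _ in range(num_users)]

    def add(self, user, item, rating):
        self.rows[user][item] = rating

    def dense_row(self, user):
        # Materialize a single dense user vector only when a batch needs it.
        row = [0.0] * self.num_items
        for item, rating in self.rows[user].items():
            row[item] = rating
        return row

ratings = SparseRatings(num_users=3, num_items=5)
ratings.add(0, 1, 4.0)
ratings.add(0, 3, 2.0)
print(ratings.dense_row(0))  # [0.0, 4.0, 0.0, 2.0, 0.0]
```

Memory then scales with the number of observed ratings rather than users × items, and the per-batch dense rows can be fed through a placeholder as in the workaround above.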