RuidongZ / Deep_Matrix_Factorization_Models

Implementation of the paper "Deep Matrix Factorization Models for Recommender Systems"

How to handle a huge matrix (>2 GB)?

fjssharpsword opened this issue · comments

Problem:
When I run an experiment on the Pinterest-20 dataset, TensorFlow raises `ValueError: Cannot create a tensor proto whose content is larger than 2GB.`
The error occurs on the line `self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())`.

For now, I have worked around it by feeding the matrix into a `tf.Variable` through a placeholder.
See: https://blog.csdn.net/fjssharpsword/article/details/96431553

The changed code is:

```python
def add_embedding_matrix(self):
    # Feed the rating matrix in at initialization time instead of baking it
    # into the graph proto, which is limited to 2 GB.
    self.matrix_init = tf.placeholder(tf.float32, shape=(self.shape[0], self.shape[1]))
    matrix = tf.Variable(self.matrix_init)
    self.user_item_embedding = tf.convert_to_tensor(matrix)
    # self.user_item_embedding = tf.convert_to_tensor(self.dataSet.getEmbedding())
    self.item_user_embedding = tf.transpose(self.user_item_embedding)

def init_sess(self):
    self.config = tf.ConfigProto()
    self.config.gpu_options.allow_growth = True
    self.config.allow_soft_placement = True
    self.sess = tf.Session(config=self.config)
    # self.sess.run(tf.global_variables_initializer())
    self.sess.run(tf.global_variables_initializer(),
                  feed_dict={self.matrix_init: self.dataSet.getEmbedding()})
```

Does it work? I'm not sure. Can you suggest other effective solutions? Thanks!

commented

Hello, have you solved this problem?

https://blog.csdn.net/fjssharpsword/article/details/96431553
This is my solution, but it runs slowly.

commented

Hello, I used your method to handle the oversized matrix and then trained on the MovieLens-10M dataset (User Num: 71567, Item Num: 65133). Training failed with the error below, and I don't know why. Any advice would be appreciated!

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[71567,65133] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
	 [[node Variable/Adam_1/Assign (defined at D:/PycharmProjects/Basic-DMF-Model/main.py:139) = Assign[T=DT_FLOAT, _class=["loc:@Variable/Assign"], _grappler_relax_allocator_constraints=true, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Variable/Adam_1, Variable/Adam/Initializer/zeros)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
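The OOM here is expected from the matrix size alone; a quick back-of-the-envelope check (using the user/item counts from the error above, and assuming Adam's usual two slot variables per trainable variable):

```python
# Rough memory estimate for a dense 71567 x 65133 float32 rating matrix.
rows, cols = 71567, 65133
bytes_per_float = 4
dense_bytes = rows * cols * bytes_per_float
print(f"dense matrix: {dense_bytes / 2**30:.1f} GiB")        # ≈ 17.4 GiB
# Adam keeps two extra slot variables (m and v) for each trainable variable,
# so storing the matrix as a tf.Variable roughly triples the footprint
# (note Variable/Adam_1/Assign in the traceback above).
print(f"with Adam slots: {3 * dense_bytes / 2**30:.1f} GiB")  # ≈ 52.1 GiB
```

So even before training starts, the variable plus its optimizer slots need far more RAM than a typical machine has, which matches the allocator failing on the Adam slot assignment.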

I ran into a similar problem. It feels like the author never considered large rating matrices at all. DSSM at least hashes its inputs, but this model feeds the raw one-hot vectors straight in; leaving aside whether it can run, the storage cost alone is more than an ordinary machine can bear.
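One way around storing the full matrix, in the spirit of the comments above, is to keep only the observed (user, item, rating) triples and materialize dense rows per batch. This is a minimal pure-Python sketch of the idea, not code from this repo; the names `SparseRatings` and `dense_row` are illustrative:

```python
# Sketch: store only observed ratings sparsely and build dense user rows
# on demand, instead of holding the whole num_users x num_items matrix.
class SparseRatings:
    def __init__(self, num_users, num_items):
        self.num_items = num_items
        # One small dict per user: {item_index: rating}.
        self.rows = [dict() for _ in range(num_users)]

    def add(self, user, item, rating):
        self.rows[user][item] = rating

    def dense_row(self, user):
        # Materialize a single dense user vector only when a batch needs it.
        row = [0.0] * self.num_items
        for item, rating in self.rows[user].items():
            row[item] = rating
        return row

ratings = SparseRatings(num_users=3, num_items=5)
ratings.add(0, 1, 4.0)
ratings.add(0, 3, 2.0)
print(ratings.dense_row(0))  # [0.0, 4.0, 0.0, 2.0, 0.0]
```

Memory then scales with the number of observed ratings rather than users × items, and the per-batch dense rows can be fed through a placeholder as in the workaround above.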