RuntimeWarning appears when using my own dataset to build the graph

Question

ZihanChen1995 opened this issue 4 years ago

Hello, when I tried to run the second step, build_graph.py, on my own dataset, I got two warnings:

RuntimeWarning: invalid value encountered in double_scalars

build_graph.py:255: RuntimeWarning: invalid value encountered in double_scalars
  data_x.append(doc_vec[j] / doc_len)  # doc_vec[j]/ doc_len
[[1 0]
 [1 0]
 [0 1]
 ...
 [1 0]
 [1 0]
 [0 1]]
build_graph.py:293: RuntimeWarning: invalid value encountered in double_scalars
  data_tx.append(doc_vec[j] / doc_len)  # doc_vec[j] / doc_len
[[0 1]
 [1 0]
 [1 0]
 ...

tcmalloc: large alloc

tcmalloc: large alloc 1342185472 bytes == 0x10d97c000 @  0x7f8c290341e7 0x5ab685 0x569c94 0x56b303 0x50ca54 0x507d64 0x50ae13 0x634c82 0x634d37 0x6384ef 0x639091 0x4b0d00 0x7f8c28c31b97 0x5b250a

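Is this caused by my dataset being too large? Will it affect the model's training results?

Also, may I ask: for classification on unbalanced data, do you have any suggestions for improving text_gcn's performance?

Thank you!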

Dr. Liang Yao (姚亮) · Answer 1 · Sat Jul 18 2020 01:53:00 GMT+0800 (China Standard Time)

@ZihanChen1995

Hello. The first warning is probably not caused by the dataset being too large; the second one probably is.
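One plausible cause of the first warning, offered as an assumption that this thread does not confirm: if a document is empty after preprocessing, its `doc_len` is 0, and dividing a NumPy float by zero yields `nan` together with an "invalid value encountered" RuntimeWarning. The sketch below shows that behaviour and a possible guard. The helper names `find_empty_docs` and `mean_doc_vector` are hypothetical and are not part of build_graph.py.

```python
import numpy as np


def find_empty_docs(docs):
    """Return the indices of documents that contain no tokens."""
    return [i for i, doc in enumerate(docs) if len(doc.split()) == 0]


def mean_doc_vector(doc_vec, doc_len):
    """Average a summed document vector; return zeros for an empty document.

    Hypothetical guard: it avoids the 0 / 0 division that produces nan.
    """
    if doc_len == 0:
        return np.zeros_like(doc_vec)
    return doc_vec / doc_len


# A float64 zero divided by zero is nan; NumPy normally reports this as a
# RuntimeWarning, which is silenced here so the demo stays quiet.
with np.errstate(invalid="ignore"):
    nan_value = np.float64(0.0) / 0

# Toy corpus (made up for illustration): documents 1 and 2 have no tokens.
docs = ["graph neural network", "", "   ", "text classification"]
empty_ids = find_empty_docs(docs)

# The guard returns a zero vector instead of nan for an empty document.
safe_vec = mean_doc_vector(np.zeros(3), 0)
```

If `find_empty_docs` reports any indices for your corpus, removing or fixing those documents before building the graph may be enough to make the warning disappear. Left in place, the resulting `nan` feature values could propagate into training.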

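I tried downsampling and found that it did not help much. Increasing the weights of the edges between minority-class documents and words might improve performance.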
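The reweighting idea could be sketched as follows. This is only an illustration under stated assumptions: the scaling scheme (largest class count divided by the document's class count), the `power` parameter, and the function names `class_boost_factors` and `reweight_doc_word_edges` are all hypothetical, and nothing here is taken from the text_gcn code.

```python
from collections import Counter


def class_boost_factors(labels, power=1.0):
    """Map each class to a factor >= 1; rarer classes get larger factors.

    Assumed scheme: (largest class count / this class count) ** power.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: (largest / count) ** power for label, count in counts.items()}


def reweight_doc_word_edges(edges, labels, power=1.0):
    """Scale each document-word edge weight by its document's class factor.

    edges:  list of (doc_index, word_index, weight) tuples
    labels: labels[doc_index] is the class of that (training) document
    """
    factors = class_boost_factors(labels, power)
    return [(d, w, weight * factors[labels[d]]) for d, w, weight in edges]


# Toy data (made up): three "pos" documents and one "neg" document.
labels = ["pos", "pos", "pos", "neg"]
edges = [(0, 0, 0.5), (1, 1, 0.2), (3, 0, 0.4)]
boosted = reweight_doc_word_edges(edges, labels)
```

In this toy example the single "neg" document has its edge weight tripled, while edges of the majority class are unchanged. Only training-document labels should be used this way, since test labels are not available at graph-construction time. A `power` below 1 would soften the boost if the full ratio proves too aggressive.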

Zihan_Chen · Answer 2 · Sat Jul 18 2020 03:03:31 GMT+0800 (China Standard Time)

Thank you for your reply. I will try the weight-increasing approach you suggested. Thanks!