TaoRuijie / Loss-Gated-Learning

ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'

CUDA out of memory

peggyxpxu opened this issue

When I train Stage 2, my GPU has 11 GB of memory and the batch size is 1, but it keeps showing 'CUDA out of memory'.

Reduce the value here instead of the batch size:

https://github.com/TaoRuijie/Loss-Gated-Learning/blob/main/Stage2/dataLoader.py#L119
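For illustration, here is a minimal sketch of the kind of parameter meant here, assuming the referenced line controls how many fixed-length crops each utterance is cut into for the clustering forward pass; all names and values below are hypothetical, not the repository's actual code:

```python
# Hypothetical sketch of a clustering data loader (names/values illustrative only).
import numpy, soundfile

def load_segments(filename, max_frames=300, num_eval=5):
    # Each utterance is cut into `num_eval` crops of `max_frames` frames
    # (160 samples per frame at 16 kHz), and the whole stack goes through the
    # encoder in one forward pass. Lowering `max_frames` or `num_eval`
    # therefore reduces GPU memory use far more than lowering the batch size.
    audio, _ = soundfile.read(filename)
    length = max_frames * 160 + 240
    if audio.shape[0] < length:
        audio = numpy.pad(audio, (0, length - audio.shape[0]), 'wrap')
    starts = numpy.linspace(0, audio.shape[0] - length, num=num_eval, dtype=int)
    return numpy.stack([audio[s:s + length] for s in starts], axis=0)
```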

Thanks, but I hit another error when training Stage 2:
```
Traceback (most recent call last):
  File "main_train.py", line 59, in <module>
    dic_label, NMI = Trainer.cluster_network(loader = clusterLoader, n_cluster = args.n_cluster) # Do clustering
  File "/mnt/data3/jhuser2/code/Loss-Gated-Learning-main/Stage2/model.py", line 60, in cluster_network
    clus.train(out_all, index) # Clustering
  File "/mnt/data1/xxp/miniconda3/envs/Loss-Gated-Learning/lib/python3.8/site-packages/faiss/class_wrappers.py", line 85, in replacement_train
    self.train_c(n, swig_ptr(x), index)
  File "/mnt/data1/xxp/miniconda3/envs/Loss-Gated-Learning/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 2560, in train
    return _swigfaiss_avx2.Clustering_train(self, n, x, index, x_weights)
RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /root/miniconda3/conda-bld/faiss-pkg_1669821803039/work/faiss/Clustering.cpp:277: Error: 'nx >= k' failed: Number of training points (5130) should be at least as large as number of clusters (6000)
```

Sorry, I am not sure why you get the number '5130' there... Also, sorry for missing your message earlier.
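For context, the error comes from faiss's k-means, which requires at least as many training vectors as clusters (nx >= k). Below is a minimal sketch of that constraint, with hypothetical sizes matching the numbers in the traceback; the embedding dimension and variable names are assumptions, not the repository's actual code:

```python
import numpy as np
import faiss

d, k = 192, 6000                                 # hypothetical embedding dim; k = --n_cluster
out_all = np.zeros((5130, d), dtype='float32')   # placeholder for the extracted embeddings

# faiss k-means needs nx >= k; with only 5130 embeddings and 6000 clusters,
# Clustering.train raises exactly the 'nx >= k' error shown in the traceback.
if out_all.shape[0] < k:
    raise ValueError(f"{out_all.shape[0]} embeddings < {k} clusters: "
                     "check that the cluster loader read the full training list, "
                     "or lower --n_cluster")

index = faiss.IndexFlatL2(d)
clus = faiss.Clustering(d, k)
clus.train(out_all, index)                       # succeeds only when nx >= k
```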