BGU-CS-VIL / DeepDPM

"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]

Training stuck at one certain epoch

derkbreeze opened this issue

Hi Meitar,

I was training on the MNIST dataset using the pretrained features, with this command:

python DeepDPM.py --dataset MNIST --dir './pretrained_embeddings/umap_embedded_datasets/MNIST' --gpus 0

However, training gets stuck at epoch 44 every time and does not continue. The log:

Epoch 0: 100%|███████████| 547/547 [00:00<00:00, 661.71it/s, loss=nan, v_num=]Initializing clusters params using Kmeans...
Epoch 44: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 547/547 [00:19<00:00, 27.60it/s, loss=0, v_num=]

Also, why does the loss become NaN in the first epoch? I would appreciate any suggestions!