BGU-CS-VIL / DeepDPM

"DeepDPM: Deep Clustering With An Unknown Number of Clusters" [Ronen, Finder, and Freifeld, CVPR 2022]

working with custom input data

zeydabadi opened this issue

Hi,
Suppose the input data consists of 10000 images, each of size MxN.
For the end-to-end feature extraction and clustering scenario, what should the size of the train.pt tensor be?

Also, how can we provide labels for custom data? Let's say the possible labels are "positive", "negative", and "unknown".

Thanks

Hi, if you are using the end-to-end feature extraction format, the training tensor should be of size 10000 x (M*N), i.e., each image flattened into a single row (e.g., if M = 10 and N = 20, the tensor is 10000 x 200).
You don't need to provide labels; this is an unsupervised method :)
Simply supply train_data.pt and test_data.pt.
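
For readers following along, here is a minimal sketch of the flattening described in this reply. It assumes the images are already loaded as one PyTorch tensor of shape (10000, M, N); the random data, the `flatten_images` helper, and the temporary output directory are placeholders for illustration and are not part of the DeepDPM code.

```python
import os
import tempfile

import torch


def flatten_images(images: torch.Tensor) -> torch.Tensor:
    """Flatten a stack of images (num_images, M, N) into rows of length M*N."""
    return images.reshape(images.shape[0], -1).float()


# Placeholder data: 10000 random "images" of size M x N (assumption for the demo).
M, N = 10, 20
images = torch.rand(10000, M, N)

train_data = flatten_images(images)
print(train_data.shape)  # torch.Size([10000, 200])

# Save under the file name mentioned in the reply; the directory is a placeholder.
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train_data.pt")
torch.save(train_data, train_path)
```

The same helper can be applied to the held-out images to produce test_data.pt.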

Thanks for your response. I asked about labels so I could evaluate the clustering outcomes. How would you evaluate your end-to-end feature extraction and clustering algorithm if you don't have labels?

Hey, you can look at our paper for a more detailed answer, but in short: our algorithm is intended for cases where labels are not available (i.e., an unsupervised task). There are common unsupervised metrics in the literature, such as the silhouette score, that can be used for evaluation.
If you do have labels, you can use them for evaluation (as we did in our paper on labeled datasets) with the --use_labels_for_eval flag.
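
As a rough illustration of both evaluation routes, the sketch below computes the silhouette score (no labels needed) and, when labels exist, a label-based score after mapping string labels such as "positive" and "negative" to integer ids. It uses scikit-learn rather than DeepDPM's own evaluation code; the synthetic features, the `predicted` cluster assignments, and the choice of NMI as the label-based metric are assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, silhouette_score

rng = np.random.default_rng(0)

# Placeholder features: two well-separated blobs standing in for learned embeddings.
features = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])

# Placeholder cluster assignments, standing in for a clustering model's output.
predicted = np.array([0] * 50 + [1] * 50)

# Unsupervised evaluation: needs only the features and the predicted clusters.
sil = silhouette_score(features, predicted)

# Label-based evaluation: map string labels to integer ids first.
label_names = ["positive"] * 50 + ["negative"] * 50
label_to_id = {name: i for i, name in enumerate(sorted(set(label_names)))}
true_ids = np.array([label_to_id[name] for name in label_names])

# NMI is invariant to how cluster ids are numbered, so no id matching is needed.
nmi = normalized_mutual_info_score(true_ids, predicted)

print(f"silhouette={sil:.3f}, NMI={nmi:.3f}")
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters.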