Try out Embedding models and evaluate clustering
dennyabrain opened this issue · comments
Denny George commented
Try out ResNet, CLIP, ViT, VideoMAE (or something you like) and use t-SNE (or other approaches) to evaluate clustering visually. You can do this in a Jupyter notebook and show the results. Use a publicly available dataset. Evaluate whether any of these models can be fine-tuned.
Aatman Vaidya commented
- CLIP can give us vector embeddings of an image/video
- one other dimensionality reduction method to look at could be UMAP, fingerprinting
Aatman Vaidya commented
- create a mixed dataset of 150-200 videos
- Run Feluda Video Operator on a video dataset, reduce dimensions using t-SNE and do a visual plot - This will act as a baseline for us
- Embedding models - CLIP, VideoMAE
- Visually display the embeddings using t-SNE
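A rough sketch of what the t-SNE step in the plan above could look like, assuming scikit-learn is available. The embeddings here are synthetic stand-ins (random vectors around three made-up class centers), not real output from the Feluda video operator, CLIP, or VideoMAE; the sizes and the 512-dimension width are also just assumptions for illustration. The silhouette score is an optional extra that gives a number to go with the visual plot.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical stand-in for real model embeddings:
# 3 synthetic "classes", 50 vectors each, 512 dimensions.
n_classes, per_class, dim = 3, 50, 512
centers = rng.normal(scale=5.0, size=(n_classes, dim))
embeddings = np.vstack(
    [center + rng.normal(size=(per_class, dim)) for center in centers]
)
labels = np.repeat(np.arange(n_classes), per_class)

# Project to 2-D for plotting. Perplexity must be smaller than the
# number of samples (150 here).
coords = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(embeddings)

# Silhouette score in the original embedding space: values near 1 mean
# tight, well-separated clusters; values near 0 mean overlapping ones.
score = silhouette_score(embeddings, labels)
print(coords.shape, round(float(score), 3))
```

To compare models, the same script can be run once per embedding source with `embeddings` and `labels` swapped for the real arrays, and `coords` can be passed to a scatter plot colored by `labels`.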
Snehil Shah commented
Hi
Aatman Vaidya commented
@Snehil-Shah was wondering if this could be worth exploring - Video2Vec
- the approach is quite old, so ResNet will most likely perform better, but just putting it out there