tattle-made / feluda

A configurable engine for analysing multi-lingual and multi-modal content.

Home Page: https://tattle.co.in/products/feluda/

Try out Embedding models and evaluate clustering

dennyabrain opened this issue

Try out ResNet, CLIP, ViT, VideoMAE (or another model you like) and use t-SNE (or another approach) to evaluate clustering visually. You can do this in a Jupyter notebook and show the results there. Use a publicly available dataset. Also evaluate whether any of these models can be fine-tuned.

  • CLIP can give us vector embeddings for an image or a video
  • Other approaches to look at: UMAP for dimensionality reduction, and fingerprinting
  • Create a mixed dataset of 150-200 samples
  • Run the Feluda video operator on a video dataset, reduce dimensions using t-SNE, and plot the result. This will act as our baseline
  • Embedding models to try: CLIP, VideoMAE
  • Display the embeddings visually using t-SNE
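
The reduce-and-plot step in the list above could look roughly like the sketch below. It is only a starting point: the embeddings are synthetic placeholders (random clusters in 512 dimensions), not output from the Feluda video operator, CLIP, or VideoMAE, and the helper names `make_placeholder_embeddings`, `project_tsne`, and `plot_projection` are made up for illustration. It assumes `numpy` and `scikit-learn` are installed, plus `matplotlib` if you call the plotting helper.

```python
import numpy as np
from sklearn.manifold import TSNE


def make_placeholder_embeddings(n_classes=3, per_class=50, dim=512, seed=0):
    """Synthetic stand-in for real model embeddings (ASSUMPTION: not real data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_classes, dim))
    points = [c + rng.normal(size=(per_class, dim)) for c in centers]
    labels = np.repeat(np.arange(n_classes), per_class)
    return np.vstack(points), labels


def project_tsne(embeddings, perplexity=30.0, seed=0):
    """Reduce embeddings to 2-D with t-SNE; perplexity must be below n_samples."""
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


def plot_projection(coords, labels, path="tsne.png"):
    """Save a scatter plot of the 2-D projection, one colour per label."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=12)
    ax.legend(*scatter.legend_elements(), title="class")
    ax.set_title("t-SNE projection of embeddings")
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Replace these placeholders with one embedding vector per video/image.
embeddings, labels = make_placeholder_embeddings()
coords = project_tsne(embeddings)
```

Calling `plot_projection(coords, labels)` writes the scatter plot to `tsne.png`. With 150-200 samples the default perplexity of 30 is valid, but it is worth trying a few values, since t-SNE plots can change noticeably with perplexity and random seed.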

@Snehil-Shah, was wondering if Video2Vec could be worth exploring. The approach is very old, so ResNet will most likely perform better, but just putting it out there.
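
If we want a number to go with the visual comparison, one option (not something decided in this thread) is to compute a silhouette score per model on the same labelled samples. A rough sketch, assuming `scikit-learn` is available; `model_a` and `model_b` are hypothetical names, and their embeddings are synthetic placeholders rather than real Video2Vec or ResNet output:

```python
import numpy as np
from sklearn.metrics import silhouette_score


def placeholder_embeddings(spread, n_classes=3, per_class=40, dim=64, seed=0):
    """Synthetic clusters; a larger `spread` means noisier, less separable data."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_classes, dim))
    points = [c + spread * rng.normal(size=(per_class, dim)) for c in centers]
    labels = np.repeat(np.arange(n_classes), per_class)
    return np.vstack(points), labels


def rank_models(embeddings_by_model, labels):
    """Return (model name, silhouette score) pairs, best-separated first."""
    scores = {
        name: float(silhouette_score(emb, labels, metric="cosine"))
        for name, emb in embeddings_by_model.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# ASSUMPTION: stand-ins for embeddings of the same samples from two models.
tight, labels = placeholder_embeddings(spread=0.3)
loose, _ = placeholder_embeddings(spread=3.0)
ranking = rank_models({"model_a": tight, "model_b": loose}, labels)
```

The score ranges from -1 to 1, with higher values meaning samples sit closer to their own class than to other classes. It needs ground-truth labels for the dataset, so it complements the t-SNE plot rather than replacing it.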