amartyaamp / CodeComb

Search your repo by context keywords

Check and explore other document representations and distance metrics

amartyaamp opened this issue · comments

Our current representation uses averaged Word2Vec. Other word or document embeddings may improve the search results, for example:

  • Fasttext (averaged)
  • MOE (averaged)
  • StarSpace (averaged)
  • GloVe (averaged)
  • Universal Sentence Encoder
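
For reference, the "averaged" variants above all reduce a document to the mean of its word vectors, so swapping Word2Vec for Fasttext or GloVe mostly means swapping the lookup table. Below is a minimal sketch of that averaging step. `TOY_VECTORS` and `average_embedding` are hypothetical names with made-up 3-dimensional vectors, not CodeComb's actual code or any library's API.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional vectors, invented purely for illustration.
# A real run would load these from a trained embedding model.
TOY_VECTORS: Dict[str, List[float]] = {
    "read": [1.0, 0.0, 0.0],
    "file": [0.0, 1.0, 0.0],
    "parse": [1.0, 1.0, 0.0],
}


def average_embedding(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the mean vector of all known tokens.

    Out-of-vocabulary tokens are skipped; if no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# "unknown" is not in the table, so only "read" and "file" are averaged.
doc_vec = average_embedding(["read", "file", "unknown"], TOY_VECTORS, dim=3)
print(doc_vec)
```

Skipping out-of-vocabulary tokens is one possible choice; subword models such as Fasttext can produce a vector for unseen tokens instead, which may matter for identifiers in code.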

Along with these, we need to look at different distance metrics that could be used:

  • Manhattan
  • WMD-relax (pure WMD is too slow)
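
To make the two candidates concrete, here is a rough sketch in plain Python. Manhattan distance compares two document vectors directly, while the relaxed WMD shown here works on the per-word vectors of each document: every word moves all of its weight to the nearest word in the other document, and the larger of the two directional costs is taken. It assumes uniform word weights and Euclidean cost between word vectors; the function names are hypothetical and this is not tuned for speed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance, used here as the word-to-word travel cost."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _one_sided_cost(src: Sequence[Vector], dst: Sequence[Vector]) -> float:
    """Average cost of sending each source word to its nearest target word."""
    return sum(min(euclidean(s, d) for d in dst) for s in src) / len(src)


def relaxed_wmd(doc_a: Sequence[Vector], doc_b: Sequence[Vector]) -> float:
    """Relaxed Word Mover's Distance with uniform word weights.

    Each document is a list of word vectors. The result is the maximum
    of the two one-sided nearest-neighbour costs, which is a lower bound
    on the full WMD under the same weights and cost.
    """
    if not doc_a or not doc_b:
        raise ValueError("both documents need at least one word vector")
    return max(_one_sided_cost(doc_a, doc_b), _one_sided_cost(doc_b, doc_a))


print(manhattan([1.0, 2.0, 3.0], [2.0, 0.0, 3.0]))
print(relaxed_wmd([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]]))
```

The relaxed version needs only pairwise nearest-neighbour lookups rather than solving a transport problem, which is why it is cheaper than pure WMD.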

We need to set a threshold by comparing each, or at least most, of them against the current approach.
Accuracy, latency, and training speed are the main concerns here.
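
One possible way to run that comparison is a small harness that scores every candidate on the same labelled queries. The sketch below measures top-k hit rate and mean query latency only; training time would need to be timed separately. `evaluate`, `toy_search`, and the query data are hypothetical placeholders, not part of CodeComb.

```python
import time
from typing import Callable, Sequence, Tuple


def evaluate(
    search_fn: Callable[[str], Sequence[str]],
    labelled_queries: Sequence[Tuple[str, str]],
    k: int = 3,
) -> Tuple[float, float]:
    """Return (top-k hit rate, mean seconds per query) for one search function.

    `labelled_queries` holds (query, expected_result) pairs. A query counts
    as a hit when the expected result is among the first `k` results.
    """
    if not labelled_queries:
        raise ValueError("need at least one labelled query")
    hits = 0
    total_seconds = 0.0
    for query, expected in labelled_queries:
        start = time.perf_counter()
        results = search_fn(query)
        total_seconds += time.perf_counter() - start
        if expected in list(results)[:k]:
            hits += 1
    n = len(labelled_queries)
    return hits / n, total_seconds / n


# Stand-in for a real search backend: a fixed lookup table.
def toy_search(query: str) -> Sequence[str]:
    table = {"read csv": ["io.py", "utils.py"], "train model": ["cli.py"]}
    return table.get(query, [])


queries = [("read csv", "utils.py"), ("train model", "model.py")]
hit_rate, mean_latency = evaluate(toy_search, queries, k=2)
print(f"hit rate: {hit_rate:.2f}, mean latency: {mean_latency:.6f}s")
```

Running the same harness once per embedding and metric combination gives directly comparable numbers against the averaged Word2Vec baseline.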