Check and explore other document representations and distance metrics

Question

amartyaamp opened this issue 5 years ago

Our current representation uses averaged Word2Vec vectors. Other word or document embeddings may improve the search results, for example:

fastText (averaged)
MOE (averaged)
StarSpace (averaged)
GloVe (averaged)
Universal Sentence Encoder
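
For reference, "averaged" here means taking the mean of the word vectors in a document. Below is a minimal sketch of that step, assuming the embeddings are available as a plain token-to-vector mapping. The function name `average_embedding` and the toy vectors are hypothetical, not taken from this project's code, and out-of-vocabulary tokens are simply skipped.

```python
def average_embedding(tokens, vectors, dim):
    """Return the mean of the vectors for tokens present in `vectors`.

    `vectors` is assumed to be a dict mapping token -> list of floats.
    Tokens without a vector are ignored; if none are found, a zero
    vector of length `dim` is returned.
    """
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return [0.0] * dim
    # zip(*found) groups the i-th component of every vector together.
    return [sum(component) / len(found) for component in zip(*found)]


# Toy 3-dimensional vectors, for illustration only.
toy_vectors = {
    "search": [1.0, 0.0, 0.0],
    "engine": [0.0, 1.0, 0.0],
}
doc_vec = average_embedding(["search", "engine", "unknown"], toy_vectors, dim=3)
```

Because the function only needs a token-to-vector lookup, the same averaging step could be reused for any of the word-level models above. A sentence encoder would replace it rather than plug into it.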

Along with these, we need to look at the different distance metrics that can be used:

Manhattan
WMD-relax (pure WMD is too slow)
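
To make the two options concrete, here is a rough sketch of both. It assumes "WMD-relax" refers to the relaxed Word Mover's Distance lower bound, in which each word moves all of its weight to the nearest word in the other document, and it uses Euclidean distance between word vectors as the transport cost. The function names and toy vectors are hypothetical.

```python
import math
from collections import Counter


def manhattan(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """L2 distance, used below as the word-to-word transport cost."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _one_sided_rwmd(src_tokens, dst_tokens, vectors):
    # Normalised bag-of-words weights for the source document.
    counts = Counter(t for t in src_tokens if t in vectors)
    dst = {t for t in dst_tokens if t in vectors}
    total = sum(counts.values())
    if total == 0 or not dst:
        return float("inf")  # nothing to compare (assumption: treat as infinitely far)
    # Each source word sends all of its weight to its nearest target word.
    return sum(
        (n / total) * min(euclidean(vectors[w], vectors[d]) for d in dst)
        for w, n in counts.items()
    )


def relaxed_wmd(tokens_a, tokens_b, vectors):
    """Maximum of the two one-sided relaxations, which lower-bounds full WMD."""
    return max(
        _one_sided_rwmd(tokens_a, tokens_b, vectors),
        _one_sided_rwmd(tokens_b, tokens_a, vectors),
    )


# Toy 2-dimensional vectors, for illustration only.
toy_vectors = {"cat": [0.0, 0.0], "dog": [3.0, 4.0]}
l1 = manhattan(toy_vectors["cat"], toy_vectors["dog"])
rwmd = relaxed_wmd(["cat"], ["dog"], toy_vectors)
```

Manhattan distance operates on the document vectors themselves, whereas the relaxed WMD works on the individual word vectors of both documents, so its cost grows with the product of the two documents' vocabulary sizes.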

We need to set a threshold by comparing each, or most, of them with the current approach.
Accuracy, latency, and training speed are the main concerns here.
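
One possible way to run that comparison is a small harness that scores every candidate on the same labelled queries. The sketch below is only an assumption about how this could be set up: `evaluate`, the `rank_fn` interface (query in, ranked list of document ids out), and the toy data are all hypothetical. It covers accuracy and query latency, and training time would have to be measured separately.

```python
import time


def evaluate(rank_fn, labelled_queries, k=1):
    """Measure hit rate at k and mean per-query latency for one ranking function.

    `rank_fn(query)` is assumed to return document ids, best match first.
    `labelled_queries` is a list of (query, expected_document_id) pairs.
    """
    if not labelled_queries:
        raise ValueError("labelled_queries must not be empty")
    hits = 0
    started = time.perf_counter()
    for query, expected_id in labelled_queries:
        if expected_id in rank_fn(query)[:k]:
            hits += 1
    elapsed = time.perf_counter() - started
    n = len(labelled_queries)
    return {"accuracy_at_k": hits / n, "mean_latency_s": elapsed / n}


# Toy corpus and a stand-in ranker based on token overlap, for illustration only.
toy_docs = {"d1": "reset my password", "d2": "update billing address"}


def toy_rank(query):
    q = set(query.split())
    return sorted(toy_docs, key=lambda d: -len(q & set(toy_docs[d].split())))


result = evaluate(toy_rank, [("password reset", "d1"), ("billing update", "d2")])
```

Running each embedding and distance combination through the same function would give directly comparable numbers against the current averaged Word2Vec baseline.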