Check and explore other document representations and distance metrics
amartyaamp opened this issue
Amartya Chaudhuri commented
Our current representation uses averaged Word2Vec. Other word or document embeddings may improve the search results, for example:
- fastText (averaged)
- MOE (averaged)
- StarSpace (averaged)
- GloVe (averaged)
- Universal Sentence Encoder
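For reference, here is a minimal sketch of what "averaged" means for the word-level options above, so each candidate can be swapped in behind the same function. The `average_embedding` helper and the toy vectors are hypothetical and only for illustration; in practice `vectors` would be whatever token-to-vector lookup the chosen model provides.

```python
import numpy as np

def average_embedding(tokens, vectors, dim):
    """Return the mean vector of the in-vocabulary tokens.

    `vectors` is assumed to be a dict-like mapping token -> 1-D numpy array
    of length `dim`. Out-of-vocabulary tokens are skipped; if no token is
    known, a zero vector is returned.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Toy 3-dimensional vectors, made up for illustration only.
toy_vectors = {
    "red": np.array([1.0, 0.0, 0.0]),
    "apple": np.array([0.0, 1.0, 0.0]),
}

# "unknown" is not in the vocabulary, so only two vectors are averaged.
doc_vec = average_embedding(["red", "apple", "unknown"], toy_vectors, dim=3)
```

Universal Sentence Encoder would not go through this helper, since it produces a document-level vector directly.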
Along with these, we need to look at different distance metrics that can be used:
- Manhattan
- WMD-relax (pure WMD is too slow)
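A rough sketch of the two metrics, to make the comparison concrete. It assumes "WMD-relax" refers to the relaxed lower bound on Word Mover's Distance, where each word sends all of its weight to the nearest word in the other document, with Euclidean distance between word vectors as the ground cost. The function names and argument layout are hypothetical, not taken from our codebase.

```python
import numpy as np

def manhattan(u, v):
    """L1 distance between two document vectors."""
    return float(np.sum(np.abs(np.asarray(u) - np.asarray(v))))

def relaxed_wmd(vecs_a, weights_a, vecs_b, weights_b):
    """Relaxed WMD between two documents (a lower bound on exact WMD).

    vecs_*    : array of shape (n_words, dim), one row per distinct word
    weights_* : array of shape (n_words,), normalized word frequencies
                (assumed to sum to 1)
    """
    vecs_a, vecs_b = np.asarray(vecs_a), np.asarray(vecs_b)
    # Pairwise Euclidean distances between word vectors: shape (len_a, len_b).
    cost = np.linalg.norm(vecs_a[:, None, :] - vecs_b[None, :, :], axis=2)
    # Each word in A moves all its weight to its nearest word in B, and vice versa.
    a_to_b = float(np.dot(weights_a, cost.min(axis=1)))
    b_to_a = float(np.dot(weights_b, cost.min(axis=0)))
    # The larger of the two one-sided bounds is the tighter one.
    return max(a_to_b, b_to_a)

# Toy example with made-up 2-D word vectors.
doc_a = np.array([[0.0, 0.0], [1.0, 0.0]])
doc_b = np.array([[0.0, 0.0]])
dist = relaxed_wmd(doc_a, np.array([0.5, 0.5]), doc_b, np.array([1.0]))
```

Manhattan distance works on the averaged document vectors, while relaxed WMD needs the per-word vectors and weights, so the two would plug into the search pipeline at different points.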
We need to compare each, or at least most, of these with the current approach and set a threshold based on that comparison.
Accuracy, latency and training speed are the main concerns here.