Word Embedding tools
CarloLepelaars opened this issue
I think it would be a nice addition to have an embedder that can easily vectorize text with spaCy. I already have a class that implements this and would be happy to contribute it here.
spaCy docs on Doc.vector:
https://spacy.io/api/doc#vector
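Example code for a single string:
```python
import spacy

# Load a small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("This here text")
# Doc.vector defaults to the average of the token vectors: one fixed-length array per text
print(doc.vector.shape)
```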
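For many texts at once, an embedder could wrap `nlp.pipe` and stack one `Doc.vector` per text. The sketch below is hypothetical and is not the class mentioned above: the name `SpacyEmbedder`, the `batch_size` default, and the scikit-learn-style `fit`/`transform` methods (duck-typed, without importing scikit-learn) are all assumptions. It takes an already-loaded pipeline instead of a model name, so it has no hard dependency on spaCy itself.

```python
from typing import Any, Iterable, Optional

import numpy as np


class SpacyEmbedder:
    """Hypothetical sketch: map each text to its spaCy Doc.vector.

    `nlp` is assumed to be a loaded spaCy pipeline, e.g. spacy.load("en_core_web_md").
    The fit/transform methods mimic the scikit-learn convention (an assumption).
    """

    def __init__(self, nlp: Any, batch_size: int = 64):
        self.nlp = nlp
        self.batch_size = batch_size

    def fit(self, X: Iterable[str], y: Optional[Any] = None) -> "SpacyEmbedder":
        # Nothing to learn: the vectors come from the pretrained pipeline.
        return self

    def transform(self, X: Iterable[str]) -> np.ndarray:
        # nlp.pipe processes texts in batches, which is faster than calling nlp() per text.
        docs = self.nlp.pipe(X, batch_size=self.batch_size)
        # Stack one Doc.vector per text into an (n_texts, n_dims) array.
        return np.vstack([doc.vector for doc in docs])

    def fit_transform(self, X: Iterable[str], y: Optional[Any] = None) -> np.ndarray:
        return self.fit(X, y).transform(X)
```

Usage would be along the lines of `SpacyEmbedder(spacy.load("en_core_web_md")).transform(["first text", "second text"])`. A medium or large pipeline is the natural choice here, since spaCy's small (`sm`) pipelines do not ship with static word vectors.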
Have you seen the whatlies library?
I wrote that while at Rasa, and it supports word embeddings. This project tries to be a bit more minimal, partly because word embeddings lose information in longer sentences: pooling many word vectors into one sentence vector dilutes the contribution of each individual word.
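A toy illustration of the pooling argument, under deliberately simple assumptions: every word vector is a distinct one-hot (mutually orthogonal) vector, and the sentence vector is their mean, which is what `Doc.vector` does by default. Real embeddings are not orthogonal, so the decay is less clean, but each word's weight in a mean-pooled vector is still 1/n.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pooled_similarity(n_words: int) -> float:
    """Cosine between one 'key' word vector and the mean-pooled sentence vector.

    Toy assumption: all n_words vectors are orthogonal one-hot vectors.
    """
    vectors = np.eye(n_words)        # row 0 is the key word, the rest are filler words
    sentence = vectors.mean(axis=0)  # mean pooling over all words
    return cosine(vectors[0], sentence)


# Similarity to the key word shrinks as 1/sqrt(n): prints 1.0, 0.5, 0.25, 0.125
for n in (1, 4, 16, 64):
    print(n, round(pooled_similarity(n), 3))
```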
Ok, no problem! Will check out whatlies.
There may yet be reasons to add these embeddings. I'm not 100% sure, but there's enough doubt to reopen this issue. In particular, I might want to think about adding support for spaCy and byte-pair embeddings. I will be doing entity work soon.
I'm pretty sure I don't want to support TFHub or Gensim, though.