koaning / embetter

just a bunch of useful embeddings

Home Page: https://koaning.github.io/embetter/

Word Embedding tools

CarloLepelaars opened this issue

I think an embedder that can easily vectorize text through spaCy would be a nice addition. I already have an implementation class for this and would be happy to contribute it here.

spaCy docs on Doc.vector:
https://spacy.io/api/doc#vector

Example code for single string:

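import spacy

# Load a pretrained English pipeline (the model must be downloaded beforehand)
nlp = spacy.load("en_core_web_sm")
doc = nlp("This here text")
# Document-level vector; by default the average of the token vectors
doc.vector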
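
For many strings at once, a class along the following lines could wrap the same call. This is only a rough sketch and not the implementation mentioned above: the name SpacyEmbedder and its fit/transform layout are assumptions made for illustration (loosely following the scikit-learn transformer convention), and it is not part of embetter's API. The only spaCy calls it relies on are nlp.pipe and doc.vector.

```python
class SpacyEmbedder:
    """Hypothetical scikit-learn style wrapper around a loaded spaCy pipeline.

    The class name and layout are illustrative assumptions,
    not an existing embetter component.
    """

    def __init__(self, nlp):
        # nlp is an already loaded pipeline, e.g. spacy.load("en_core_web_sm")
        self.nlp = nlp

    def fit(self, X, y=None):
        # Nothing to learn; the pipeline is pretrained.
        return self

    def transform(self, X):
        # nlp.pipe processes the texts as a stream, which is usually
        # faster than calling nlp(text) once per string.
        return [doc.vector for doc in self.nlp.pipe(X)]


# Usage, assuming spaCy and the en_core_web_sm model are installed:
#   import spacy
#   embedder = SpacyEmbedder(spacy.load("en_core_web_sm"))
#   vectors = embedder.fit(texts).transform(texts)
```

Passing the pipeline in, rather than a model name, keeps the sketch independent of which spaCy model is installed. The result is a list with one vector per input text; stacking it into a single 2D array is left to the caller.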

Have you seen the whatlies library?

I wrote that while at Rasa and it supports word embeddings. This project tries to be a bit more minimal, partially because word embeddings lose their information in longer sentences due to pooling.
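
A toy illustration of the pooling point, using made-up two-dimensional word vectors rather than real embeddings (the vocabulary, the numbers, and the helper names mean_pool and embed are all assumptions for the sake of the example). Mean pooling, which is what spaCy's doc.vector does by default, ignores word order, and vectors pointing in opposite directions cancel each other out:

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Made-up word vectors, purely for illustration.
toy_vectors = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.0],
}


def embed(sentence):
    """Embed a sentence as the mean of its (toy) word vectors."""
    return mean_pool([toy_vectors[word] for word in sentence.split()])


a = embed("dog bites man")
b = embed("man bites dog")

# The two sentences mean different things but pool to the same vector,
# and the first dimension is zero because "dog" and "man" cancel out.
print(a == b)
```

The more words a sentence has, the smaller the share each one contributes to the average, so the pooled vector carries less and less of any individual word.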

Ok, no problem! Will check out whatlies.

There may yet be reasons to add these embeddings. Not 100% sure, but enough doubt to re-open this issue. In particular I might want to think about adding support for spaCy and bytepair embeddings. Will be doing entity stuff soon.

I'm pretty sure that I don't want to support TFHub or Gensim tho.