koaning / embetter

just a bunch of useful embeddings

Home Page: https://koaning.github.io/embetter/

Finally start work on `prodigy-embetter`

koaning opened this issue

python -m prodigy textcat.emb.manual <dataset> <examples.jsonl> --labels --loader --anchors --exclusive
python -m prodigy image.clip.by_text <dataset> <examples.jsonl> --labels --loader --anchors --exclusive --remove-base64
python -m prodigy image.clip.by_image <dataset> <examples.jsonl> --labels --loader --anchors --exclusive --remove-base64

After working on the "frontpage" project, I no longer think the recipes above are the best way to go about this. Calculating the embeddings on the fly is expensive, and it may be better to have a simple ANN (approximate nearest neighbor) index instead.
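
To make the ANN idea concrete, here is a minimal sketch of what such an index could look like. It is not the embetter or Prodigy API: the class `HyperplaneLSHIndex`, the helper `cosine`, and the `toy_embeddings` dictionary are all hypothetical names made up for illustration, and random-hyperplane hashing is just one simple ANN technique picked for the example. The hand-written vectors stand in for embeddings that would be computed once, up front, rather than on the fly.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSHIndex:
    """Toy approximate-nearest-neighbour index using random-hyperplane hashing.

    Hypothetical helper for illustration only, not part of embetter or Prodigy.
    """

    def __init__(self, dim, n_planes=4, n_tables=6, seed=0):
        rng = random.Random(seed)
        # Every table gets its own set of random hyperplanes.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _key(self, planes, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
        )

    def add(self, item_id, vec):
        """Store a precomputed embedding and register it in every hash table."""
        self.vectors[item_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item_id)

    def query(self, vec, k=5):
        """Return up to k ids from matching buckets, ranked by cosine similarity."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        ranked = sorted(
            candidates, key=lambda i: cosine(vec, self.vectors[i]), reverse=True
        )
        return ranked[:k]


# Stand-in embeddings (assumption): in practice these would be computed once
# by an embedding model and stored, instead of being recomputed per request.
toy_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
    "boat": [0.0, 0.0, 1.0],
}

index = HyperplaneLSHIndex(dim=3)
for item_id, vec in toy_embeddings.items():
    index.add(item_id, vec)

# Only items that share a bucket with the query are scored, which is what
# makes the lookup approximate and cheaper than scanning every vector.
print(index.query(toy_embeddings["cat"], k=2))
```

The trade-off is the usual one for ANN: a query only scores the items that land in the same bucket, so a true neighbour can be missed. Adding tables raises recall at the cost of memory, and a dedicated ANN library would likely be a better fit for real data than this sketch.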