jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Home Page: https://clip-as-service.jina.ai


How to get dynamic word vectors

MarrytheToilet opened this issue · comments

Hi, I want to be able to input a sentence and get a word vector for each word, like bert-as-service:

# start the server with token-level output (no pooling)
bert-serving-start -pooling_strategy NONE -model_dir /tmp/english_L-12_H-768_A-12/

from bert_serving.client import BertClient
bc = BertClient()
vec = bc.encode(['hey you', 'whats up?'])

vec        # [2, 25, 768]
vec[0]     # [25, 768], token embeddings for 'hey you'
vec[0][0]  # [768], word embedding for [CLS]
vec[0][1]  # [768], word embedding for 'hey'
vec[0][2]  # [768], word embedding for 'you'
vec[0][3]  # [768], word embedding for [SEP]
vec[0][4]  # [768], word embedding for a padding symbol
vec[0][25] # IndexError, out of bounds!

@MarrytheToilet In the current implementation of the CLIP model, word-level embeddings are not returned. To support your case, we would need to refactor the encode_text(...) API to return the full sequence of embeddings rather than only the eos_token embedding. May I know what downstream task you are working on that needs word-level embeddings?

Thank you for your reply. I am an undergraduate completing my graduation project; I hope to detect metaphors in text by obtaining dynamic word vectors.
I have now met my requirements through bert-as-service, thank you very much!

Nice, I will close this issue now. Please feel free to open a new issue if you have further questions.