jina-ai / clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Home Page: https://clip-as-service.jina.ai


How to get dynamic word vectors

MarrytheToilet opened this issue · comments

Hi, I want to be able to input a sentence and get a word vector for each word, like bert-as-service:

# start the server with token-level output (no pooling)
bert-serving-start -pooling_strategy NONE -model_dir /tmp/english_L-12_H-768_A-12/

from bert_serving.client import BertClient
bc = BertClient()
vec = bc.encode(['hey you', 'whats up?'])

vec        # [2, 25, 768]
vec[0]     # [25, 768], token embeddings for 'hey you'
vec[0][0]  # [768], word embedding for [CLS]
vec[0][1]  # [768], word embedding for 'hey'
vec[0][2]  # [768], word embedding for 'you'
vec[0][3]  # [768], word embedding for [SEP]
vec[0][4]  # [768], word embedding for a padding symbol
vec[0][25] # IndexError, out of bounds!

@MarrytheToilet In the current implementation of the CLIP model, word-level embeddings are not returned. To support your case, we would need to refactor the encode_text(...) API to return the full sequence of embeddings rather than only the eos_token embedding. May I know what downstream task you are working on that needs word-level embeddings?

Thank you for your reply. I am an undergraduate completing my graduation project; I hope to detect metaphors in text by obtaining dynamic word vectors.
I have now met my requirements through bert-as-service, thank you very much!

Nice, I will close this issue now. Please feel free to open a new issue if you have further questions.