SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Home Page:https://arxiv.org/abs/2309.12871

[QUESTION] How to use prompt C when using through HuggingFace embeddings loader

kairoswealth opened this issue · comments

I am using LlamaIndex to index documents into chromadb, and for that I use the HuggingFaceEmbedding abstraction like this:

embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")

However, I read that one needs to specify prompt C in order to optimize the embedding for retrieval.

  1. Is the prompt only used during retrieval, i.e. for the query embedding? Or also for document indexing?
  2. Any idea whether that setting is supported through the HuggingFace/LlamaIndex abstractions, and if so, how?
  3. In the event that the prompt C argument is not supported, would the resulting vectors perform significantly worse in retrieval use cases?

For your questions:

  1. Yes — use it only for query texts; do not apply it when indexing documents.

2&3. Sorry, I haven't used LlamaIndex. You can manually apply the prompt to the query text as follows:

from angle_emb import Prompts

# Wrap the query in prompt C before embedding.
# embed_model is the HuggingFaceEmbedding instance created earlier.
query_text = 'this is a query'
query_text = Prompts.C.format(text=query_text)

embeddings = embed_model.get_text_embedding(query_text)
...
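To make point 1 concrete, here is a minimal standalone sketch of applying the prompt asymmetrically: queries get the retrieval prompt, documents do not. The PROMPT_C template string is my assumption of what angle_emb's Prompts.C expands to, inlined here so the snippet runs without angle_emb installed.

```python
# Assumption: this is the template behind angle_emb's Prompts.C,
# inlined so the example has no third-party dependency.
PROMPT_C = 'Represent this sentence for searching relevant passages: {text}'

def format_query(text: str) -> str:
    # Queries are wrapped in the retrieval prompt before embedding...
    return PROMPT_C.format(text=text)

def format_document(text: str) -> str:
    # ...while documents are embedded as-is, with no prompt.
    return text

print(format_query('what is prompt C?'))
print(format_document('Prompt C is the retrieval prompt used by UAE-Large-V1.'))
```

You would then pass the formatted query (but the unmodified documents) to `embed_model.get_text_embedding`.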

Awesome, that is very clear now. I'll apply the prompt manually on retrieval. Thanks a lot!