SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Home Page: https://arxiv.org/abs/2309.12871

Training data for UAE-Large-V1

memray opened this issue

Hi,

Awesome work! Can you share the details about what data was used for adapting WhereIsAI/UAE-Large-V1 from BGE-large? Can you share the data as well?

Thanks!

commented

Hi @memray, many thanks for following our work!

We're sorry for the inconvenience: we have not published our training details yet.
Below is the training data used to fine-tune UAE.

[Image: training data used for fine-tuning UAE]

  • high_q_sts: a high-quality, challenging STS dataset collected through human annotation.
  • retrieval: we converted multiple QA datasets into a retrieval format. In addition, we collected real retrieval data (positive samples) from search engines and used several techniques to generate negative samples.
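
For readers wondering what converting a QA dataset into retrieval data could look like, here is a minimal sketch. It is not the authors' pipeline, which has not been published. It assumes the simplest possible negative-generation technique (random sampling of other answers from the same pool), and the function name `qa_to_triplets` and the keys `query`, `positive`, and `negative` are hypothetical names chosen for illustration.

```python
import random


def qa_to_triplets(qa_pairs, num_negatives=1, seed=0):
    """Turn (question, answer) pairs into retrieval triplets.

    Assumption: negatives are drawn at random from the other answers
    in the pool. The thread does not say which techniques were used.
    """
    rng = random.Random(seed)
    answers = [answer for _, answer in qa_pairs]
    triplets = []
    for question, answer in qa_pairs:
        # Candidate negatives: every answer in the pool except the correct one.
        candidates = [a for a in answers if a != answer]
        k = min(num_negatives, len(candidates))
        for negative in rng.sample(candidates, k):
            triplets.append(
                {"query": question, "positive": answer, "negative": negative}
            )
    return triplets


# Toy data for illustration only.
qa_pairs = [
    ("What is the capital of France?", "Paris is the capital of France."),
    ("Who wrote Hamlet?", "Hamlet was written by William Shakespeare."),
    ("What is the boiling point of water?", "Water boils at 100 degrees Celsius at sea level."),
]

triplets = qa_to_triplets(qa_pairs)
for t in triplets:
    print(t)
```

Random negatives are usually easy for a model to separate from the positive. Harder negatives (for example, passages that a retriever ranks highly but that do not answer the question) are a common alternative, though nothing in this thread confirms which approach was used for UAE.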

We are now working on next-generation sentence embeddings. Once we release the new sentence embedding model, we will open-source the training details for UAE.