SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Home Page: https://arxiv.org/abs/2309.12871

A little doubt about the paper

Night-Quiet opened this issue

[image]
From a code perspective, the paper's final step adds up all the values of the complex loss function.
[image]
This is the standard complex division formula and its transformation. As I understand it, the purpose of the paper is to obtain the term in the red box.
But the code ultimately adds the values up, as shown in the following figure:
[image]
Is this the result the paper intends? Could you clarify? Thank you.

commented

Hi @Night-Quiet, sorry for the delayed reply.

Thank you for providing the clear formula derivation in polar coordinates. It appears to be correct.

We sum the values to accumulate the angle differences. In polar coordinates, the result is indeed $\sqrt{2}\sin(\Delta)$, where $\Delta = \theta_i - \theta_j + \frac{\pi}{4}$.
However, if we let $\sin(\Delta) = x$, the quantity we actually want is $\Delta = \arcsin(x)$.
That is hard to implement in code. For practical reasons, the paper therefore uses an approximate calculation of the angle difference, as described there. Under this approach, the operation sum(y_pred) serves as a pooling operation; it could equally be a mean or another type of pooling. This pooling step is necessary to compute the final loss.
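
For readers who want to check the polar-coordinate identity numerically, here is a minimal sketch using only the Python standard library. It is not the AnglE implementation; the function names and the sample values are hypothetical and chosen only for illustration. It shows that the sum of the real and imaginary parts of a complex quotient $z/w$ equals $r\sqrt{2}\sin(\Delta)$ with $\Delta = \theta_i - \theta_j + \frac{\pi}{4}$, where $r = |z|/|w|$ (so the factor $r$ is 1 when the quotient has unit modulus).

```python
import cmath
import math


def re_plus_im_of_quotient(z: complex, w: complex) -> float:
    """Sum of the real and imaginary parts of z / w (hypothetical helper)."""
    q = z / w
    return q.real + q.imag


def polar_form(z: complex, w: complex) -> float:
    """Same quantity written as r * sqrt(2) * sin(theta_i - theta_j + pi/4)."""
    r = abs(z) / abs(w)  # ratio of magnitudes
    delta = cmath.phase(z) - cmath.phase(w) + math.pi / 4
    return r * math.sqrt(2) * math.sin(delta)


# Arbitrary sample values, assumed only for this illustration.
z = complex(1.0, 2.0)
w = complex(3.0, -1.0)

direct = re_plus_im_of_quotient(z, w)
polar = polar_form(z, w)
print(math.isclose(direct, polar))
```

The two forms agree because $\cos(\theta) + \sin(\theta) = \sqrt{2}\sin(\theta + \frac{\pi}{4})$. Recovering $\Delta$ itself would require an additional $\arcsin$, which is the step the reply above describes as impractical.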

The reason for computing the normalized angle difference is to provide a more intuitive similarity measure than cosine similarity: a smaller angle difference indicates greater similarity.

Thank you for your reply. I think I understand what you mean.