milvus-io / bootcamp

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

Home page: https://milvus.io


Cannot use annotations/citations as CONTEXT

adorosario opened this issue · comments

https://github.com/milvus-io/bootcamp/blame/b794151ccb64ae46419a21cf897282cd8818fd6e/evaluation/evaluate_fiqa_openai.ipynb#L198

Great job with this benchmarking. Nicely done.

One problem, though: you cannot use the annotations/citations from a RAG SaaS agent like OpenAI Assistants or CustomGPT.ai as CONTEXT for benchmarking. RAG SaaS agents do not expose their retrieved contexts, so you cannot "guess" the context by treating annotations as a proxy -- and therefore none of the metrics in your analysis that depend on CONTEXT hold.

Besides that, answer_similarity and answer_correctness (which Ragas provides for end-to-end RAG evaluation) are fine -- those two do not require the retrieved context.
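To make the distinction concrete, here is a minimal pure-Python sketch of the argument. The metric names are real Ragas metrics; the field requirements are an approximation of what each metric consumes, and the helper function itself is hypothetical, not part of Ragas:

```python
# Approximate fields each Ragas metric needs from the evaluation dataset.
# "contexts" is the retrieved context, which a black-box RAG service
# does not expose to the client.
METRIC_REQUIREMENTS = {
    "faithfulness":       {"question", "answer", "contexts"},
    "context_precision":  {"question", "contexts", "ground_truth"},
    "context_recall":     {"contexts", "ground_truth"},
    "answer_similarity":  {"answer", "ground_truth"},
    "answer_correctness": {"question", "answer", "ground_truth"},
}

def computable_metrics(available_fields):
    """Return the metrics whose required fields are all available."""
    return sorted(
        name for name, needed in METRIC_REQUIREMENTS.items()
        if needed <= set(available_fields)
    )

# A black-box RAG service exposes the question and the answer, and the
# benchmark supplies a ground-truth answer -- but never the context.
black_box_fields = {"question", "answer", "ground_truth"}
print(computable_metrics(black_box_fields))
# -> ['answer_correctness', 'answer_similarity']
```

Under these assumptions, only the two end-to-end metrics survive for a black-box service; every context-based metric drops out.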


For example, see a similar comparison of black-box RAG agents from Tonic (only answer similarity is available).

Hi @adorosario, Shahul from Ragas here. I'm not sure I understand your concern correctly, but the context in Ragas refers to the retrieved context. Ragas metrics that require context (like faithfulness) can only be used when that information is available.

@adorosario thanks for your advice. As far as I know, OpenAI provides context: https://platform.openai.com/docs/assistants/how-it-works/message-annotations
so it seems it is not a black-box RAG agent.

@zc277584121

@adorosario thanks for your advice. As far as I know, OpenAI provides context: https://platform.openai.com/docs/assistants/how-it-works/message-annotations so it seems it is not a black-box RAG agent.

Sorry -- I think you might be misunderstanding the difference between "annotations" and "context".

Context is the (possibly) thousands of tokens/words from the knowledge base that are used to create the response. For example, in our RAG platform we can technically use tens of thousands of words of context (think of it as 20-30 pages) to create the response. This CONTEXT is never shown to the client in a black-box RAG service like OpenAI Assistants or CustomGPT.ai.

Annotations are tiny snippets of text from the knowledge base that were used to construct the answer; they typically cover about 10% (if that) of the context that was actually used.

Using the two interchangeably in a black-box RAG setting would be wrong and would skew your benchmarks.
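A rough way to see why is to measure how much of the retrieved context the annotation snippets actually cover. The sketch below uses invented data (the "~10%" figure from above, not real measurements) and a crude word-overlap proxy for coverage:

```python
def annotation_coverage(context, annotations):
    """Fraction of context tokens that also appear in the annotation
    snippets (a crude word-overlap proxy, for illustration only)."""
    context_tokens = context.split()
    if not context_tokens:
        return 0.0
    annotation_tokens = set()
    for snippet in annotations:
        annotation_tokens.update(snippet.split())
    covered = sum(1 for tok in context_tokens if tok in annotation_tokens)
    return covered / len(context_tokens)

# Hypothetical example: a long retrieved context vs. the short
# snippets a black-box service surfaces as annotations.
context = " ".join(f"word{i}" for i in range(1000))       # ~1000 tokens
annotations = [" ".join(f"word{i}" for i in range(100))]  # ~10% of them
print(f"{annotation_coverage(context, annotations):.0%}")
# -> 10%
```

With 90% of the context missing, a metric like faithfulness computed against annotations is judging the answer against a fraction of the evidence the model actually saw.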

@shahules786

Shahul from Ragas here. I'm not sure I understand your concern correctly, but the context in Ragas refers to the retrieved context. Ragas metrics that require context (like faithfulness) can only be used when that information is available.

Shahul -- yes, in a black-box RAG service like OpenAI Assistants or CustomGPT.ai, the retrieved CONTEXT is never available to the client, so any benchmark metric that involves retrieved CONTEXT cannot be computed (which Ragas handles correctly). Using annotations and context interchangeably to compute a metric like faithfulness would be wrong -- that is the reason for raising this issue.