jxnl / n-levels-of-rag

Some feedback

eugeneyan opened this issue

Overall

  • No lexical search? 😔

Level 2

  • The suggestions on search would carry so much more weight with some basic evals showing how reranking / query rewriting helped
  • Would love to see an example or two of how reranking "saves" a bad result
  • Would love to see an example or two of query rewriting (a rough sketch of both mechanics follows this list)
  • I wonder whether introducing citations creates the need to ensure high recall for those citations, which could open a can of worms. Maybe add your take on whether it's worth it, from the customer and product standpoint?
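
Not asking for these exact tools, but roughly the shape of what I mean: a hedged sketch that rewrites a vague question with an LLM and then reranks candidate chunks with a cross-encoder (the model names, prompt, and example chunks are all made up for illustration):

```python
from openai import OpenAI
from sentence_transformers import CrossEncoder

client = OpenAI()  # assumes OPENAI_API_KEY is set; model choice below is illustrative
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed reranker checkpoint


def rewrite_query(raw_query: str) -> str:
    """Ask an LLM to turn a vague user question into a keyword-rich search query."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's question as a concise search query. Return only the query.",
            },
            {"role": "user", "content": raw_query},
        ],
    )
    return resp.choices[0].message.content.strip()


def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Score (query, chunk) pairs with the cross-encoder and reorder the candidates."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# A chunk that merely shares keywords with the query tends to fall below the chunk
# that actually answers it once the cross-encoder scores the pair.
query = rewrite_query("hey, how do I get my money back?")
chunks = [
    "Our money-back guarantee is mentioned in several marketing emails.",
    "Refund policy: request a refund within 30 days from the billing page.",
]
print(rerank(query, chunks, top_k=2))
```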

Level 3

  • Love how the logs help with identifying poor queries and results via mean cosine score. Could you add more on how teams should identify outliers in their logs for periodic triage? (👍/👎 feedback seems like an easy win if the UX can afford it; a rough sketch of one triage approach follows this list)
  • (Hindsight: Oh you discussed this in Level 5)
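
To make the triage idea concrete, here's a minimal sketch that flags queries whose mean top-k cosine score sits in the bottom tail of the logs (the log shape, scores, and decile cutoff are all assumptions):

```python
import numpy as np

# Hypothetical log records: one row per query, with the cosine scores of its top-k retrieved chunks.
logs = [
    {"query": "how do I export my data?", "cosine_scores": [0.82, 0.79, 0.75]},
    {"query": "asdf billing???", "cosine_scores": [0.41, 0.38, 0.33]},
    {"query": "reset my password", "cosine_scores": [0.88, 0.86, 0.80]},
]

mean_scores = np.array([np.mean(row["cosine_scores"]) for row in logs])

# Flag anything in the bottom decile of mean retrieval similarity (the cutoff is a knob to tune).
threshold = np.percentile(mean_scores, 10)
to_triage = [row["query"] for row, mean in zip(logs, mean_scores) if mean <= threshold]

print("queries to triage:", to_triage)
```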

Level 4

  • I think we can get very far with annotating a few hundred samples. Here's a simple guide on how to write good annotation guidelines, with a heavy focus on search (disclaimer: I wrote it)
  • Do you want to go into the details of evals for answers, such as reference-based or reference-free metrics? AFAIK, it's still not a clearly defined space and there's no clear best practice. Human feedback is noisy too.

Level 5

  • Same comments as in Level 3; would love more details on how to identify those outlier bad results programmatically.

Using LanceDB we default to both.
https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/#hybrid-search-in-lancedb

but I agree evaluating can
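
Roughly what that hybrid default looks like, as a minimal sketch assuming the OpenAI embedding function from LanceDB's model registry and a local table (table name, rows, and model are illustrative):

```python
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

# Embedding function pulled from LanceDB's model registry (model name is illustrative).
embed = get_registry().get("openai").create(name="text-embedding-3-small")


class Doc(LanceModel):
    text: str = embed.SourceField()
    vector: Vector(embed.ndims()) = embed.VectorField()


db = lancedb.connect("./lancedb")  # local path is illustrative
table = db.create_table("docs", schema=Doc, mode="overwrite")
table.add([
    {"text": "Reranking reorders retrieved chunks with a cross-encoder."},
    {"text": "Query rewriting expands a vague question into a better search query."},
])

# A full-text (BM25) index is needed for the lexical half of hybrid search.
table.create_fts_index("text")

# query_type="hybrid" runs vector + BM25 search and fuses the results.
results = table.search("how does reranking help?", query_type="hybrid").limit(5).to_pandas()
print(results)
```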

Do you want to go into the details of evals for answers, such as reference-based or reference-free metrics? AFAIK, it's still not a clearly defined space and there's no clear best practice. Human feedback is noisy too.

I would love this. It's just not something I'm familiar with. Maybe we can talk about it this week.