jxnl / n-levels-of-rag

Some feedback

eugeneyan opened this issue

Overall

  • No lexical search? 😔

Level 2

  • The suggestions on search would carry so much more weight with some basic evals showing how reranking / query rewriting helped
  • Would love to see an example or two of how reranking "saves" a bad result
  • Would love to see an example or two of query rewriting (a rough sketch of both mechanics follows this list)
  • I wonder whether introducing citations creates the need to ensure high recall for those citations, which could open a can of worms. Maybe add your take on whether it's worth it, from the customer and product standpoint?
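
Not asking for these exact tools, but roughly the shape of what I mean: a hedged sketch that rewrites a vague question with an LLM and then reranks candidate chunks with a cross-encoder (the model names, prompt, and example chunks are all made up for illustration):

```python
from openai import OpenAI
from sentence_transformers import CrossEncoder

client = OpenAI()  # assumes OPENAI_API_KEY is set; model choice below is illustrative
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed reranker checkpoint


def rewrite_query(raw_query: str) -> str:
    """Ask an LLM to turn a vague user question into a keyword-rich search query."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's question as a concise search query. Return only the query.",
            },
            {"role": "user", "content": raw_query},
        ],
    )
    return resp.choices[0].message.content.strip()


def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Score (query, chunk) pairs with the cross-encoder and reorder the candidates."""
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# A chunk that merely shares keywords with the query tends to fall below the chunk
# that actually answers it once the cross-encoder scores the pair.
query = rewrite_query("hey, how do I get my money back?")
chunks = [
    "Our money-back guarantee is mentioned in several marketing emails.",
    "Refund policy: request a refund within 30 days from the billing page.",
]
print(rerank(query, chunks, top_k=2))
```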

Level 3

  • Love how the logs help with identifying poor queries and results via mean cosine score. Could you add more on how teams should identify outliers in their logs for periodic triage? (👍/👎 feedback seems like an easy win if the UX can afford it; a rough sketch of one triage approach follows this list)
  • (Hindsight: Oh you discussed this in Level 5)
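
To make the triage idea concrete, here's a minimal sketch that flags queries whose mean top-k cosine score sits in the bottom tail of the logs (the log shape, scores, and decile cutoff are all assumptions):

```python
import numpy as np

# Hypothetical log records: one row per query, with the cosine scores of its top-k retrieved chunks.
logs = [
    {"query": "how do I export my data?", "cosine_scores": [0.82, 0.79, 0.75]},
    {"query": "asdf billing???", "cosine_scores": [0.41, 0.38, 0.33]},
    {"query": "reset my password", "cosine_scores": [0.88, 0.86, 0.80]},
]

mean_scores = np.array([np.mean(row["cosine_scores"]) for row in logs])

# Flag anything in the bottom decile of mean retrieval similarity (the cutoff is a knob to tune).
threshold = np.percentile(mean_scores, 10)
to_triage = [row["query"] for row, mean in zip(logs, mean_scores) if mean <= threshold]

print("queries to triage:", to_triage)
```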

Level 4

  • I think we can get very far with annotating a few hundred samples. Here's a simple guide on how to write good annotation guidelines, with a heavy focus on search (disclaimer: I wrote it)
  • Do you want to go into the details of evals for answers, such as reference-based or reference-free metrics? AFAIK, it's still not a clearly defined space and there's no clear best practice. Human feedback is noisy too.

Level 5

  • Same comments as in Level 3; would love more details on how to identify those outlier bad results programmatically.

Using LanceDB we default to both.
https://lancedb.github.io/lancedb/hybrid_search/hybrid_search/#hybrid-search-in-lancedb

but I agree evaluating can
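
Roughly what that hybrid default looks like, as a minimal sketch assuming the OpenAI embedding function from LanceDB's model registry and a local table (table name, rows, and model are illustrative):

```python
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector

# Embedding function pulled from LanceDB's model registry (model name is illustrative).
embed = get_registry().get("openai").create(name="text-embedding-3-small")


class Doc(LanceModel):
    text: str = embed.SourceField()
    vector: Vector(embed.ndims()) = embed.VectorField()


db = lancedb.connect("./lancedb")  # local path is illustrative
table = db.create_table("docs", schema=Doc, mode="overwrite")
table.add([
    {"text": "Reranking reorders retrieved chunks with a cross-encoder."},
    {"text": "Query rewriting expands a vague question into a better search query."},
])

# A full-text (BM25) index is needed for the lexical half of hybrid search.
table.create_fts_index("text")

# query_type="hybrid" runs vector + BM25 search and fuses the results.
results = table.search("how does reranking help?", query_type="hybrid").limit(5).to_pandas()
print(results)
```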

Do you want to go into the details of evals for answers, such as reference-based or reference-free metrics? AFAIK, it's still not a clearly defined space and there's no clear best practice. Human feedback is noisy too.

I would love this. It's just not something I'm familiar with. Maybe we can talk about it this week.