Is this work in progress?
jbdatascience opened this issue
Very nice initiative to implement "Self-Rewarding LLMs" (https://arxiv.org/pdf/2401.10020.pdf)!
Is this work in progress? I would very much like to do some experiments with this code, if it is ready of course!
I could not find an official implementation yet ...
Hey, yes it is. For all my repositories, if you see wip, it is not complete.
OK, I can imagine this is not easy to code! Can you explain how far along you are and what the obstacles are? I am very curious!
The paper is 2 days old. Give me until the end of the month.
Do you also think this is a very important milestone for open-source LLMs?
Unknown, but given its simplicity, it is worth exploring.
I also asked a question to one of the authors of the paper:
question about the very intriguing findings in “Self-Rewarding LLMs”
13:46 (3 hours ago)
to jaseweston
Good afternoon,
I have a question about the very intriguing findings in “Self-Rewarding LLMs”. Of course, the experimental findings support the hypothesis that LLMs can indeed self-improve; that is undeniable!
But I cannot wrap my head around how that is even possible from an abstract point of view. An LLM contains a lot of knowledge and information. Are the findings just another way of saying that you have to employ particular techniques to elicit responses from an LLM that can access that knowledge?
My question is:
What is the theoretical basis for this bootstrapping process? Where does the information needed for improvement come from? Why should we expect it to improve in the direction we want?
I hope you can shed some light 💡 on these questions!
Also, I would like to ask whether any open-source code is available so that I can experiment with it.
Best wishes,
Jan Bours
Data Scientist / Certified Data Science Professional (CDSP)