Is this work in progress?

Question

jbdatascience opened this issue 6 months ago

Very nice initiative to do a code implementation of "Self-Rewarding LLMs" (https://arxiv.org/pdf/2401.10020.pdf)!

Is this work in progress? I would very much like to do some experiments with this code, if it is ready of course!
I could not find an official implementation yet ...

Phil Wang · Answer 1 · Sat Jan 20 2024 23:48:32 GMT+0800 (China Standard Time)

hey yes it is. for all my repositories, if you see wip, it is not complete

Jan Bours · Answer 2 · Sun Jan 21 2024 00:10:32 GMT+0800 (China Standard Time)

OK, I can imagine this is not easy to code! Can you explain how far along you are with this and what the obstacles are? I am very curious!

Phil Wang · Answer 3 · Sun Jan 21 2024 00:11:52 GMT+0800 (China Standard Time)

@jbdatascience

the paper is 2 days old. give me until end of month

Do you also think this is a very important milestone for open source LLMs?

unknown, but due to the simplicity, worth exploring

Jan Bours · Answer 4 · Sun Jan 21 2024 00:14:05 GMT+0800 (China Standard Time)

Do you also think this is a very important milestone for open source LLMs?

I also asked a question to one of the authors of the paper:

question about the very intriguing findings in “Self-Rewarding LLMs”
13:46 (3 hours ago)
to jaseweston

Good afternoon,

I have a question about the very intriguing findings in “Self-Rewarding LLMs”. Of course the experimental findings support the hypothesis that LLMs can indeed self-improve; that is undeniable!

But I cannot wrap my head around how that is even possible from an abstract point of view. An LLM contains a lot of knowledge / information. Are the findings just another way of saying that you have to employ particular kinds of techniques to elicit responses from an LLM that can reach that knowledge?

My question is:
What is the theoretical basis for this bootstrapping process? Where is the information to improve coming from? Why should we expect it to improve in the direction we want?

I hope you can shed some light 💡 on these questions!

Also, I would like to ask whether there is any open source code available so that I can experiment with it.

Best wishes,

Jan Bours
Data Scientist / Certified Data Science Professional (CDSP)