Feature: `--plan` flag

swyxio opened this issue a year ago · comments

swyx.io commented a year ago

this is an easy one

https://news.ycombinator.com/item?id=35970653
Self-planning Code Generation with Large Language Model
https://arxiv.org/pdf/2303.06689.pdf idea
https://news.ycombinator.com/item?id=35735375
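
A minimal sketch of what such a flag could look like, assuming a two-phase flow in the spirit of the self-planning paper linked above: ask the model for a plan first, then ask it to write code that follows the plan. Everything here is hypothetical and not taken from any project's codebase: `build_parser`, `generate`, the prompt wording, and `call_model` (a stand-in for whatever function sends a prompt to the LLM and returns its text).

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with an opt-in planning phase (hypothetical)."""
    parser = argparse.ArgumentParser(description="Generate code from a prompt.")
    parser.add_argument("prompt", help="natural-language description of the task")
    parser.add_argument(
        "--plan",
        action="store_true",
        help="ask the model for a step-by-step plan before generating code",
    )
    return parser


def generate(prompt: str, call_model: Callable[[str], str], plan: bool = False) -> str:
    """Return generated code, optionally preceded by a planning call.

    `call_model` is an assumed callable: prompt text in, model text out.
    """
    if not plan:
        # Single-shot generation: one model call, no intermediate plan.
        return call_model(f"Write code for this task:\n{prompt}")

    # Phase 1: have the model break the task into numbered steps.
    steps = call_model(f"List concise numbered steps to solve this task:\n{prompt}")

    # Phase 2: generate code conditioned on both the task and the plan.
    return call_model(
        f"Task:\n{prompt}\n\nPlan:\n{steps}\n\n"
        "Write code that follows the plan step by step."
    )
```

With this shape, `args = build_parser().parse_args()` followed by `generate(args.prompt, call_model, plan=args.plan)` costs one extra model call when `--plan` is passed and behaves as before otherwise.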

swyx.io · Answer 1 · Thu May 18 2023 14:26:05 GMT+0800 (China Standard Time)

swyx.io commented a year ago

swyx.io · Answer 2 · Mon Jun 19 2023 13:36:08 GMT+0800 (China Standard Time)

The discovery that only GPT-4 can self-improve, while weaker models cannot, is very intriguing, indicating that a new type of emergent ability (i.e., the ability to improve upon natural language feedback) may only exist when the model is "mature" (large and well-aligned) enough.
https://twitter.com/Francis_YAO_/status/1670618013089820674

Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair -- in which the model debugs and fixes mistakes in its own code -- has recently become a popular way to boost performance in these settings. However, only very limited studies on how and when self-repair works effectively exist in the literature, and one might wonder to what extent a model is really capable of providing accurate feedback on why the code is wrong when that code was generated by the same model. In this paper, we analyze GPT-3.5 and GPT-4's ability to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges.
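
The self-repair loop described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: `self_repair`, the prompt wording, `call_model` (prompt text in, model text out), and `run_tests` (returns an error message, or `None` when the code passes) are all hypothetical names.

```python
from typing import Callable, Optional, Tuple


def self_repair(
    task: str,
    call_model: Callable[[str], str],
    run_tests: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate code, then let the same model critique and fix its failures.

    Returns the last candidate and whether it passed the tests.
    """
    code = call_model(f"Write code for this task:\n{task}")

    for round_index in range(max_rounds + 1):
        error = run_tests(code)
        if error is None:
            return code, True  # candidate passes, stop early
        if round_index == max_rounds:
            break  # repair budget exhausted; last candidate already tested

        # Feedback step: the model explains why its own code is wrong.
        feedback = call_model(
            f"Task:\n{task}\n\nCode:\n{code}\n\nError:\n{error}\n\n"
            "Explain why the code is wrong."
        )

        # Repair step: the model rewrites the code using that explanation.
        code = call_model(
            f"Task:\n{task}\n\nCode:\n{code}\n\nFeedback:\n{feedback}\n\n"
            "Return a corrected version of the code."
        )

    return code, False
```

Keeping the feedback and repair steps as separate calls makes it possible to swap in a different source of feedback (a stronger model, or a human) without touching the rest of the loop.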