openai / weak-to-strong


Exploring Weak to Strong Generalization from a pre-training standpoint

rokosbasilisk opened this issue

In the paper, a "stronger" model is defined as a model with the same architecture but a greater number of parameters. I am curious whether any research has been conducted on weak-to-strong generalization where the weak supervisor is less pretrained and the strong student is more pretrained.

I am currently exploring the use of Pythia model checkpoints to assess performance on BoolQ (https://github.com/rokosbasilisk/weak-to-strong), where the weak model is a checkpoint of the same model taken a few training steps before the strong student model's checkpoint.
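For context, Pythia publishes its intermediate pretraining checkpoints as Hugging Face Hub revisions, so the weak and strong models can simply be two snapshots of the same run. A minimal sketch (the specific step numbers are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-1b"

# Pythia exposes intermediate checkpoints as Hub revisions
# ("step1000" ... "step143000" for the final checkpoint).
weak_model = AutoModelForCausalLM.from_pretrained(MODEL, revision="step100000")  # less pretrained
strong_model = AutoModelForCausalLM.from_pretrained(MODEL, revision="step143000")  # fully pretrained

tokenizer = AutoTokenizer.from_pretrained(MODEL)
```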

Has any prior work been undertaken in this direction? If not, could you provide insights into why this area remains unexplored?

I am not that familiar with the literature, but there is this paper, which uses training time as the notion of model strength: https://aclanthology.org/2023.acl-long.796/. Overall it seems like a reasonable direction, and I suspect there are many under-explored things in this space!

[Screenshot from 2023-12-20: results table, including the "acc_diff" column referenced below]
I am running train_weak_to_strong over a range of parameter sizes at different checkpoint steps. Surprisingly, when the weak model and the strong model are exactly the same (in terms of both parameters and checkpoint step), there is a gain in accuracy for the stronger model in most cases, as seen in the "acc_diff" column. I am currently checking whether this holds for much larger models (up to ~12B parameters, over multiple checkpoint steps).
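For concreteness, the sweep looks roughly like this (a sketch only: the `--weak_revision`/`--strong_revision` flags and the exact step list are illustrative, not necessarily the script's actual interface):

```python
import itertools
import subprocess

SIZES = ["EleutherAI/pythia-1b", "EleutherAI/pythia-1.4b", "EleutherAI/pythia-2.8b"]
STEPS = ["step13000", "step39000", "step78000", "step117000", "step143000"]

configs = list(itertools.product(SIZES, STEPS))

# Every (weak, strong) pair, including the weak == strong diagonal that
# produced the surprising positive acc_diff values above.
for (w_size, w_step), (s_size, s_step) in itertools.product(configs, repeat=2):
    subprocess.run(
        [
            "python", "train_weak_to_strong.py",
            "--weak_model_size", w_size,
            "--strong_model_size", s_size,
            "--weak_revision", w_step,      # hypothetical flag for the checkpoint step
            "--strong_revision", s_step,    # hypothetical flag for the checkpoint step
        ],
        check=True,
    )
```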

Any idea why this might happen?

I would guess it's just randomness; it could be that the second training split is better for idiosyncratic reasons.
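One cheap way to check that would be to rerun the weak == strong configuration several times with different seeds/splits and test the resulting acc_diff values against zero. A minimal sketch (the helper and the choice of a one-sample t-test are mine, not something from the repo):

```python
import numpy as np
from scipy import stats

def consistent_with_noise(acc_diffs, alpha=0.05):
    """One-sample t-test: is the mean acc_diff across repeated runs of the
    identical-model configuration (different seeds / splits) plausibly zero?"""
    t_stat, p_value = stats.ttest_1samp(np.asarray(acc_diffs), popmean=0.0)
    return p_value >= alpha  # True -> the gain looks like run-to-run noise
```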

Created a dataset of weak, strong, and transfer accuracies for Pythia 1B, 1.4B, and 2.8B models at 5 different stages of their pretraining: https://github.com/rokosbasilisk/weak-to-strong/blob/EDA/eda/results_df.csv.
Currently doing some EDA to check the effect of pretraining vs. parameter count on weak-to-strong generalization; any suggestions are welcome.
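One summary worth including in the EDA is the paper's PGR (performance gap recovered) metric. A sketch, assuming the CSV has per-run weak, strong-ceiling, and transfer accuracy columns (column names below are guesses from the description, not verified against the file):

```python
import pandas as pd

df = pd.read_csv("eda/results_df.csv")

# Performance Gap Recovered (PGR), as defined in the weak-to-strong paper:
# the fraction of the gap between the weak supervisor and the strong
# ceiling that the transfer-trained student actually closes.
df["pgr"] = (df["transfer_acc"] - df["weak_acc"]) / (df["strong_acc"] - df["weak_acc"])

print(df.groupby(["weak_model", "strong_model"])["pgr"].mean())
```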