itsnamgyu / reasoning-teacher

Official code for "Large Language Models Are Reasoning Teachers", ACL 2023

Home Page: https://arxiv.org/abs/2212.10071

Diverse reasoning does not work as described, and the provided B_text-davinci-002__C_zs_cot_t70 data is incomplete

purongcheng opened this issue

Why is the B_text-davinci-002__C_zs_cot_t70 data you provide incomplete? Every dataset is missing some samples. For example, D_addsub.json is missing the samples with indices [147, 277, 285, 290, 163, 291, 38, 39, 41, 42, 43, 169, 172, 298, 47, 48, 177, 178, 305, 180, 312, 57, 185, 314, 62, 192, 193, 195, 323, 69, 70, 326, 72, 202, 203, 333, 334, 335, 209, 82, 211, 84, 337, 342, 88, 91, 348, 350, 352, 99, 227, 105, 361, 363, 377, 365, 251, 242, 115, 244, 117, 379, 121, 123, 381, 382]. Also, the diverse reasoning method I used did not produce the results you reported; I don't think diverse reasoning works as well as you say.

I think this is because of our random train-test splits. The missing samples are likely those that were assigned to the test set, so we do not use teacher-generated rationales for them. Please check the paper for details on the data splits.
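
If you want to verify this, here is a minimal sketch of a cross-check. The file paths and JSON layouts (a split file with "train"/"test" index lists, and a completion file keyed by sample index) are assumptions for illustration only and may not match the repo's actual formats:

```python
import json

# Hypothetical paths and layouts for illustration; the actual split and
# completion file formats in this repo may differ.
SPLIT_PATH = "data/splits/addsub.json"  # assumed: {"train": [...], "test": [...]}
COMPLETIONS_PATH = "B_text-davinci-002__C_zs_cot_t70/D_addsub.json"  # assumed: keyed by sample index

with open(SPLIT_PATH) as f:
    split = json.load(f)
with open(COMPLETIONS_PATH) as f:
    completions = json.load(f)

present = {int(k) for k in completions}
missing = sorted(i for i in split["train"] + split["test"] if i not in present)

print("missing indices:", missing)
# If the train-test split explanation holds, every missing index is a test sample.
print("all missing indices are in the test split:", set(missing) <= set(split["test"]))
```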

There are many possible reasons for differing results. I'd be glad to discuss if you can share some details.

Yes, many factors influence the result. With different temperature settings, the generated rationales will differ. So is temperature = 0.7 the best choice?
I find the generated rationales too similar, which hurts the student model's training. How should I set the temperature to make the generated rationales more diverse?

There is a tradeoff between diversity and quality of samples. We selected 0.7 based on a previous paper (self-consistency), but this may not be optimal for distillation, and the optimal value will likely vary with the model and domain.
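
For reference, here is a minimal sketch of sampling multiple rationales per question at a given temperature, using the legacy OpenAI Completions API (openai<1.0). The prompt template, token limit, and number of samples are illustrative assumptions, not the exact settings used in this repo:

```python
import openai

# Illustrative zero-shot CoT prompt; the repo's actual prompt format may differ.
ZS_COT_PROMPT = "Q: {question}\nA: Let's think step by step."

def sample_rationales(question, n=8, temperature=0.7):
    """Draw n rationales from the teacher at the given temperature."""
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=ZS_COT_PROMPT.format(question=question),
        max_tokens=256,
        temperature=temperature,  # higher -> more diverse, but possibly lower-quality rationales
        n=n,                      # number of reasoning paths sampled per question
    )
    return [choice["text"].strip() for choice in response["choices"]]
```

Raising the temperature (or n) increases diversity at the cost of per-sample rationale quality, which is the tradeoff described above.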