Why do the validation and test sets of the ANLI dataset with rationales have more than 1000 examples?
BalloutAI opened this issue · comments
I am looking at the files inside the llm folder for anli1. The val_cot_0 file has more than 1400 samples (I counted the occurrences of "so the answer is" in the file), while the validation set without rationales has 1000 samples. Why is there a difference?
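For reference, here is a minimal sketch of the kind of count described above. It only illustrates that counting a phrase and counting entries can give different numbers; the layout of val_cot_0 is not known here, so the `toy_entries` list and both helper names are hypothetical and not part of the repository.

```python
from collections import Counter

PHRASE = "so the answer is"


def count_phrase(text: str, phrase: str = PHRASE) -> int:
    """Count case-insensitive, non-overlapping occurrences of phrase in text."""
    return text.lower().count(phrase.lower())


def occurrences_per_entry(entries, phrase: str = PHRASE) -> Counter:
    """Map 'number of phrase hits in one entry' -> 'how many entries have that many'."""
    return Counter(count_phrase(entry, phrase) for entry in entries)


# Hypothetical toy data, NOT taken from val_cot_0: three entries, four phrase hits.
toy_entries = [
    "The premise agrees with the hypothesis. So the answer is entailment.",
    "It could be read either way, so the answer is unclear at first. "
    "So the answer is neutral.",
    "The two statements conflict. So the answer is contradiction.",
]

total_hits = sum(count_phrase(entry) for entry in toy_entries)
print(f"entries: {len(toy_entries)}, phrase hits: {total_hits}")
print(occurrences_per_entry(toy_entries))
```

If the real file can be split into one string per example, comparing the number of entries with the total phrase count would show whether the two numbers are measuring the same thing.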