How random is the division of images?
chrisgoringe opened this issue · comments
Making use of your excellent collection of scored images, I noticed that almost any model I trained on the train_set images performed better on the test_set than during training.
Using Excel on the CSV files, I found this:
train_set.csv - stdev of scores = 1.646
eval_set.csv - stdev of scores = 1.095
test_set.csv - stdev of scores = 0.6074
combined - stdev of all scores = 1.548
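For reference, a short sketch of how these numbers could be reproduced with pandas instead of Excel (the `score` column name is an assumption, adjust to match the actual CSV headers):

```python
import pandas as pd

def score_stdev(df: pd.DataFrame, column: str = "score") -> float:
    # Sample standard deviation (ddof=1), which matches Excel's STDEV
    # and is also pandas' default.
    return df[column].std()

# Hypothetical usage -- the file names and "score" column are assumptions:
# train = pd.read_csv("train_set.csv")
# print(f"train_set.csv - stdev of scores = {score_stdev(train):.4f}")
```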
The test-set scores are far less spread out than the training-set scores. Could some hidden factor have biased the division of the images into train, eval, and test sets, rather than the split being random?