IndicoDataSolutions / finetune

Scikit-learn style model finetuning for NLP

Home Page: https://finetune.indico.io

getting much lower accuracy with new release of finetune library

rnyak opened this issue · comments

commented

Describe the bug
Describe the bug
I updated my finetune library to the latest version two days ago. As a sanity check, I loaded the fine-tuned models I had saved with the previous version, and I get totally different training and test accuracies. With the previous version, my train and test accuracies were 90% and 82%; now, with this new release, the same fine-tuned model, and the same datasets, I am getting 34% on the training set and 16% on the test set. This is a huge difference. I assume there is a bug, or something else going on?

My code for fine-tuning:

import time
from finetune import Classifier
from finetune.base_models import GPT2Model

start = time.time()
model = Classifier(n_epochs=2, base_model=GPT2Model, tensorboard_folder='/workspace/checkpoints', max_length=1024, val_size=1000, chunk_long_sequences=False, keep_best_model=True)
model.fit(trainX, trainY)
print("total training time:", time.time() - start)

For testing:

import numpy as np
from finetune import Classifier

# Load the saved model
model = Classifier.load('./checkpoints/2epochs_GPT2')
# Test accuracy on the test set
pred_test = model.predict(testX)
accuracy = np.mean(pred_test == testY)
print('Test Accuracy: {:0.3f}'.format(accuracy))
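
A quick way to rule out the save/load machinery itself (as opposed to the version change) is a round-trip check within a single version; a minimal sketch, with a hypothetical scratch path, assuming model and testX from above are still in memory:

import numpy as np
from finetune import Classifier

# Predict with the in-memory model, then with a freshly reloaded copy;
# within one finetune version the two should agree exactly.
pred_before = model.predict(testX)
model.save('./checkpoints/roundtrip_check')  # hypothetical scratch path
reloaded = Classifier.load('./checkpoints/roundtrip_check')
pred_after = reloaded.predict(testX)
print('agreement:', np.mean(np.array(pred_before) == np.array(pred_after)))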

Hi @rnyak,

Huge thanks for the bug report; that sounds problematic. Did downgrading to 0.8.4 resolve your issue? I'm assuming the model was trained on 0.8.4, you upgraded to 0.8.5, and then observed the regression?

commented

How can I downgrade to 0.8.4? I upgraded to 0.8.5 because I wanted to use the eval_acc param, which does not work on 0.8.3.

Did you install via pip? If so, pip install finetune==0.8.4 should do it. Did you originally train on 0.8.3? If you retrain on 0.8.5, what do you get?
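
To confirm the downgrade took effect, you can check the installed version; a minimal sketch, assuming the package exposes __version__:

# Run after `pip install finetune==0.8.4`
import finetune
print(finetune.__version__)  # expect '0.8.4'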

commented

@benleetownsend I do not install via pip; I build the finetune Docker image from the Dockerfile. When I retrain the same model with the same dataset on 0.8.5, I get similar training and test accuracies. The problem happens only when I load a model trained on the previous release.

Do you happen to know the two relevant commit hashes so I can try to reproduce?

I tried training on 08e0f31 and loading and inferring on 56938f7. These are the release commits for versions 0.8.3 and 0.8.5 respectively, and I could not reproduce your regression.

commented

@benleetownsend I have another question regarding "eval_acc", which is important to me. That's the reason I upgraded to 0.8.5: to be able to use eval_acc.

I set eval_acc=True and trained for 1 epoch to test. I checked the accuracy plot on TensorBoard, and what I get is the following; the val accuracy is 0.859.

Wall time | Step | Value
-- | -- | --
1576178014 | 0 | 0.138
1576179261 | 1687 | 0.85900003

Then I wanted to see whether I get the same val accuracy simply with:

with model.cached_predict():
    pred_test = model.predict(valX)
    accuracy = np.mean(pred_test == valY)

This gave me 0.888 accuracy for the same set. Of course, the model is the same trained model.

What would be the reason for this difference? Does this happen because TensorBoard shows the val_acc based on the model's accuracy at certain steps, while model.predict gives results based on the best model params, since I set keep_best_model=True?

Or is there something else going on?

Yes, this is correct. We always run the internal validation on the latest model, not the best model, as this is necessary to track improvement.
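
If you want model.predict() to score the same weights that TensorBoard evaluates, one option is to disable the best-model restore; a minimal sketch, with illustrative parameter values:

from finetune import Classifier

# With keep_best_model=False, predict() uses the final-epoch weights,
# i.e. the same weights the internal validation was last run against.
model = Classifier(n_epochs=1, eval_acc=True, keep_best_model=False)
model.fit(trainX, trainY)
pred_val = model.predict(valX)  # should now track the logged val accuracy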

commented

@benleetownsend Thanks. I see. So when we save the model with model.save(), it saves only the best model? Is that correct?

That is correct.