IndicoDataSolutions / finetune

Scikit-learn style model finetuning for NLP

Home Page: https://finetune.indico.io

getting much lower accuracy with new release of finetune library

rnyak opened this issue · comments

commented

Describe the bug
Describe the bug
I updated my finetune library to the latest version two days ago. As a sanity check, I loaded the fine-tuned models I had saved with the previous version, and I get totally different training and test accuracies. With the previous version, my train and test accuracies were 90% and 82%; now, with this new release, the same fine-tuned model, and the same datasets, I am getting 34% on the training set and 16% on the test set. This is a huge difference. I assume there is a bug, or something else going on?

My code for fine-tuning:

import time
from finetune import Classifier
from finetune.base_models import GPT2Model

start = time.time()
model = Classifier(n_epochs=2, base_model=GPT2Model, tensorboard_folder='/workspace/checkpoints', max_length=1024, val_size=1000, chunk_long_sequences=False, keep_best_model=True)
model.fit(trainX, trainY)
print("total training time:", time.time() - start)

For testing:

import numpy as np
from finetune import Classifier

# Load the saved model
model = Classifier.load('./checkpoints/2epochs_GPT2')
# Test accuracy on the test set
pred_test = model.predict(testX)
accuracy = np.mean(pred_test == testY)
print('Test Accuracy: {:0.3f}'.format(accuracy))
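
A quick way to rule out the save/load machinery itself (as opposed to the version change) is a round-trip check within a single version; a minimal sketch, with a hypothetical scratch path, assuming model and testX from above are still in memory:

import numpy as np
from finetune import Classifier

# Predict with the in-memory model, then with a freshly reloaded copy;
# within one finetune version the two should agree exactly.
pred_before = model.predict(testX)
model.save('./checkpoints/roundtrip_check')  # hypothetical scratch path
reloaded = Classifier.load('./checkpoints/roundtrip_check')
pred_after = reloaded.predict(testX)
print('agreement:', np.mean(np.array(pred_before) == np.array(pred_after)))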

Hi @rnyak,

Huge thanks for the bug report; that sounds problematic. Did downgrading to 0.8.4 resolve your issue? I'm assuming the model was trained on 0.8.4, you upgraded to 0.8.5, and then observed the regression?

commented

How can I downgrade to 0.8.4? I upgraded to 0.8.5 because I wanted to use the eval_acc param, which does not work on 0.8.3.

Did you install via pip? If so, pip install finetune==0.8.4 should do it. Did you originally train on 0.8.3? If you retrain on 0.8.5, what do you get?
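
To confirm the downgrade took effect, you can check the installed version; a minimal sketch, assuming the package exposes __version__:

# Run after `pip install finetune==0.8.4`
import finetune
print(finetune.__version__)  # expect '0.8.4'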

commented

@benleetownsend I do not install via pip; I build the finetune Docker image from the Dockerfile. When I retrain the same model with the same dataset on 0.8.5, I get similar training and test accuracies. The problem happens only when I load a model trained on the previous release.

Do you happen to know the two relevant commit hashes so I can try to reproduce?

I tried training on 08e0f31 and loading and inferring on 56938f7. These are the release commits for versions 0.8.3 and 0.8.5 respectively, and I could not reproduce your regression.

commented

@benleetownsend I have another question regarding "eval_acc", which is important to me. That's the reason I upgraded to 0.8.5: to be able to use eval_acc.

I set eval_acc=True and trained for 1 epoch to test. I checked the accuracy plot on TensorBoard, and what I get is the following; the val accuracy is 0.859.

Wall time | Step | Value
-- | -- | --
1576178014 | 0 | 0.138
1576179261 | 1687 | 0.85900003

Then I wanted to see whether I get the same val accuracy simply with:

with model.cached_predict():
    pred_test = model.predict(valX)
    accuracy = np.mean(pred_test == valY)

This gave me 0.888 accuracy for the same set. Of course, the model is the same trained model.

What would be the reason for this difference? Does this happen because TensorBoard shows the val_acc based on the model's accuracy at certain steps, while model.predict gives results based on the best model params, since I set keep_best_model=True?

Or is there something else going on?

Yes, this is correct. We always run the internal validation on the latest model, not the best model, as this is necessary to track improvement.
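
If you want model.predict() to score the same weights that TensorBoard evaluates, one option is to disable the best-model restore; a minimal sketch, with illustrative parameter values:

from finetune import Classifier

# With keep_best_model=False, predict() uses the final-epoch weights,
# i.e. the same weights the internal validation was last run against.
model = Classifier(n_epochs=1, eval_acc=True, keep_best_model=False)
model.fit(trainX, trainY)
pred_val = model.predict(valX)  # should now track the logged val accuracy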

commented

@benleetownsend Thanks. I see. So when we save the model with model.save(), it saves only the best model? Is that correct?

That is correct.