jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083


Finetuning Issue with Example Data

mosala777 opened this issue

Hello,

I tried finetuning the 6-mer model with the provided example data. Performance seems good at first, but then it suddenly drops significantly and raises a warning. The first few evaluations start normally and improve like this:

05/20/2022 19:32:46 - INFO - main - ***** Eval results *****
05/20/2022 19:32:46 - INFO - main - acc = 0.944
05/20/2022 19:32:46 - INFO - main - auc = 0.988568
05/20/2022 19:32:46 - INFO - main - f1 = 0.9439997759991039
05/20/2022 19:32:46 - INFO - main - mcc = 0.8880071040852491
05/20/2022 19:32:46 - INFO - main - precision = 0.9440071041136658
05/20/2022 19:32:46 - INFO - main - recall = 0.944

Then the metrics drop significantly and the following warning appears:

/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
05/20/2022 21:09:44 - INFO - main - ***** Eval results *****
05/20/2022 21:09:44 - INFO - main - acc = 0.5
05/20/2022 21:09:44 - INFO - main - auc = 0.59788
05/20/2022 21:09:44 - INFO - main - f1 = 0.3333333333333333
05/20/2022 21:09:44 - INFO - main - mcc = 0.0
05/20/2022 21:09:44 - INFO - main - precision = 0.25
05/20/2022 21:09:44 - INFO - main - recall = 0.5
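
For context, these degenerate values are exactly what the metrics collapse to once the model predicts only one class: precision for the never-predicted label is undefined (sklearn sets it to 0.0 and warns), and the MCC denominator np.sqrt(cov_ytyt * cov_ypyp) becomes zero because the predictions have no variance. A minimal sketch that reproduces the numbers above, assuming macro-averaged metrics on a balanced dev set:

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef, precision_score, recall_score

# Balanced binary dev set; a collapsed model predicts only class 0.
y_true = np.array([0] * 500 + [1] * 500)
y_pred = np.zeros(1000, dtype=int)

print(precision_score(y_true, y_pred, average="macro"))  # 0.25, with UndefinedMetricWarning
print(recall_score(y_true, y_pred, average="macro"))     # 0.5
print(f1_score(y_true, y_pred, average="macro"))         # 0.3333...
print(matthews_corrcoef(y_true, y_pred))                 # 0.0 (zero denominator)
```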

I saw that some updates were made, as mentioned in #10, but I am still facing this issue. I would appreciate any feedback from you.

Kind regards,
Salah

commented

Hi developers

I am facing the same issue as @mosala777. Here is part of the stdout I observed:

    INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
    12/20/2022 14:46:07 - INFO - main - ***** Running evaluation *****
    12/20/2022 14:46:07 - INFO - main - Num examples = 5350
    12/20/2022 14:46:07 - INFO - main - Batch size = 32
    Evaluating: 100%|█████████████| 168/168 [03:38<00:00, 1.30s/it]
    12/20/2022 14:49:46 - INFO - main - ***** Eval results *****
    12/20/2022 14:49:46 - INFO - main - acc = 0.7816822429906543
    12/20/2022 14:49:46 - INFO - main - auc = 0.8839985990874502
    12/20/2022 14:49:46 - INFO - main - f1 = 0.7782037084362665
    12/20/2022 14:49:46 - INFO - main - mcc = 0.5846169035386265
    12/20/2022 14:49:46 - INFO - main - precision = 0.8024470439531897
    12/20/2022 14:49:46 - INFO - main - recall = 0.7825097242114136
    /home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
    warnings.warn("To get the last learning rate computed by the scheduler, "
    {"eval_acc": 0.7816822429906543, "eval_f1": 0.7782037084362665, "eval_mcc": 0.5846169035386265, "eval_auc": 0.8839985990874502, "eval_precision": 0.8024470439531897, "eval_recall": 0.7825097242114136, "learning_rate": 0.0001956160743938891, "loss": 0.4473942193388939, "step": 400}
    12/20/2022 14:57:46 - INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
    12/20/2022 14:57:46 - INFO - main - ***** Running evaluation *****
    12/20/2022 14:57:46 - INFO - main - Num examples = 5350
    12/20/2022 14:57:46 - INFO - main - Batch size = 32
    Evaluating: 100%|█████████████| 168/168 [03:16<00:00, 1.17s/it]
    /home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
    _warn_prf(average, modifier, msg_start, len(result))
    /home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:900: RuntimeWarning: invalid value encountered in double_scalars
    mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
    12/20/2022 15:01:03 - INFO - main - ***** Eval results *****
    12/20/2022 15:01:03 - INFO - main - acc = 0.4968224299065421
    12/20/2022 15:01:03 - INFO - main - auc = 0.4861425095900458
    12/20/2022 15:01:03 - INFO - main - f1 = 0.3319180819180819
    12/20/2022 15:01:03 - INFO - main - mcc = 0.0
    12/20/2022 15:01:03 - INFO - main - precision = 0.24841121495327104
    12/20/2022 15:01:03 - INFO - main - recall = 0.5
    /home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
    warnings.warn("To get the last learning rate computed by the scheduler, "
    {"eval_acc": 0.4968224299065421, "eval_f1": 0.3319180819180819, "eval_mcc": 0.0, "eval_auc": 0.4861425095900458, "eval_precision": 0.24841121495327104, "eval_recall": 0.5, "learning_rate": 0.0001889737628694786, "loss": 0.5985230031609535, "step": 500}

I would appreciate any feedback too, thank you.

Best Regards
WY

commented

The same problem is discussed here: ThilinaRajapakse/simpletransformers#234

The solutions seem to be:

  • lower the learning rate (--learning_rate 2e-5)
  • use a smaller batch size (--per_gpu_train_batch_size 64)
  • perhaps delete the cache (--overwrite_cache)

Edited: the parameters above are the ones I used to get a smooth run (a full example command is sketched below).
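
For reference, a hedged sketch of how those flags slot into a finetuning command. Only the three flags in the list above come from this thread; the script name, the remaining flags, and the paths are placeholders based on the standard DNABERT/transformers finetuning setup and may differ in your checkout:

```bash
# Hypothetical invocation; replace the placeholder paths with your own.
python run_finetune.py \
    --model_type dna \
    --model_name_or_path $MODEL_PATH \
    --data_dir $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --do_train --do_eval \
    --learning_rate 2e-5 \
    --per_gpu_train_batch_size 64 \
    --overwrite_cache
```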

commented

Hi @CherWeiYuan, I am facing a similar issue to the one mentioned above. I tried your solution, but I am still not seeing any improvement in the results.

commented

Hi @NikitaBhandare! This solution worked for me, but I had to manually delete all models from ~/.cache/huggingface/hub/ and set seeds for numpy, torch, cuda, etc. before loading a new model. Did you try this?

P.S. It looks like lr=2e-4 is an error; it should definitely be 2e-5.
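
For anyone reproducing this, a minimal sketch of the seeding step described above; the helper name is mine, and it should be called once before loading the model:

```python
import random

import numpy as np
import torch

def set_all_seeds(seed: int = 42) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and CUDA) RNGs so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe no-op when no GPU is present

set_all_seeds(42)  # call before loading/finetuning the model
```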