jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Pretraining DNABERT

iamakshay1 opened this issue · comments

Hi, I am pretraining the DNABERT model on my custom data, and I am getting the following perplexity scores:
1.005971908569336
1.0055251121520996
1.0050606727600098
1.005359411239624
1.005840539932251
1.0052825212478638
1.0051802396774292
1.0055253505706787
1.0054320096969604
1.0054410696029663
1.0058468580245972
1.0049262046813965
1.0057575702667236
1.0051915645599365
1.0054072141647339
1.0055351257324219
1.0054702758789062
1.0053589344024658
1.0051729679107666
1.005456566810608
1.0054833889007568
1.0049924850463867
1.0052168369293213
1.0055359601974487
1.0054214000701904
1.0054751634597778
1.005573034286499
1.0051946640014648
1.0053223371505737
1.0050946474075317
1.0055451393127441
1.0052800178527832
1.0052553415298462
1.005454421043396
1.0052385330200195
1.0048243999481201
1.005685806274414
1.0053269863128662
1.0049481391906738
1.0052223205566406
1.0053377151489258
1.0051454305648804
1.0050266981124878
1.005757451057434
1.005202054977417
1.005906343460083
1.0050561428070068
1.0051881074905396
1.0052803754806519
1.0053002834320068
1.005397915840149
1.0059492588043213
1.0059244632720947
1.0054737329483032
1.00540030002594
1.0050368309020996
1.0050461292266846
1.005406141281128
1.005310297012329
1.0049501657485962
1.0049052238464355
1.005474328994751
1.0050350427627563
1.0050352811813354
1.0047125816345215
1.0053828954696655
1.0057741403579712
1.0050772428512573
1.0055228471755981
1.0052945613861084
1.005362868309021
1.0057356357574463
1.0052978992462158

I am new to the ML/AI domain, so I am a little confused about the bounds of the perplexity score. Can you please help me validate these scores?

Hi,

@Zhihan1996 can comment on the perplexity score. Please be more specific about what kind of validation you want so that we can help you. I will close this for now, but I am happy to continue the discussion.

Thanks,
Jerry

[Attached images: "scores", "Untitled"]
Hi, I have attached the actual scores and the plot that I got when I pretrained DNABERT on my data. I just wanted to know what the optimal values for the perplexity score are. If possible, can you please share the perplexity scores you obtained on your corpus?
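
On the question of bounds, here is a minimal sketch, assuming the perplexity is computed the usual way, as the exponential of the mean cross-entropy loss (in nats) over the predicted tokens. Under that assumption the lower bound is 1 (loss of 0, every token predicted with certainty), and a model that guesses uniformly over the vocabulary scores a perplexity equal to the vocabulary size. The function names and the vocabulary size below (4^6 6-mers plus 5 special tokens) are illustrative assumptions, not values taken from this thread.

```python
import math


def perplexity_from_loss(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp(mean cross-entropy) when the loss is in nats."""
    return math.exp(mean_cross_entropy_nats)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse mapping: the mean cross-entropy implied by a perplexity."""
    return math.log(perplexity)


# Lower bound: a loss of 0 gives a perplexity of exactly 1.
best_possible = perplexity_from_loss(0.0)

# Uniform-guess baseline: loss = ln(V), so perplexity = V.
# ASSUMPTION: V = 4**6 + 5 (6-mer tokens plus 5 special tokens).
vocab_size = 4 ** 6 + 5
uniform_baseline = perplexity_from_loss(math.log(vocab_size))

# First score reported in this issue, converted back to a loss.
reported_perplexity = 1.005971908569336
implied_loss = loss_from_perplexity(reported_perplexity)

print(f"lower bound:      {best_possible}")
print(f"uniform baseline: {uniform_baseline:.1f}")
print(f"implied loss:     {implied_loss:.6f}")
```

Read this way, the reported scores sit very close to the lower bound, which means the model is assigning almost all probability to the correct token. Whether that reflects good training or an evaluation that is too easy (for example, if overlapping k-mer neighbours of a masked token remain visible) cannot be determined from the numbers alone.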