jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Question about the splice sites prediction

AstroSign opened this issue

commented

I am trying to use the pretrained model on my splice site data, but the results are close to random. In my case, I used the binary classifier (dnaprom) to classify whether a splice site is at the middle of the sequence. My negative (false splice site) samples are generated by randomly selecting unmatched donors and acceptors.

I don't know whether this is caused by the format of my data or by something else. Could you please provide the splice site data used in your experiments? It would be helpful if you could provide both the 3-class dataset (donor, acceptor and non-splice site) and the TP splice site dataset.

commented

I ran into the same problem when using the binary classifier on my data. The results are random when I use negative sequences randomly extracted from the reference genome. However, when I use my own positive data together with the negative data from sample_data, I get better accuracy.
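
Since a data format mismatch is one possible cause, here is a minimal sketch of how input rows could be prepared for a quick check. It assumes the fine-tuning files are tab-separated with a space-separated, overlapping k-mer sequence followed by an integer label, which should be verified against the files in sample_data. The function names (`seq_to_kmers`, `centered_window`, `to_tsv_row`) and the `flank` parameter are hypothetical and are not taken from the DNABERT code.

```python
def seq_to_kmers(seq, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1), joined by spaces."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def centered_window(ref_seq, site, flank):
    """Return 2*flank bases with the candidate site at the midpoint.

    Returns None when the window would run off either end of ref_seq,
    so that truncated (off-centre) examples are never produced.
    """
    start, end = site - flank, site + flank
    if start < 0 or end > len(ref_seq):
        return None
    return ref_seq[start:end]


def to_tsv_row(seq, label, k=6):
    """Format one example as '<k-mers>\t<label>' (assumed layout, see above)."""
    return f"{seq_to_kmers(seq, k)}\t{label}"


if __name__ == "__main__":
    # Toy reference sequence; a real run would read the genome from a FASTA file.
    ref = "AACCGGTTAACCGGTTAACC"
    window = centered_window(ref, site=10, flank=5)
    if window is not None:
        print(to_tsv_row(window, label=1, k=6))
```

Comparing a few rows produced this way with rows from sample_data (same k, same sequence length, same label encoding) may help tell a formatting problem apart from a problem with how the negative samples were chosen.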