jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

How to get the high attention regions of a given sequence.

ytye2010 opened this issue

I want to extract the high-attention regions of some given sequences. I have tried the approach in #11 to get the last embedding vector for each token.
After taking their means, I find that some values are negative. Is that right? And how should these values be compared? Does a larger absolute value mean higher attention?
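If `output[0]` is the last hidden state (the per-token embedding vectors from #11), then its per-token mean averages hidden units of mixed sign. Negative values are therefore unsurprising, and these means are not attention weights. Attention probabilities come from a softmax, so they are non-negative and each row sums to 1. The minimal NumPy sketch below uses a random toy tensor in place of real model output. It assumes that the last-layer attention for one sequence has shape `(num_heads, seq_len, seq_len)`. Hugging Face `transformers` BERT models can return per-layer attentions via `output_attentions`, but whether the DNABERT checkpoint exposes them the same way is an assumption. Averaging the `[CLS]` row over heads is one common way to score tokens and is not necessarily DNABERT's own method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cls_attention(attn):
    """Average over heads the attention that [CLS] (position 0) pays to every token.

    attn: array of shape (num_heads, seq_len, seq_len), assumed to hold
    last-layer attention probabilities for a single sequence.
    """
    return attn[:, 0, :].mean(axis=0)

# Toy stand-in for real model output: 12 heads, 6 tokens
# ([CLS], four 6-mers, [SEP]) built from random pre-softmax scores.
rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(12, 6, 6)))
scores = cls_attention(attn)
print(scores.round(3))  # six non-negative values that sum to 1
```

Because every head's row is a probability distribution, the head average is one too. A larger value then directly means "more attention from [CLS]", with no sign or absolute-value ambiguity.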
The following is my attempt on your example in #11, using the 6-new-12w-0 model downloaded from this GitHub repo.
sequence = "AATCTA ATCTAG TCTAGC CTAGCA"
output[0][0].mean(1) = [0.0017, -0.0006, -0.0032, -0.0047, 0.0022, -0.0069]
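The example input is the 9-nt sequence `AATCTAGCA` written as four overlapping 6-mers with stride 1. Adding [CLS] and [SEP] gives six tokens, which matches the six values above. Below is a small sketch of that conversion. `to_kmers` is a hypothetical helper written for illustration, not a function from the DNABERT repo.

```python
def to_kmers(seq: str, k: int = 6) -> str:
    """Hypothetical helper: split a DNA string into space-separated overlapping k-mers (stride 1)."""
    if len(seq) < k:
        raise ValueError(f"sequence length {len(seq)} is shorter than k={k}")
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))

kmers = to_kmers("AATCTAGCA", k=6)
print(kmers)        # AATCTA ATCTAG TCTAGC CTAGCA
tokens = ["[CLS]"] + kmers.split() + ["[SEP]"]
print(len(tokens))  # 6
```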
As I understand it, the first value (0.0017) and the last (-0.0069) correspond to [CLS] and [SEP]. Is that right?
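Once the special tokens are set aside, per-token scores (for example, head-averaged attention from [CLS]) describe k-mers rather than single bases. One way to get a per-nucleotide profile is to give each base the mean score of every k-mer that covers it. This approach is an assumption, not taken from DNABERT, and `kmer_scores_to_nucleotides` is a hypothetical name.

```python
def kmer_scores_to_nucleotides(token_scores, k=6):
    """Hypothetical helper: drop [CLS]/[SEP] and spread k-mer scores onto bases.

    token_scores: one score per token, assumed ordered as
    [CLS], kmer_1, ..., kmer_n, [SEP].
    Each nucleotide receives the mean score of every k-mer that covers it.
    """
    kmer_scores = token_scores[1:-1]            # strip the two special tokens
    seq_len = len(kmer_scores) + k - 1          # n overlapping k-mers span n + k - 1 bases
    totals = [0.0] * seq_len
    counts = [0] * seq_len
    for i, score in enumerate(kmer_scores):
        for pos in range(i, i + k):             # k-mer i covers bases i .. i + k - 1
            totals[pos] += score
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

profile = kmer_scores_to_nucleotides([9, 1, 2, 3, 4, 9], k=6)
print(profile)  # [1.0, 1.5, 2.0, 2.5, 2.5, 2.5, 3.0, 3.5, 4.0]
```

The [CLS] and [SEP] scores (the 9s) are ignored. Central bases are covered by more k-mers, so their values are smoother than those at the ends.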