center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.

Home Page: https://ctid.mitre-engenuity.org/our-work/tram/


Need help: BERT Fine Tuning

abhishekdhiman25 opened this issue

Hi Reader

I have installed TRAM on a Windows system using the developer setup guide.
I want to fine-tune SciBERT on my own data using the "fine_tune_multi_label.ipynb" notebook.
Do I need to change the classes to match my data?
I have prepared training data for all 537 ATT&CK labels in the same format as "multi_label.json".
Is it necessary to change CLASSES in the 2nd cell of "fine_tune_multi_label.ipynb"? If yes, how should it be done, and is there a particular format for it?
My colleague tried this earlier and got an error related to the out_features of the BERT model being set to 50, so he set it to 537, but the accuracy score then dropped to zero.

Please tell me how I can fine-tune it on my data with more than 50 ATT&CK labels.

For reference, the code in the 2nd cell:
from sklearn.preprocessing import MultiLabelBinarizer as MLB

CLASSES = [
'T1003.001', 'T1005', 'T1012', 'T1016', 'T1021.001', 'T1027',
'T1033', 'T1036.005', 'T1041', 'T1047', 'T1053.005', 'T1055',
'T1056.001', 'T1057', 'T1059.003', 'T1068', 'T1070.004',
'T1071.001', 'T1072', 'T1074.001', 'T1078', 'T1082', 'T1083',
'T1090', 'T1095', 'T1105', 'T1106', 'T1110', 'T1112', 'T1113',
'T1140', 'T1190', 'T1204.002', 'T1210', 'T1218.011', 'T1219',
'T1484.001', 'T1518.001', 'T1543.003', 'T1547.001', 'T1548.002',
'T1552.001', 'T1557.001', 'T1562.001', 'T1564.001', 'T1566.001',
'T1569.002', 'T1570', 'T1573.001', 'T1574.002'
]

mlb = MLB(classes=CLASSES)
mlb.fit([[c] for c in CLASSES])

mlb

I think this is answered in #216. Please re-open if there's something I missed.
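
For readers landing here without #216 open, here is a minimal sketch of one way to build a larger CLASSES list from the training data itself rather than typing all 537 IDs by hand. It is not taken from the TRAM notebook. The record shape (a list of dicts with "sentence" and "labels" keys) and the helper names `build_classes` and `to_multi_hot` are assumptions made for illustration, and the encoding is plain Python that only mimics a multi-hot label matrix.

```python
# Hypothetical records; the "sentence" and "labels" field names are assumptions
# about the data layout, not a confirmed description of multi_label.json.
records = [
    {"sentence": "The malware dumps LSASS memory.",
     "labels": ["T1003.001"]},
    {"sentence": "It lists processes and uploads files over HTTP.",
     "labels": ["T1057", "T1071.001", "T1041"]},
    {"sentence": "No technique is described here.",
     "labels": []},
]


def build_classes(records):
    """Return the sorted list of distinct technique IDs found in the data."""
    return sorted({label for rec in records for label in rec["labels"]})


def to_multi_hot(labels, classes):
    """Encode one record's labels as a 0/1 vector aligned with `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    vec = [0] * len(classes)
    for label in labels:
        # A KeyError here means the label is missing from `classes`.
        vec[index[label]] = 1
    return vec


CLASSES = build_classes(records)
vectors = [to_multi_hot(rec["labels"], CLASSES) for rec in records]

# The classifier's output layer needs exactly this many units.
num_labels = len(CLASSES)
```

A list built this way can replace the hard-coded one in the 2nd cell, so that `MLB(classes=CLASSES)` sees every label in the data. The model's classification head then has to have `len(CLASSES)` outputs. Assuming the notebook loads the model through Hugging Face transformers, that usually means passing `num_labels=len(CLASSES)` to `from_pretrained` and starting from a checkpoint whose head is not already fixed at 50 outputs (or passing `ignore_mismatched_sizes=True` so the head is re-initialized). Whether this also explains the accuracy dropping to zero is not something this thread establishes.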