center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.

Home Page: https://ctid.mitre-engenuity.org/our-work/tram/

Need guidance: I want to fine-tune SciBERT with enterprise-attack.json. How can I do that?

abhishekdhiman25 opened this issue

Hi Reader,

I am trying the notebook "fine_tune_multi_label.ipynb" (path: tram\user_notebooks\fine_tune_multi_label.ipynb) to fine-tune SciBERT.
I saw that it uses multi_label.json, in which the sentence and labels columns are mandatory.
Now I want to use all of the data in enterprise-attack.json (path: tram\data\attack\enterprise-attack.json) for fine-tuning and testing.
I am doing this in my local Windows environment for testing purposes.
Please correct me if I am wrong and let me know whether this is possible. If it is, what is the correct way to do it?
Thanks in advance for your help.
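A minimal sketch for checking that a training file has the mandatory columns before feeding it to the notebook. It assumes (not confirmed by the notebook itself) that multi_label.json is a JSON array of objects, each with a "sentence" string and a "labels" list of ATT&CK technique IDs; `validate_records` and `load_training_file` are hypothetical helper names.

```python
import json
from typing import Any

# Assumption: each record looks like {"sentence": "...", "labels": ["T1059.001", ...]}.
REQUIRED_KEYS = ("sentence", "labels")


def validate_records(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means every record looks usable."""
    problems = []
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
            continue
        if not isinstance(rec["sentence"], str) or not rec["sentence"].strip():
            problems.append(f"record {i}: empty sentence")
        if not isinstance(rec["labels"], list):
            problems.append(f"record {i}: labels is not a list")
    return problems


def load_training_file(path: str) -> list[dict[str, Any]]:
    """Load a JSON training file and fail loudly if any record is malformed."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    problems = validate_records(records)
    if problems:
        raise ValueError("; ".join(problems))
    return records


# Usage on an in-memory sample: the second record has no "labels" key.
sample = [
    {"sentence": "The actor used PowerShell to download a payload.", "labels": ["T1059.001"]},
    {"sentence": "No technique here."},
]
print(validate_records(sample))  # ['record 1: missing labels']
```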

This is not possible, or at least not easy to do. The training data for SciBERT uses phrase-level annotations formatted by our data labeling tool, the MITRE Annotation Toolkit (https://mat-annotation.sourceforge.net/). If you wanted to train on Enterprise ATT&CK, you would need to reformat that data to look like phrase-level annotations. You would also need to update the fine-tuning notebook so that it knows about the expanded set of technique IDs that can be used as classes.
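As a rough illustration of the two steps described above, here is a sketch under stated assumptions, not TRAM's own tooling. It reads the STIX 2 bundle in enterprise-attack.json, turns each non-revoked, non-deprecated technique description into one record labeled with its technique ID, and builds the expanded class list plus a multi-hot label matrix. Whole descriptions are much coarser than the phrase-level MAT annotations TRAM actually trains on, and the record shape mirrors the assumed multi_label.json format. The function names are hypothetical.

```python
import json
from typing import Any, Optional


def technique_id(obj: dict[str, Any]) -> Optional[str]:
    """Return the ATT&CK ID (e.g. T1059.001) of a STIX attack-pattern, if any."""
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


def bundle_to_records(bundle: dict[str, Any]) -> list[dict[str, Any]]:
    """One record per active technique: its description labeled with its ID."""
    records = []
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue  # skip techniques ATT&CK no longer uses
        tid = technique_id(obj)
        desc = obj.get("description", "").strip()
        if tid and desc:
            records.append({"sentence": desc, "labels": [tid]})
    return records


def multi_hot(records: list[dict[str, Any]], classes: list[str]) -> list[list[int]]:
    """Encode each record's labels as a 0/1 vector over the class list."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for rec in records:
        row = [0] * len(classes)
        for label in rec["labels"]:
            row[index[label]] = 1
        rows.append(row)
    return rows


# For the real file (path from the question), something like:
#   with open(r"tram\data\attack\enterprise-attack.json", encoding="utf-8") as f:
#       bundle = json.load(f)
bundle = {
    "type": "bundle",
    "objects": [
        {"type": "attack-pattern", "description": "Adversaries may abuse PowerShell.",
         "external_references": [{"source_name": "mitre-attack", "external_id": "T1059.001"}]},
        {"type": "attack-pattern", "description": "Old technique.", "revoked": True,
         "external_references": [{"source_name": "mitre-attack", "external_id": "T9999"}]},
        {"type": "intrusion-set", "name": "Example group"},
    ],
}
records = bundle_to_records(bundle)
classes = sorted({label for rec in records for label in rec["labels"]})
print(classes)                      # ['T1059.001']
print(multi_hot(records, classes))  # [[1]]
```

The sorted `classes` list is what would have to replace the set of technique IDs the notebook currently treats as classes. Real ATT&CK descriptions are multi-sentence and contain "(Citation: ...)" markers, so splitting and cleaning them would be a further step toward phrase-level data.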

Hi @mehaase
Thanks for your support, I really appreciate it.
If I fine-tune SciBERT using the "fine_tune_multi_label.ipynb" notebook, how can I replace the default SciBERT model currently used in TRAM with my fine-tuned one? I want to do this for testing purposes.

Thanks again for your help
Regards,
Abhishek