center-for-threat-informed-defense / tram

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®.

Home page: https://ctid.mitre-engenuity.org/our-work/tram/

Pretrained BERT model?

priamai opened this issue

Hi there,
I have just deployed the latest version via Docker and noticed that there are only two pretrained models.

[Screenshot]

It would be useful to know how to:
a) train SciBERT on an annotated dataset (the link is broken; I guess it points to a private repo)
b) download a pretrained SciBERT model

Cheers!
@mehaase

Hi @priamai, that screen is a bit misleading. It is showing stats for the models that were trained inside the container; the SciBERT model is trained outside the container (by us, on high-end GPUs) and downloaded into the Docker container. If you want to fine-tune the model on your own data, we have some Jupyter notebooks to facilitate that: https://github.com/center-for-threat-informed-defense/tram/wiki/Large-Language-Models#jupyter-notebooks

(I also fixed the broken link that you were looking at: https://github.com/center-for-threat-informed-defense/tram/wiki/Data-Annotation)

Hi @mehaase, when I upload a report it doesn't let me choose the model, so does it default to SciBERT?
Thanks for fixing the link!
I love the Colab notebooks, since we can fine-tune for free on Colab!

Yes, it defaults to SciBERT. The choice of model is specified in entrypoint.sh.