Pretrained BERT model?

priamai opened this issue a year ago · comments

Hi there,
I have just deployed the latest version via Docker and noticed that only 2 pretrained models are listed.

It would be useful to know how to:
a) train SciBERT on an annotated dataset (the link is broken; I guess it points to a private repo)
b) download a pretrained SciBERT model

Cheers!
@mehaase

Mark E. Haase · Answer 1 · Tue Sep 05 2023 20:34:35 GMT+0800 (China Standard Time)

Hi @priamai, that screen is a bit misleading. It only shows stats for the models that were trained inside the container. The SciBERT model is trained outside the container (by us, on high-end GPUs) and downloaded into the Docker container. If you want to fine-tune the model on your own data, we have some Jupyter notebooks to facilitate that: https://github.com/center-for-threat-informed-defense/tram/wiki/Large-Language-Models#jupyter-notebooks

(I also fixed the broken link that you were looking at: https://github.com/center-for-threat-informed-defense/tram/wiki/Data-Annotation)

priamai · Answer 2 · Tue Sep 05 2023 21:16:46 GMT+0800 (China Standard Time)

Hi @mehaase, but when I upload a report it doesn't let me choose the model, so does it default to SciBERT?
Thanks for fixing the link!
I love the Colab notebooks, since they let us fine-tune for free on Colab!

Mark E. Haase · Answer 3 · Tue Sep 05 2023 21:42:31 GMT+0800 (China Standard Time)

Yes, it defaults to SciBERT. The choice of model is specified in entrypoint.sh.
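
For readers unfamiliar with this pattern, here is a minimal sketch of how a model choice fixed at container startup might be read by an application, with a fallback default. This is not TRAM's actual code: the environment variable name `REPORT_MODEL`, the placeholder model names, and the `resolve_model` helper are all assumptions made for illustration. Only the `scibert` default comes from this thread.

```python
import os

DEFAULT_MODEL = "scibert"                          # the default mentioned in this thread
KNOWN_MODELS = {"scibert", "model_a", "model_b"}   # placeholder names, not TRAM's
ENV_VAR = "REPORT_MODEL"                           # hypothetical variable name


def resolve_model(environ=None):
    """Return the model name chosen at startup, or the default if none is set."""
    env = os.environ if environ is None else environ
    # An unset or blank variable falls back to the default model.
    name = env.get(ENV_VAR, "").strip().lower() or DEFAULT_MODEL
    if name not in KNOWN_MODELS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return name


if __name__ == "__main__":
    print(resolve_model())
```

With a setup like this, a startup script would export the variable before launching the application, which is why the upload screen itself would offer no model picker.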