Where is the corpus of input classified threat reports?

Question

Radu3000 opened this issue 2 years ago

The MITRE ATT&CK knowledge base has already mapped TTPs (and other ATT&CK objects) to threat reports via so-called citations. Can such a corpus of "classified" threat report texts be made available as part of this repository?

Otherwise how can we measure the accuracy of TRAM?

Regards,
Radu

Mark E. Haase · Answer 1 · Thu Mar 31 2022 01:08:43 GMT+0800 (China Standard Time)

The MITRE ATT&CK knowledge base has already mapped TTPs (and other ATT&CK objects) to threat reports via so-called citations. Can such a corpus of "classified" threat report texts be made available as part of this repository?

The repository contains >10k sentences of labeled training data. That data is used to train the models that are built into TRAM, but it is not derived from ATT&CK.

(The "procedure examples" data from ATT&CK has not been used in TRAM due to concern that it's not representative of real-world CTI reports, but we are open to feedback on this.)

Otherwise how can we measure the accuracy of TRAM?

Click on the "ML Admin" button and you can browse through each of the models. Model performance is reported as an F1 score computed on a train/test split. (F1 is a bit more useful than accuracy when the class distribution is imbalanced.) I'm interested in proposals/pull requests to improve the model evaluation, such as reporting precision and recall separately, producing confusion matrices of ATT&CK techniques, etc.
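To illustrate the point about F1 versus accuracy, here is a minimal sketch in plain Python. It is not TRAM's evaluation code: the helper names (`confusion_counts`, `per_class_scores`, `accuracy`) and the ten labels are made up for illustration, and macro averaging is an assumption, since the thread does not say how TRAM averages F1 across techniques.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs: a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
        fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
        # Define each ratio as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out labels: 8 sentences of a common technique, 2 of a rare one.
y_true = ["T1059"] * 8 + ["T1566"] * 2
# A degenerate model that always predicts the majority class.
y_pred = ["T1059"] * 10

acc = accuracy(y_true, y_pred)
scores = per_class_scores(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)

print(f"accuracy={acc:.2f} macro_f1={macro_f1:.2f}")
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data the always-majority model reaches 0.80 accuracy, but its macro F1 is about 0.44 because the rare class scores zero. The per-class precision/recall values and the `confusion_counts` output are the kind of extra reporting suggested above.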

Mark E. Haase · Answer 2 · Thu May 05 2022 20:33:10 GMT+0800 (China Standard Time)

Closing due to inactivity. Please reopen if needed.