Trying to reproduce the TC score for Trump dataset

Question

chunfortam opened this issue a year ago · comments

Hi Maarten,

I am trying to reproduce the TC score of 0.066 for the Trump dataset with MPNET SBERT models, but after averaging over the 15 runs I get results ranging from -0.01x to 0.03. I understand that UMAP introduces randomness, but I'd like to know whether there are other reasons for the gap. I followed the Python notebook and used the same dataset, so I'm wondering what your thoughts are on this.

Regards,
Chun

Maarten Grootendorst · Answer 1 · Fri Apr 28 2023 14:59:56 GMT+0800 (China Standard Time)

Did you make sure to use the versions specified in the notebook? BERTopic and its dependencies have gone through several changes over the years, which would explain some of the differences.
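
One quick way to check this is to print the installed versions and compare them by hand with the ones listed in the notebook. The snippet below is only a sketch: the package list is an assumption about which dependencies matter most here, and the helper name `installed_versions` is made up for illustration. The notebook's pinned versions are not reproduced here, so the comparison is left to the reader.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names. Which packages to check is an assumption;
# extend the list to match whatever the notebook pins.
PACKAGES = ["bertopic", "umap-learn", "hdbscan", "sentence-transformers"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

If any of these differ from the notebook, installing the notebook's versions in a fresh virtual environment is probably the simplest way to rule this cause out.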

chunfortam · Answer 2 · Fri May 05 2023 12:18:11 GMT+0800 (China Standard Time)

I think that was it, thanks!