MaartenGr / BERTopic_evaluation

Code and experiments for *BERTopic: Neural topic modeling with a class-based TF-IDF procedure*


Trying to reproduce the TC score for Trump dataset

chunfortam opened this issue

Hi Maarten,

I am trying to reproduce the TC score of 0.066 for the Trump dataset with MPNET SBERT models, but after averaging over 15 runs my results range from -0.01x to 0.03. I understand that UMAP introduces randomness, but I'd like to know whether there are other reasons for the gap. I followed the Python notebook and used the same dataset. What are your thoughts on this?

Regards,
Chun

Did you make sure to use the versions specified in the notebook? BERTopic and its dependencies have gone through several changes over the years, which could explain some of the differences.

I think that was it, thanks!
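
For anyone running into the same mismatch, a quick way to act on the suggestion above is to print the installed versions and compare them by hand against the ones pinned in the notebook. The snippet below is a minimal sketch using only the standard library. The package list is an assumption (common PyPI distribution names for BERTopic and its usual dependencies), and `installed_versions` is a hypothetical helper, not part of BERTopic or this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names; adjust to match the notebook's install cell.
PACKAGES = ["bertopic", "umap-learn", "hdbscan", "sentence-transformers"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

If any version differs from what the notebook specifies, reinstalling the pinned versions in a fresh environment is probably the simplest way to rule that out before looking at other sources of variation such as UMAP's randomness.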