mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Home Page: https://mimno.github.io/Mallet/

Error with assigning initial topics in ParallelTopicModel.java

clause opened this issue

I think there's an issue with assigning initial topics. On Line 248, the loop is bounded by the length of the topics array, but it should be bounded by the length of the token sequence. Because the topics array has a minimum capacity (currently 2), at least two topics are always assigned, even if the document has fewer than two tokens.
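To make the effect concrete, here is a minimal, self-contained Java sketch. It is not Mallet's code: the class `InitialTopicsSketch`, its methods, and the `MIN_CAPACITY` constant are hypothetical stand-ins for a sequence whose backing array is allocated with a minimum capacity of 2, as described above. It contrasts a loop bounded by the array's length with one bounded by the number of tokens.

```java
import java.util.Random;

// Hypothetical sketch, NOT Mallet's ParallelTopicModel code: it only models the
// loop-bound bug described in the issue.
class InitialTopicsSketch {

    // Assumed minimum capacity of the backing array ("currently 2" per the issue).
    static final int MIN_CAPACITY = 2;

    // Allocates a topic array for a document with numTokens tokens; the array
    // never has fewer than MIN_CAPACITY slots, so its length can exceed numTokens.
    static int[] allocate(int numTokens) {
        return new int[Math.max(numTokens, MIN_CAPACITY)];
    }

    // Buggy: the loop is bounded by the array's capacity, so every document gets
    // at least MIN_CAPACITY random topic assignments. Returns how many were made.
    static int assignByCapacity(int[] topics, int numTopics, Random random) {
        int assigned = 0;
        for (int position = 0; position < topics.length; position++) {
            topics[position] = random.nextInt(numTopics);
            assigned++;
        }
        return assigned;
    }

    // Fixed: the loop is bounded by the number of tokens, so exactly one topic
    // is assigned per token. Returns how many were made.
    static int assignByLength(int[] topics, int numTokens, int numTopics, Random random) {
        int assigned = 0;
        for (int position = 0; position < numTokens; position++) {
            topics[position] = random.nextInt(numTopics);
            assigned++;
        }
        return assigned;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int numTopics = 10;
        for (int numTokens : new int[] {0, 1, 5}) {
            int buggy = assignByCapacity(allocate(numTokens), numTopics, random);
            int fixed = assignByLength(allocate(numTokens), numTokens, numTopics, random);
            System.out.println("numTokens=" + numTokens
                    + ": capacity-bounded=" + buggy
                    + ", length-bounded=" + fixed);
        }
    }
}
```

Running `main` prints `numTokens=0: capacity-bounded=2, length-bounded=0`, then `numTokens=1: capacity-bounded=2, length-bounded=1`, then `numTokens=5: capacity-bounded=5, length-bounded=5`. In this sketch the two loops only disagree for documents shorter than the minimum capacity, which matches the symptom reported here.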

Thank you for spotting this! I fixed this and a few other instances of the same problem in the topic model. I'm not sure the extra topics were ever used for anything except reports, but I'm glad this is fixed.