HongyuGong / Document-Similarity-via-Hidden-Topics

Document Similarity for Texts of Varying Lengths via Hidden Topics

Question --> computing relevance VS reconstruction error

drwi opened this issue

Dear Sir,
This is not really an issue but a technical question, asked to better understand the scientific choices behind the code you proposed.

Is there a reason why the reconstruction error of the whole document is computed as [the sum of residuals between words and reconstructed words], whereas the relevance is computed as [the average cosine similarity between summary words and reconstructed summary words]?

Is it mainly to make topic extraction (solving for the best H) numerically easier, while relevance should not depend on the norm of the word vectors (or should it)? Do you have any pointers that would help justify this choice?
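
To make the contrast concrete, here is a minimal sketch of the two quantities as I understand them. It is not taken from the repository: the function names, the least-squares projection onto the rows of H, and the use of squared residuals are all my own assumptions.

```python
import numpy as np

def reconstruct(W, H):
    """Project each word vector (row of W, shape n x d) onto the span of
    the topic vectors (rows of H, shape k x d). Hypothetical helper."""
    # Least-squares coefficients, so H does not need to be orthonormal.
    coeffs, *_ = np.linalg.lstsq(H.T, W.T, rcond=None)
    return (H.T @ coeffs).T

def reconstruction_error(W, H):
    """Sum over words of the squared residual norm (assumed form)."""
    residuals = W - reconstruct(W, H)
    return float(np.sum(residuals ** 2))

def relevance(W, H):
    """Average cosine similarity between each word and its reconstruction."""
    W_hat = reconstruct(W, H)
    dots = np.sum(W * W_hat, axis=1)
    norms = np.linalg.norm(W, axis=1) * np.linalg.norm(W_hat, axis=1)
    # Guard against division by zero for all-zero rows.
    return float(np.mean(dots / np.maximum(norms, 1e-12)))
```

Under these assumptions, scaling every word vector by 3 multiplies the reconstruction error by 9 but leaves the relevance unchanged, which is the norm dependence I am asking about.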

Have a great day,
Yours sincerely,
Drwi