manuelsh / topic-identification-and-evaluation

A methodology for topic evaluation and topic identification on unsupervised topic models

Repository for the paper "A methodology for topic evaluation and topic identification on unsupervised topic models".

The main file is the notebook method.ipynb.

Library versions:

  • torchtext: 0.2.3
  • gensim: 3.6.0
  • sklearn: 0.20.0

Abstract

A methodology for evaluating unsupervised topic models and identifying topics is presented. It is based on the performance obtained when the unsupervised model is used as a classifier on a small labeled data set. The methodology (1) is simple and requires little computational work, (2) does not rely on an external source such as Wikipedia or Google, (3) is fully automated and requires no human intervention, (4) solves both the model evaluation and the topic assignment problems, and (5) is suitable for any type of unsupervised topic model, not only for those defined by a distribution over words, such as LDA. We illustrate the methodology by evaluating a set of LDA models on the AG News corpus.
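
The idea of using a topic model as a classifier on a small labeled set can be sketched as follows. This is only an illustrative sketch, not the code from method.ipynb: it assumes that each document is assigned to its highest-weight topic and that each topic is identified with the majority label among the labeled documents assigned to it. The function names (`assign_topics_to_labels`, `classify`, `accuracy`) and the toy data are hypothetical.

```python
import numpy as np


def assign_topics_to_labels(doc_topic, labels, n_labels):
    """Map each topic to the majority label of the documents it dominates.

    doc_topic: (n_docs, n_topics) array of topic weights per document.
    labels:    (n_docs,) integer labels in [0, n_labels).
    Returns an (n_topics,) array; topics that dominate no document get -1.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_topics = doc_topic.shape[1]

    # Dominant topic of each labeled document (assumption: argmax assignment).
    dominant = doc_topic.argmax(axis=1)

    # counts[t, c] = number of documents with dominant topic t and label c.
    counts = np.zeros((n_topics, n_labels), dtype=int)
    np.add.at(counts, (dominant, labels), 1)

    topic_to_label = counts.argmax(axis=1)
    topic_to_label[counts.sum(axis=1) == 0] = -1
    return topic_to_label


def classify(doc_topic, topic_to_label):
    """Predict a label for each document through its dominant topic."""
    dominant = np.asarray(doc_topic, dtype=float).argmax(axis=1)
    return topic_to_label[dominant]


def accuracy(predicted, labels):
    """Fraction of documents whose predicted label matches the true label."""
    return float(np.mean(np.asarray(predicted) == np.asarray(labels)))


# Toy example: 6 labeled documents, 3 topics, 2 labels (made-up numbers).
doc_topic = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
]
labels = [0, 0, 1, 1, 1, 1]

topic_to_label = assign_topics_to_labels(doc_topic, labels, n_labels=2)
predicted = classify(doc_topic, topic_to_label)
score = accuracy(predicted, labels)
print(topic_to_label, score)
```

In this sketch the resulting accuracy would serve as the evaluation score of the model, and `topic_to_label` as the topic identification. With a real model, the document-topic matrix would come from the model itself (for example an LDA model), and it would be advisable to fit the topic-to-label mapping and measure accuracy on separate parts of the labeled set.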

License

MIT License

