donalee / taxocom

Topic taxonomy completion with hierarchical discovery of novel topic clusters

TaxoCom: A Framework for Topic Taxonomy Completion

Overview

TaxoCom discovers a complete topic taxonomy by recursively expanding a given topic hierarchy. Starting from the root node, it performs (1) locally discriminative embedding and (2) novelty adaptive clustering to selectively assign the terms of each node to one of its child nodes.
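The recursive control flow described above can be sketched as follows. This is only an illustration of the top-down expansion, not the repository's implementation: `TopicNode`, `expand`, and `toy_assign` are hypothetical names, and `toy_assign` (grouping terms by first letter) is a deliberately trivial stand-in for the embedding and clustering steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TopicNode:
    """Hypothetical container for one topic and the terms assigned to it."""
    name: str
    terms: List[str]
    children: List["TopicNode"] = field(default_factory=list)


# Hypothetical signature: maps a node to {child_name: terms_for_that_child}.
# In TaxoCom this role is played by locally discriminative embedding followed
# by novelty adaptive clustering; here it is a pluggable placeholder.
AssignFn = Callable[[TopicNode], Dict[str, List[str]]]


def expand(node: TopicNode, assign: AssignFn, max_depth: int, depth: int = 0) -> TopicNode:
    """Recursively split a node's terms among child nodes, top-down."""
    if depth >= max_depth or not node.terms:
        return node
    for child_name, child_terms in assign(node).items():
        child = TopicNode(child_name, child_terms)
        node.children.append(expand(child, assign, max_depth, depth + 1))
    return node


def toy_assign(node: TopicNode) -> Dict[str, List[str]]:
    """Toy stand-in: group terms by first letter (NOT the TaxoCom method)."""
    groups: Dict[str, List[str]] = {}
    for term in node.terms:
        groups.setdefault(term[0], []).append(term)
    return groups


root = expand(
    TopicNode("root", ["arts", "auto", "baseball", "business"]),
    toy_assign,
    max_depth=1,
)
for child in root.children:
    print(child.name, child.terms)
```

Passing the assignment step in as a function keeps the recursion independent of how terms are embedded and clustered, so a real assignment routine could be substituted without changing `expand`.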

Run the code

STEP 1. Install the Python libraries / packages

  • python
  • numpy, scipy
  • spherecluster
  • sklearn 0.21 (for compatibility with spherecluster)
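Because the requirement pins a specific scikit-learn release line, a quick check before running can save a failed run. The snippet below is a minimal sketch: the helper name `is_compatible` is hypothetical, and it only compares the major and minor version against `0.21`.

```python
def is_compatible(version: str, required_prefix: str = "0.21") -> bool:
    """Return True if the major.minor part of `version` equals `required_prefix`."""
    parts = version.split(".")
    return ".".join(parts[:2]) == required_prefix


try:
    import sklearn  # imported only to read its version string
    installed = sklearn.__version__
except ImportError:
    installed = None

if installed is None:
    print("scikit-learn is not installed")
elif is_compatible(installed):
    print(f"scikit-learn {installed} matches the 0.21 requirement")
else:
    print(f"scikit-learn {installed} found; this README asks for 0.21")
```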

STEP 2. Download the dataset

  • Download the datasets from the following links, then place them in ./data/nyt and ./data/arxiv, respectively.
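To confirm that the datasets landed in the expected locations, a small check such as the following may help. It is a sketch that assumes it is run from the repository root; the helper name `missing_dataset_dirs` is hypothetical, and it only tests that the two directories exist, not what they contain.

```python
from pathlib import Path
from typing import List

# Directory names taken from the step above.
DATASETS = ["nyt", "arxiv"]


def missing_dataset_dirs(base: Path) -> List[str]:
    """Return the dataset names whose directory is absent under base/data."""
    return [name for name in DATASETS if not (base / "data" / name).is_dir()]


missing = missing_dataset_dirs(Path("."))
if missing:
    print("Missing dataset directories:", ", ".join(f"./data/{m}" for m in missing))
else:
    print("Both dataset directories are in place")
```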

STEP 3. Execute the TaxoCom framework

  • Run the code with the following commands:
cd code
bash run_taxocom.sh <dataset-name> <seed-taxo-name>
  • For example, the downloaded nyt directory can be used directly with:
bash run_taxocom.sh nyt seed_taxo

License: GNU General Public License v3.0


Languages

C 60.1%, Python 39.1%, Shell 0.8%