yao8839836 / COT

Concept over time: the combination of probabilistic topic model with wikipedia knowledge

COT

Datasets and code for the paper:

Liang Yao, Yin Zhang, Baogang Wei, Lei Li, Fei Wu, Peng Zhang, and Yali Bian. "Concept over time: the combination of probabilistic topic model with wikipedia knowledge." Expert Systems with Applications 60 (2016): 27-38.

Dataset

The 3,158 TechCrunch blog posts are in data/TechCrunch 1 year (3,158 docs)/datablog/.

The 6,778 New York Times global news articles from 2011 are in data/NYT/.

Timestamp

TechCrunch: /data/TechCrunch 1 year (3,158 docs)/time.txt

NYT: /file/boc/time(NYT).txt (also available in /file/doclist(NYT)part.txt)

Pre-processed files after Wikification

TechCrunch: file/boc/wikified(dense)/

NYT: file/boc/wikified(NYT)/

Monthly view statistics of Wikipedia articles

TechCrunch: file/boc/views(tech).txt

NYT: file/boc/views(NYT).txt

Implementation

/src/cot/COT.java implements the first variation of the model (TOT + link + view).
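
For readers unfamiliar with the TOT (Topics over Time) component named above: TOT models each topic's timestamps with a Beta distribution, and its parameters are commonly re-estimated by the method of moments. The sketch below shows that update in isolation. It is not taken from COT.java; the class and method names are hypothetical, and it assumes timestamps have already been normalized to the open interval (0, 1).

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the COT repository. */
class BetaMomentMatching {

    /**
     * Fits Beta(alpha, beta) to a topic's timestamps by the method of moments.
     * Assumption: every timestamp is already normalized to (0, 1).
     * Returns a two-element array {alpha, beta}.
     */
    static double[] fit(double[] timestamps) {
        if (timestamps.length < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        for (double t : timestamps) {
            if (!(t > 0.0 && t < 1.0)) {
                throw new IllegalArgumentException("timestamp outside (0, 1): " + t);
            }
        }

        // Sample mean and (population) variance of the timestamps.
        double mean = Arrays.stream(timestamps).average().getAsDouble();
        double variance = Arrays.stream(timestamps)
                .map(t -> (t - mean) * (t - mean))
                .average()
                .getAsDouble();
        if (variance <= 0.0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }

        // Moment matching: alpha = m * c, beta = (1 - m) * c,
        // where c = m * (1 - m) / s^2 - 1.
        double common = mean * (1.0 - mean) / variance - 1.0;
        return new double[] { mean * common, (1.0 - mean) * common };
    }

    public static void main(String[] args) {
        double[] params = fit(new double[] {0.2, 0.4, 0.6, 0.8});
        System.out.printf("alpha=%.3f beta=%.3f%n", params[0], params[1]);
    }
}
```

For the symmetric sample in `main`, the mean is 0.5 and the variance is 0.05, so both fitted parameters come out at approximately 2.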

Languages

Language: Java 100.0%