pcoving / KDDCup

KDDCup 2013

Details of the competition can be found here.

To keep the repo lightweight, the dataset does not ship with the code. The .csv data can be downloaded from Kaggle (requires account) and untarred in the top-level directory.

Some benchmarks require the scikit-learn package.

Theory

Semi-supervised learning review:

Semi-Supervised Learning Literature Survey

The competition appears to be an instance of bipartite ranking:

Personalized PageRank with Monte Carlo looks promising:

Monte Carlo methods in PageRank computation
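
As a rough illustration of the idea, the sketch below estimates personalized PageRank by simulating many short random walks from a source node and counting where they end. It is not code from this repo: the function name `monte_carlo_ppr`, the toy graph, the restart probability of 0.15 and the choice to stop a walk at a dangling node are all assumptions made for the example.

```python
import random
from collections import Counter


def monte_carlo_ppr(adjacency, source, num_walks=10000, restart_prob=0.15, seed=0):
    """Estimate personalized PageRank for `source` from random-walk end points.

    `adjacency` maps a node to a list of its neighbours (hypothetical layout).
    """
    rng = random.Random(seed)
    end_points = Counter()
    for _ in range(num_walks):
        node = source
        # Keep walking until the restart coin comes up; the node where the
        # walk stops is one sample from the personalized PageRank distribution.
        while rng.random() >= restart_prob:
            neighbours = adjacency.get(node)
            if not neighbours:
                break  # assumption: a walk simply ends at a dangling node
            node = rng.choice(neighbours)
        end_points[node] += 1
    return {node: count / num_walks for node, count in end_points.items()}


# Toy undirected author/paper graph (made-up ids, not from the dataset).
toy_graph = {
    "a1": ["p1", "p2"],
    "a2": ["p2", "p3"],
    "p1": ["a1"],
    "p2": ["a1", "a2"],
    "p3": ["a2"],
}
scores = monte_carlo_ppr(toy_graph, "a1")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Each walk only touches a handful of nodes, so the cost depends on the number of walks rather than on the size of the graph, which is the main appeal over computing scores for every node at once.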

Ideas

Build features with link analysis on the author/paper graph, possibly with the NetworkX library (it doesn't seem to scale, so we probably need our own implementation)
How to use titles, keywords, affiliation and other raw text features?
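
A minimal sketch of what a home-grown author/paper graph could look like, using plain dictionaries of sets instead of NetworkX. The input format (a list of `(author_id, paper_id)` pairs), the function names and the three features are hypothetical choices for illustration, not the layout of the Kaggle files or code from this repo.

```python
from collections import defaultdict


def build_bipartite_graph(author_paper_pairs):
    """Build both directions of the author/paper adjacency as dicts of sets."""
    author_to_papers = defaultdict(set)
    paper_to_authors = defaultdict(set)
    for author_id, paper_id in author_paper_pairs:
        author_to_papers[author_id].add(paper_id)
        paper_to_authors[paper_id].add(author_id)
    return author_to_papers, paper_to_authors


def pair_features(author_id, paper_id, author_to_papers, paper_to_authors):
    """Simple link-based features for one (author, paper) candidate pair."""
    own_papers = author_to_papers.get(author_id, set())
    coauthors = paper_to_authors.get(paper_id, set()) - {author_id}
    # Count other papers that the author shares with coauthors of this paper.
    shared = 0
    for coauthor in coauthors:
        coauthor_papers = author_to_papers.get(coauthor, set())
        shared += len((own_papers & coauthor_papers) - {paper_id})
    return {
        "author_degree": len(own_papers),
        "paper_degree": len(paper_to_authors.get(paper_id, set())),
        "shared_papers_with_coauthors": shared,
    }


# Made-up ids for illustration only.
pairs = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 12)]
a2p, p2a = build_bipartite_graph(pairs)
features = pair_features(1, 10, a2p, p2a)
```

Dictionaries of sets keep memory overhead low and make neighbourhood intersections cheap, which is the kind of operation these features rely on.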
