pcoving / KDDCup

KDDCup 2013

Details of the competition can be found here.

To keep the repo lightweight, the dataset does not ship with the code. The .csv data can be downloaded from Kaggle (requires account) and untarred in the top-level directory.

Some benchmarks require the scikit-learn package.

Theory

Semi-supervised learning review:

The competition appears to be an instance of bipartite ranking:
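In bipartite ranking, each item carries a binary label (here, presumably, whether a candidate paper is confirmed or deleted for a given author), and the goal is a scoring function that ranks every positive above every negative. Ranking quality is commonly measured as the fraction of correctly ordered (positive, negative) pairs, i.e. the AUC. The sketch below computes that quantity for one author's candidate list. It is a minimal illustration only: the function name `pairwise_auc` and the example scores are hypothetical, and it does not claim to reproduce the competition's official metric.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half.

    scores: model scores for one author's candidate papers (higher = more likely confirmed).
    labels: 1 for a positive (e.g. confirmed) paper, 0 for a negative (e.g. deleted) one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0  # pair ordered correctly
        elif p == n:
            correct += 0.5  # tie: half credit
    return correct / (len(pos) * len(neg))


# Hypothetical scores for one author's candidate papers (1 = confirmed, 0 = deleted).
print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # → 0.75
```

The double loop is O(P·N) per author, which is fine for short per-author candidate lists; a sort-based O(n log n) version gives the same value for larger lists.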

Personalized PageRank with Monte Carlo looks promising:
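The Monte Carlo estimator for personalized PageRank can be sketched in a few lines. Walks start at the source node and stop with probability `alpha` at every step, so walk lengths are geometric. The empirical distribution of the walks' end points then estimates the source's PPR vector. The cost depends on the number of walks and their expected length (about 1/alpha steps), not on the size of the graph, which is what makes it attractive here. This is a minimal sketch under stated assumptions: the author/paper graph is held as an undirected adjacency dict, and the function name, the defaults (`alpha=0.15`, 10,000 walks) and the dangling-node convention are illustrative choices, not taken from the linked material.

```python
import random
from collections import Counter


def monte_carlo_ppr(adj, source, alpha=0.15, num_walks=10000, seed=0):
    """Estimate the personalized PageRank vector of `source` by random walks.

    adj: dict mapping each node to a list of neighbours (e.g. authors <-> papers).
    alpha: per-step termination (restart) probability.
    Returns a dict node -> estimated PPR score; the scores sum to 1.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        node = source
        # Continue the walk with probability 1 - alpha at each step.
        while rng.random() >= alpha:
            neighbours = adj.get(node)
            if not neighbours:
                # Dangling node: end the walk here (an assumed convention).
                break
            node = rng.choice(neighbours)
        ends[node] += 1
    return {node: count / num_walks for node, count in ends.items()}


# Tiny hypothetical author/paper graph: a1 wrote p1 and p2, a2 co-wrote p2.
adj = {"a1": ["p1", "p2"], "a2": ["p2"], "p1": ["a1"], "p2": ["a1", "a2"]}
ppr = monte_carlo_ppr(adj, "a1")
print(sorted(ppr.items(), key=lambda kv: -kv[1]))
```

Used as a feature, a paper's estimated PPR score with respect to a candidate author gives one measure of how closely the two are linked in the graph; more walks reduce the variance of the estimate.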

Ideas

  • Build features with link analysis on the author/paper graph, possibly with the NetworkX library (it doesn't seem to scale, so we likely need our own implementation)
  • How should we use titles, keywords, affiliations and other raw text features?
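The first idea can be prototyped without NetworkX using plain dictionaries. The sketch below builds author-to-papers and paper-to-authors indexes and computes one example link-analysis feature: how many co-authors on a candidate paper have also collaborated with the author on other papers. It is a minimal sketch, assuming the data can be reduced to (paper_id, author_id) pairs; the function names and the co-author-overlap feature itself are hypothetical illustrations, not code from this repo.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Build author -> papers and paper -> authors indexes from (paper_id, author_id) pairs."""
    papers_of = defaultdict(set)
    authors_of = defaultdict(set)
    for paper_id, author_id in pairs:
        papers_of[author_id].add(paper_id)
        authors_of[paper_id].add(author_id)
    return papers_of, authors_of


def coauthor_overlap(author_id, paper_id, papers_of, authors_of):
    """Count co-authors on `paper_id` who also co-wrote another paper with `author_id`."""
    other_papers = papers_of.get(author_id, set()) - {paper_id}
    known_coauthors = set()
    for p in other_papers:
        known_coauthors |= authors_of.get(p, set())
    known_coauthors.discard(author_id)
    candidates = authors_of.get(paper_id, set()) - {author_id}
    return len(candidates & known_coauthors)


# Hypothetical pairs: a1 and a2 co-wrote p1 and p2; a1 and a3 co-wrote p3.
pairs = [("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a2"), ("p3", "a1"), ("p3", "a3")]
papers_of, authors_of = build_bipartite(pairs)
print(coauthor_overlap("a1", "p1", papers_of, authors_of))  # → 1 (a2 also appears on p2)
print(coauthor_overlap("a1", "p3", papers_of, authors_of))  # → 0 (a3 appears on no other a1 paper)
```

Set-based indexes keep each lookup cheap, so features like this can be computed per (author, paper) pair without materializing a full graph object.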