AdrienBenamira / Link_prediction_graph

Kaggle Competition: Predicting Missing Citations

General

In this work we address the task of link prediction in a citation network. The work is also part of an in-class Kaggle competition for the Network Analytics course offered at Ecole CentraleSupelec, Paris, in Fall 2018-2019.

Our final F-score is 0.973 on the public test set, and we are currently ranked 2nd out of 46.
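For readers unfamiliar with the metric, here is a minimal sketch of how an F-score can be computed for binary link predictions. It assumes the competition metric is the standard F1 score (the harmonic mean of precision and recall), which the text above does not state explicitly; the function name `f1_score` and the toy labels are illustrative only and are not taken from the repository.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = citation exists, 0 = no citation)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels, not competition data).
y_true = [1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```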

Set up

Put the glove folder in the dataset path (the default is set in Config).

Run main.py

Features

We use the following features:

* overlap_title
* temp_diff
* comm_auth
* num_inc_edges
* Distance_abstract
* Distance_title
* shortest_path_dijkstra
* shortest_path_dijkstra_und
* comm_neighbors
* no_edge
* tfidf_distance_corpus
* tfidf_distance_titles
* jaccard_und
* Resource_allocation
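Several of the feature names above correspond to standard graph-based link-prediction scores. The sketch below shows one plausible way to compute common neighbours, the Jaccard coefficient, the resource allocation index, and an unweighted shortest-path length on an undirected view of the graph. It is an illustration under assumptions, not the repository's code: the helper names, the toy edge list, and the mapping to `comm_neighbors`, `jaccard_und`, `Resource_allocation`, and `shortest_path_dijkstra_und` are guesses based on the feature names. The shortest path uses breadth-first search, which gives the same hop count as Dijkstra's algorithm when every edge has weight 1.

```python
from collections import deque
from math import inf


def build_undirected(edges):
    """Build adjacency sets, treating every citation as an undirected edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def resource_allocation(adj, u, v):
    """Sum of 1 / degree(w) over the shared neighbours w of u and v."""
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / len(adj[w]) for w in shared)


def shortest_path_length(adj, source, target):
    """Hop count from source to target via BFS; inf if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return inf


# Toy graph with hypothetical paper ids (not competition data).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
adj = build_undirected(edges)

features = {
    "comm_neighbors": common_neighbors(adj, "a", "b"),
    "jaccard_und": jaccard(adj, "a", "b"),
    "Resource_allocation": resource_allocation(adj, "a", "b"),
    "shortest_path_und": shortest_path_length(adj, "a", "b"),
}
print(features)
```

In a training setting, the candidate edge itself would normally be excluded from the graph before these scores are computed, so that the features do not leak the label; whether and how the repository does this is not stated here.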

Results and report

Report is available here

| Model | Train | Validation |
|---|---|---|
| Gradient Boosting | 0.979 | 0.976 |
| Random Forest | 1 | 0.975 |
| SVM | 0.964 | 0.964 |
| Linear | 0.966 | 0.966 |

Extra:

The notebooks Model_tunning.ipynb and Features.ipynb analyse our results.
