FINDER (FInding key players in complex Networks through DEep Reinforcement learning)

This is a TensorFlow implementation of FINDER, as described in our paper:

Fan, C., Zeng, L., Sun, Y and Liu Y-Y. Finding key players in complex networks through deep reinforcement learning. Nat Mach Intell (2020).

Overview
Repo Contents
System Requirements
Installation Guide
Reproduction instructions
Basebline methods implementation

Overview

Finding an optimal set of nodes, called key players, whose activation (or removal) would maximally enhance (or degrade) certain network functionality, is a fundamental class of problems in network science. Potential applications include network immunization, epidemic control, drug design, and viral marketing. Due to their general NP-hard nature, those problems typically cannot be solved by exact algorithms with polynomial time complexity. Many approximate and heuristic strategies have been proposed to deal with specific application scenarios. Yet, we still lack a unified framework to efficiently solve this class of problems. Here we introduce a deep reinforcement learning framework FINDER, which can be trained purely on small synthetic networks generated by toy models and then applied to a wide spectrum of influencer finding problems. Extensive experiments under various problem settings demonstrate that FINDER significantly outperforms existing methods in terms of solution quality. Moreover, it is several orders of magnitude faster than existing methods for large networks. The presented framework opens up a new direction of using deep learning techniques to understand the organizing principle of complex networks, which enables us to design more robust networks against both attacks and failures.

Repo Contents

code: source code of FINDER for the following four cases in the paper.
- FINDER_CN: source code for the Critical Node (CN) problem under the node-unweighted scenario.
- FINDER_CN_cost: source code for the Critical Node (CN) problem under the node-weighted (including degree-based costs and random costs) scenarios.
- FINDER_ND: source code for the Network Dismantling (ND) problem under the node-unweighted scenario.
- FINDER_ND_cost: source code for the Network Dismantling (ND) problem under the node-weighted (including degree-based costs and random costs) scenarios.
results: results obtained by FINDER for all the cases, which should be the same as are reported in the paper.

System Requirements

Software dependencies and operating systems

Software dependencies

Users should install the following packages first, which will install in about 5 minutes on a machine with the recommended specs. The versions of software are, specifically:

cython==0.29.13 
networkx==2.3 
numpy==1.17.3 
pandas==0.25.2 
scipy==1.3.1 
tensorflow-gpu==1.14.0 
tqdm==4.36.1

Operating systems

The package development version is tested on Linux and Windows 10 operating systems. The developmental version of the package has been tested on the following systems:

Linux: Ubuntu 18.04
Windows: 10

The pip package should be compatible with Windows, and Linux operating systems.

Before setting up the FINDER users should have gcc version 7.4.0 or higher.

Hardware Requirements

The FINDER model requires a standard computer with enough RAM and GPU to support the operations defined by a user. For minimal performance, this will be a computer with about 4 GB of RAM and 16GB of GPU. For optimal performance, we recommend a computer with the following specs:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core GPU: 16+ GB

The runtimes below are generated using a computer with the recommended specs (16 GB RAM, 4 cores@3.3 GHz) and internet of speed 25 Mbps.

Installation Guide

Instructions

First install all the above required packages, which are contained in the requirements.txt file

pip install -r requirements.txt

Make all the file

python setup.py build_ext -i

Typical install time

It took about 5 mins to install all the required packages, and about 1 mins to make all the files.

Reproduction instructions

Instructions to run

Train the model,

CUDA_VISIBLE_DEVICES=gpu_id python train.py

Modify the hyper-parameters in FINDER.pyx to tune the model, and make files after the the modification.

Test synthetic data,

CUDA_VISIBLE_DEVICES=-1 python testSynthetic.py (do not use GPU for test)

Using the well-trained model (stored in ./models), you can obtain the results reported in the paper.

Test real data,

CUDA_VISIBLE_DEVICES=-1 python testReal.py (do not use GPU for test)

Using the well-trained model (stored in ./models), you can obtain the results reported in the paper.

Expected output

The experimental results are saved in the results folder, which contains four subfolders, each of which corresponds to one model, and the synthetic and real results are separated into two different subfolders for the sake of clearity.

Expected run time

It took about 17 hours to obtain all results, including 'FINDER_CN', 'FINDER_CN_cost', 'FINDER_ND' and 'FINDER_ND_cost' four models, on both synthetic data and real data, containing both node uniform weights, degree-based weights and random weights.

Basebline methods implementation

We compared with HDA, HBA, HCA, HPRA, CI, MinSum, BPD, GND and RatioCut, which are state-of-the-art baselines for network key players finding methods.

We ran HDA, HBA, HCA, and HPRA with Networkx 2.0, and for HDA in large real-world networks, we instead used SNAP, a general-purpose, high-performance system for graph analysis.

http://snap.stanford.edu/snap

We used the source codes released online, and adopted the best parameter settings for each method. For RatioCut, we modified the traditional RatioCut method based on the GND implementation.

https://github.com/zhfkt/ComplexCi (CI)
https://github.com/abraunst/decycler (MinSum)
http://power.itp.ac.cn/~zhouhj/codes.html (BPD)
https://github.com/hcmidt/corehd (CoreHD)
https://github.com/renxiaolong/Generalized-Network-Dismantling (GND and RatioCut)

Reproducing the results that reported in the paper

Here is the link to the dataset that was used in the paper, including: 1) real data: nine different test data collected from the SNAP repository; 2) synthetic data: generated by Barab'{a}si-Albert (BA) model.

https://drive.google.com/open?id=1HAxIUsgOPYXHDikmlTIKIXunGWWLXbr2

Reference

Please cite our work if you find our code/paper is useful to your work.

@article{fan2020finding,
   title={Finding key players in complex networks through deep reinforcement learning},
   author={Fan, Changjun and Zeng, Li and Sun, Yizhou and Liu, Yang-Yu},
   journal={Nature Machine Intelligence},
   pages={1--8},
   year={2020},
   publisher={Nature Publishing Group}
 }

fengxu-pku / FINDER