An implementation of existing cohesive subgraph discovery algorithm utilizing NetworkX.
It consists of several cohesive subgraph models including core-based, truss-based, clique-based, and others.
-
$k$ -core [1] -
$(k,s)$ -core [2] -
$(k,h)$ -core [3] -
$(k,p)$ -core [4] -
$k$ -peak [5]
-
$k$ -truss [6] -
$k$ -tripeak [7]
-
$k$ -distance clique [8] - max
$k$ -clique [8] -
$k$ -clan [9] -
$k$ -club [9]
-
$k$ -VCC($k$ -vertex connected component ) [10,11] -
$k$ -ECC($k$ -edge connected component) [12]
- Alphacore [13]
-
$k$ -core-truss [14] - SCAN [15]
|
|
Algorithms | parameter | example |
---|---|---|
|
|
simple toy network version 1 |
|
|
[2] Fig. 2. |
|
|
[3] Fig. 1. |
|
|
[4] Fig. 1. |
|
|
simple toy network version 1 |
|
|
simple toy network version 1 |
|
|
[7] Fig. 2. |
max |
|
simple toy network version 2 |
|
|
simple toy network version 2 |
|
|
simple toy network version 1 |
|
|
simple toy network version 1 |
Alphacore |
|
simple toy network version 1 |
|
|
[14] Fig. 2. |
SCAN |
|
simple toy network version 1 |
The following plots present the results of the cohesive subgrpah model by utilizing the above parameters and networks.
( |
( |
|
---|---|---|
( |
||
max |
||
Alphacore | ||
SCAN | ||
Note that
k-distance clique | |
Solution 1 | Solution 2 |
- networkx = 2.7.1
- pandas = 1.1.3
- scipy = 1.4.1
- cdlib == 0.2.6
- numpy == 1.21
Clone repo and install requirements.txt in a Python =3.8 environment.
git clone https://github.com/cscnudi/CohesiveSubgraph
cd CohesiveSubgraph
pip install -r requirements.txt
- Usage (example script. please run the following code under the
css
folder)
import networkx as nx
from css import alg_kcore, alg_alphacore, alg_kecc, alg_khcore, alg_kpcore, alg_kpeak, alg_kscore, alg_ktruss,alg_kvcc
from css import alg_kdistance_clique, alg_ksize_clique, alg_ktripeak, alg_kcoretruss, alg_scan, common_utility
G = nx.karate_club_graph() # input graph
C = alg_kcore.run(G, k=3) # run k-core
common_utility.print_result(G, C)
C = alg_kscore.run(G, k=3, s=2) # run (k,s)-core
common_utility.print_result(G, C)
C = alg_khcore.run(G, k=5, h=2) # run (k,h)-core
common_utility.print_result(G, C)
C = alg_kpcore.run(G, k=3, p=0.5) # run (k,p)-core
common_utility.print_result(G, C)
C = alg_kpeak.run(G, k=3) # run k-peak
common_utility.print_result(G, C)
C = alg_ktruss.run(G, k=4) # run k-truss
common_utility.print_result(G, C)
C = alg_ktripeak.run(G, k=4) # run k-tripeak
common_utility.print_result(G, C)
C = alg_ksize_clique.run(G, k=5) # max k-clique
common_utility.print_result(G, C)
C = alg_kdistance_clique.run(G, k=5) # run k-distance clique
common_utility.print_result(G, C)
C = alg_kvcc.run(G, k=3) # run k-VCC
common_utility.print_result(G, C)
C = alg_kecc.run(G, k=5) # run k-ECC
common_utility.print_result(G, C)
C = alg_alphacore.run(G, alpha=0.1) # run Alphacore
common_utility.print_result(G, C)
C = alg_kcoretruss.run(G, k=5, alpha=1) # run k-core-truss
common_utility.print_result(G, C)
C = alg_scan.run(G, k=3, epsilon=0.5) # run scan
common_utility.print_result(G, C)
- Running the algorithm with user-specificed paramters in console.
python run.py --algorithm kcore # run k-core
--k 3 # parameter k value
--network example.dat # network edges file
python run.py --algorithm kscore # run (k,s)-core
--k 3 # parameter k value
--s 2 # parameter s value
--network example.dat # network edges file
python run.py --algorithm khcore # run (k,h)-core
--k 5 # parameter k value
--h 2 # parameter h value
--network example.dat # network edges file
python run.py --algorithm kpcore # run (k,p)-core
--k 3 # parameter k value
--p 0.5 # parameter h value
--network example.dat # network edges file
python run.py --algorithm kpeak # run k-peak
--k 3 # parameter k value
--network example.dat # network edges file
python run.py --algorithm ktruss # run k-truss
--k 4 # parameter k value
--network example.dat # network edges file
python run.py --algorithm ktripeak # run k-tripeak
--k 4 # parameter k value
--network example.dat # network edges file
python run.py --algorithm maxkclique # run max k-clique
--k 5 # parameter k value
--network example.dat # network edges file
python run.py --algorithm kdistanceclique # run k-distance clique
--k 5 # parameter k value
--network example.dat # network edges file
python run.py --algorithm kvcc # run k-VCC
--k 3 # parameter k value
--network example.dat # network edges file
python run.py --algorithm kecc # run k-ECC
--k 5 # parameter k value
--network example.dat # network edges file
python run.py --algorithm alphacore # run Alphacore
--alpha 0.1 # parameter alpha value
--network example.dat # network edges file
python run.py --algorithm kcoretruss # run k-core-truss
--k 5 # parameter k value
--alpha 1 # parameter alpha value
--network example.dat # network edges file
python run.py --algorithm scan # run scan
--k 3 # parameter k value
--epsilon 0.5 # parameter epsilon value
--network example.dat # network edges file
The datasets are publicly available. You can download the datasets by checking the following references
- Karate [16]
- Polblogs [17]
- Amazon [18]
- DBLP [18]
- Youtube [18]
- Weibo [19]
- Livejournal [20]
- Yelp [21]
[1] Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269–287.
[2] Fan Zhang, Long Yuan, Ying Zhang, Lu Qin, Xuemin Lin, and Alexander Zhou. 2018. Discovering strong communities with user engagement and tie strength. DASFAA. Springer, 425–441
[3] Francesco Bonchi, Arijit Khan, and Lorenzo Severini. 2019. Distance-generalized core decomposition. SIGMOD. 006–1023.
[4] Chen Zhang, Fan Zhang, Wenjie Zhang, Boge Liu, Ying Zhang, Lu Qin, and Xuemin Lin. 2020. Exploring finer granularity within the cores: Efficient (k, p)-core computation. ICDE. IEEE, 181–192.
[5] Priya Govindan, Chenghong Wang, Chumeng Xu, Hongyu Duan, and Sucheta Soundarajan. 2017. The k-peak decomposition: Mapping the global structure of graphs. WWW. 1441–1450.
[6] Jonathan Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National security agency technical report 16, 3.1 (2008).
[7] Xudong Wu, Long Yuan, Xuemin Lin, Shiyu Yang, and Wenjie Zhang. 2019. Towards efficient k-tripeak decomposition on large graphs. DASFAA. Springer, 604–621.
[8] Balabhaskar Balasundaram, Sergiy Butenko, and Svyatoslav Trukhanov. 2005. Novel approaches for analyzing biological networks. J. Comb. Optim. 10, 1 (2005), 23–39.
[9] Shahram Shahinpour and Sergiy Butenko. 2013. Algorithms for the maximum k-club problem in graphs. J. Comb. Optim. 26, 3 (2013), 520554
[10] James Moody and Douglas R White. 2002. Social cohesion and embeddedness: A hierarchical conception of social groups. Sociological Methodology (2002), 365–368
[11] Jordi Torrents and Fabrizio Ferraro. 2015. Structural cohesion: Visualization and heuristics for fast computation. arXiv preprint arXiv:1503.04476 (2015)
[12] Tianhao Wang, Yong Zhang, Francis YL Chin, Hing-Fung Ting, Yung H Tsin, and Sheung-Hung Poon. 2015. A simple algorithm for finding all k-edge-connected components. Plos one 10, 9 (2015), e0136264.
[13] Friedhelm Victor, Cuneyt G Akcora, Yulia R Gel, and Murat Kantarcioglu. 2021. Alphacore: Data Depth based Core Decomposition. SIGKDD. 1625–1633.
[14] Zhenjun Li, Yunting Lu, Wei-Peng Zhang, Rong-Hua Li, Jun Guo, Xin Huang, and Rui Mao. 2018. Discovering hierarchical subgraphs of k-core-truss. DSE 32 (2018)136–149
[15] Xu, Xiaowei, et al. "Scan: a structural clustering algorithm for networks." SIGKDD. 2007.
[16] Wayne W Zachary. 1977. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 4 (1977), 452–473.
[17] Lada A Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery. ACM, New York, NY, USA, 36–43.
[18] Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. KAIS 42, 1 (2015), 181–213
[19] Li, Guoliang, et al. "Efficient location-aware influence maximization." Proc. ACM SIGMOD Int. Conf. Manag. Data. 2014.
[20] J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012.
[21] Y. Dataset, “Yelp restaurant dataset.” [Online]. Available: http://yelp.com/ (Yelp Dataset Challenge Round 11)
Please feel free to contact @Dahee Kim (dahee@unist.ac.kr) if you have any questions about the implementation.
We really appreciate all the authors of the papers [1-16], networkX and Priya Govindan. We utilize the code (https://github.com/priyagovindan/kpeak) for k-peak computation. We tried to contact to the first author (Priya Govindan), but failed to contact to her. To get the full code with visualization, please check the above github link. We sincerely thank to Priya Govindan to share the code.