archie / graphsize

Course exercise: sampling online graphs

Lab exercise in Advanced Topics in Distributed Systems: Sampling Massive Online Graphs

---

Data source
- SNAP Gnutella peer-to-peer network: http://snap.stanford.edu/data/p2p-Gnutella31.html
- or fetch it with ./download_data.sh
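A minimal sketch for loading the data with networkx. It assumes the SNAP edge-list format: '#' comment lines, then one whitespace-separated 'FromNodeId ToNodeId' pair per line. The function name `load_gnutella`, the default file name, and the choice to treat the graph as undirected and keep only its largest connected component are assumptions for illustration. They are not necessarily what the repository's scripts do.

```python
import networkx as nx


def load_gnutella(path="p2p-Gnutella31.txt", largest_cc=True):
    """Load a SNAP edge list as an undirected simple graph (hypothetical helper).

    Assumes '#' comment lines followed by whitespace-separated node-id pairs;
    the default file name is an assumption.
    """
    # Direction is dropped (an assumption) so a walk can move along any edge.
    g = nx.read_edgelist(path, comments="#", nodetype=int, create_using=nx.Graph())
    # Self-loops would distort node degrees, so remove them.
    g.remove_edges_from(list(nx.selfloop_edges(g)))
    if largest_cc:
        # A random walk only ever explores one component, so keep the largest.
        giant = max(nx.connected_components(g), key=len)
        g = g.subgraph(giant).copy()
    return g
```

The node count of the loaded graph, `load_gnutella().number_of_nodes()`, gives the ground truth that the size estimates can be compared against.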

---

Exercise

Implement graph size estimation using the following sampling techniques:
(a) Uniform Independent Sample Without Replacement (UIS_WOR),
(b) Uniform Independent Sample With Replacement (UIS_WR),
(c) Weighted Independent Sample With Replacement, with weights equal to node degrees (WIS_WR),
(d) Random Walk, both Metropolis-Hastings (MHRW) and Reweighted (RWRW):
    1) using all nodes,
    2) using every k-th node,
    3) with node degrees as the weights.

What do you observe?
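The with-replacement estimators in (b) and (c) both rely on counting collisions, meaning pairs of draws that hit the same node.

For a uniform sample of n draws from N nodes, each pair collides with probability 1/N. That gives E[C] = n(n-1)/(2N), and so N_hat = n(n-1)/(2C).

For a degree-weighted sample, a ratio-of-expectations argument gives N_hat = (n-1) * psi_1 * psi_inv / (2nC). Here psi_1 is the sum of the sampled degrees and psi_inv is the sum of their reciprocals. When all degrees are equal, this reduces to the uniform formula.

The same reasoning shows why (a) is degenerate: without replacement, C is always 0.

The sketch below draws the samples directly with `random.choices`. All function names are hypothetical.

```python
import random
from collections import Counter


def count_collisions(sample):
    """Number of unordered pairs i < j with sample[i] == sample[j]."""
    return sum(m * (m - 1) // 2 for m in Counter(sample).values())


def uis_wr_estimate(nodes, n, rng=random):
    """(b) Uniform independent sample with replacement: N_hat = n(n-1) / (2C)."""
    sample = rng.choices(nodes, k=n)
    c = count_collisions(sample)
    return n * (n - 1) / (2 * c) if c else float("inf")


def wis_wr_estimate(nodes, degree, n, rng=random):
    """(c) Degree-weighted independent sample with replacement.

    N_hat = (n-1) * psi_1 * psi_inv / (2 n C), where psi_1 and psi_inv are
    the sums of deg(v) and 1/deg(v) over the sampled nodes.
    """
    sample = rng.choices(nodes, weights=[degree[v] for v in nodes], k=n)
    c = count_collisions(sample)
    if c == 0:
        return float("inf")  # too few draws to observe any collision
    psi_1 = sum(degree[v] for v in sample)
    psi_inv = sum(1.0 / degree[v] for v in sample)
    return (n - 1) * psi_1 * psi_inv / (2 * n * c)
```

Given a networkx graph `g`, these could be called as `uis_wr_estimate(list(g.nodes), 2000)` and `wis_wr_estimate(list(g.nodes), dict(g.degree()), 2000)`. Collisions only start to appear once the number of draws is on the order of sqrt(N), so the sample size needs to grow roughly with the square root of the graph size.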

---

Requirements
1) Python
2) networkx (pip install networkx)
3) gnuplot

---

Where to start?

Each sampling technique is contained in its own file, except UIS_WOR. That one is left out because sampling without replacement never produces collisions, so there is nothing for the estimator to count. For example, to run a full MHRW sample:
     python mhrw_sample.py
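As a rough illustration of what an MHRW sampler does, here is a minimal sketch. It is not the contents of `mhrw_sample.py`, and it assumes an undirected, connected networkx graph.

From node u, the walk proposes a uniformly chosen neighbour v. It moves there with probability min(1, deg(u)/deg(v)), and otherwise stays at u. This makes the stationary distribution uniform, so the uniform collision estimator N_hat = n(n-1)/(2C) applies, where C is the number of colliding pairs among the n kept samples.

The `k` parameter implements the "every k-th node" variant. `burn_in` and the function names are hypothetical.

```python
import random
from collections import Counter


def mhrw(graph, start, steps, rng=random):
    """Metropolis-Hastings random walk targeting the uniform distribution (hypothetical helper)."""
    u = start
    walk = [u]
    for _ in range(steps - 1):
        v = rng.choice(list(graph.neighbors(u)))
        # Accept with probability min(1, deg(u) / deg(v)); otherwise stay at u.
        if rng.random() < graph.degree(u) / graph.degree(v):
            u = v
        walk.append(u)
    return walk


def mhrw_size_estimate(graph, n, k=1, burn_in=1000, rng=random):
    """Estimate N from n walk samples, keeping every k-th node after burn-in."""
    start = rng.choice(list(graph.nodes))
    walk = mhrw(graph, start, burn_in + n * k, rng)
    sample = walk[burn_in::k]  # exactly n nodes
    c = sum(m * (m - 1) // 2 for m in Counter(sample).values())
    return n * (n - 1) / (2 * c) if c else float("inf")
```

With k=1, every rejected proposal repeats the current node on consecutive steps, and each repeat counts as a collision. Expect the estimate to come out too small in that case. Increasing k thins out these correlated samples.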

UIS_WR and WIS_WR samples are self-contained. Run them with:
     python uis_wr.py
     python wis_wr.py

The RWRW (reweighted random walk) implementation is broken.
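For reference, the reweighted random walk idea can be sketched as follows. This is not the repository's code. It assumes an undirected, connected, non-bipartite networkx graph, and the function names and `burn_in` are hypothetical.

A simple random walk moves to a uniformly chosen neighbour at every step, so in the long run it visits nodes in proportion to their degree. MHRW corrects the walk itself; RWRW instead corrects the estimator. It treats the walk samples like a degree-weighted sample and uses N_hat = (n-1) * psi_1 * psi_inv / (2nC), where:
- psi_1 is the sum of deg(v) over the n kept nodes,
- psi_inv is the sum of 1/deg(v) over the same nodes,
- C is the number of colliding pairs.

```python
import random
from collections import Counter


def simple_random_walk(graph, start, steps, rng=random):
    """Move to a uniformly random neighbour each step; stationary P(v) is proportional to deg(v)."""
    u = start
    walk = [u]
    for _ in range(steps - 1):
        u = rng.choice(list(graph.neighbors(u)))
        walk.append(u)
    return walk


def rwrw_size_estimate(graph, n, k=1, burn_in=1000, rng=random):
    """Degree-reweighted collision estimate from every k-th walk node after burn-in (hypothetical helper)."""
    start = rng.choice(list(graph.nodes))
    walk = simple_random_walk(graph, start, burn_in + n * k, rng)
    sample = walk[burn_in::k]  # exactly n nodes
    c = sum(m * (m - 1) // 2 for m in Counter(sample).values())
    if c == 0:
        return float("inf")
    degrees = [graph.degree(v) for v in sample]
    psi_1 = sum(degrees)
    psi_inv = sum(1.0 / d for d in degrees)
    # The degree sums account for the walk's bias towards high-degree nodes.
    return (n - 1) * psi_1 * psi_inv / (2 * n * c)
```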

---  

Generate graphs

Run 'gnuplot generate.graph' to create some (good-looking) graphs.
