Lab exercise in Advanced Topics in Distributed Systems: Sampling Massive Online Graphs --- Data source - http://snap.stanford.edu/data/p2p-Gnutella31.html - or ./download_data.sh --- Exercise Implement graph size estimation using the following sampling techniques: (a) Uniform Independent Sample Without Replacement, (b) Uniform Independent Sample With Replacement, (c) Weighted Independent Sample With Replacement (with weights equal to node degrees), (d) Random Walk (Metropolis-Hastings and Reweighted) 1) using all nodes, 2) every k-th node, 3) the weights are node degrees What do you observe? --- Requirements 1) Python 2) networkx (easy_install networkx) 3) gnuplot --- Where to start? Each sampling technique is contained in its own file (except UIS_WOR which doesn't make sense since there wont be any collisions). For example, to start a full MHRW sample run: python mhrw_sample.py UIS_WR and WIS_WR samples are self-contained. Run as: python uis_wr.py or python wis_wr.py RWRW is broken. --- Generate graphs Run 'gnuplot generate.graph' to create some (good looking) graphs.