MLEFlow/Torflow/sbws implementation Guide
- Reproducing some of the figures in the paper "MLEFlow: Learning from History to Improve Load Balancing in Tor"
Github Repo: https://github.com/hdarir2/mleflow
Dependencies: Python 3.6 or newer versions numpy scipy matplotlib bisect math lambertw sorted containers decimal
Step 1: Import the ground truth cpaacities of the relays considered
Step 2: Run blocks 1 to 12 in the notebook in order to define the required weight vectors, path generating functions (one relay and three relays paths), observation calculation, discretization of the capacities set.
Step 3: Run the algorithm of choice.
Notation: -lambdas= rate of arrival of users to the Tor network -t= number of epochs -w0= initial weight vector estimate -noise= 0 (if the observation we want to consider are noiseless); 1 (if the observations are noisy= observation multiplied by a normal random variable ~N(1,0.5) truncated between 0.7 and 1.3) -bw= vector containing the capacity of each relay in the network -bwguard= vector containing the capacities of all relays that can be used in first position -bwmiddle= vector containing the capacities of the relays that can only be used in the second position (without taking into consideration the guard relays that can also be used in the second position) -bwmiddle2= vector containing the capacities of all the relays that can be used in the second poisition -bwexit= vector containing the capacities of all relays that can be used in the third position
Algorithms for one relay paths networks:
- Torflow_P: torflow_onerelay function; run with and without noise.
- MLEFlow_Q: stochasticpoisson_onerelay function; run with and without noise.
- MLEFlow_CF: iterativepoisson_onerelay function; run with and without noise.
- sbws*: sbws_onerelay function; run with and without noise.
*Error between estimation of each algorithm and ground truth calculated ad plotted in a box plot. *Using estimates, generate paths of one relays; compute their bandwidths using circ_bw and compare.
Algorithms for one relay paths networks:
- Torflow_P: torflow function; run with and without noise.
- MLEFlow_Q: stochasticpoisson function; run with and without noise.
- MLEFlow_CF: iterativepoisson function; run with and without noise.
- sbws*: sbws function; run with and without noise.
*Error between estimation of each algorithm and ground truth calculated ad plotted in a box plot. *Using estimates, generate paths of one relays; compute their bandwidths using circ_bw and compare.
- Modified Shadow-Tor plugin.
- Modified Tor binary with MLEFlow estimation algorithm.
- Configuration files for MLEFlow and Torflow simulation.
If you don't have a working Shadow binary yet, download and build the shadow binary following the steps in this repository.
Then follow the following steps to modify and build the custom Tor binary and TorFlow plugin:
- Download the Tor source code from this branch. The directory name of the source code is
tor
. - Download the Tor/TorFlow plugin for shadow from this branch.
- Change to the
shadow-plugin-tor
directory, or whatever directory was downloaded in step 2. - Build and install with the following command:
Note that the path to the Tor binary source code is
./setup build -c -j64 --tor-prefix ../tor/ && ./setup install
../tor/
. Change that to point to wherever you stored the Tor binary source code.
The test environment configurations can be downloaded from this repository. The 3% network configuration used in the paper is under the shadowtor-0.03-a-control-2kclients-allrelays-config
directory. The name of the directory can be understood this way:
0.03
means it's a 3% network.control
means it uses the MLE algorithm.2kclients
means the network simulates 2000 clients downloading concurrently.allrelays
means the authorities in the network uses the MLE algorithm to predict the capacity of all (guard, middle, exit) relays, as opposed to only estimating the exit relays.
cd
into the desired config directory and run the following command to start the simulation:
shadow shadow.config.xml > shadow.log
The output of the shadow binary will be stored in shadow.log
. The outputs of the virtual hosts can be found in shadow.data/hosts/[HOSTNAME]
.