Antonio-Cruciani / TSBProxy

Repository associated to paper "Proxying betweenness in Temporal Networks: A comparative Analysis"

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TSBProxy

Repository associated to paper "Proxying Betweenness Centrality Rankings in Temporal Networks"

Network file format

The temporal network file must contain one line for each temporal edge (a temporal edge is a triple (u,v,t), where u and v are two nodes and t is one time in which the edge from u to v is available). All networks are considered as directed: hence, if the graph is undirected, then both the temporal edge (u,v,t) and the temporal edge (v,u,t) have to be included in the file. Nodes can be identified by any string (not necessarily a number), while times have to be integer values. The three elements of a temporal edge can be separated by any string, which can be specified while using the load_temporal_graph function (see below). In the following, we assume that this string is just a space.

The file cannot include duplicate lines (that is, two identical temporal edges) or self-loop lines (that is, temporal edges from a node to itself). In case your file contains duplicate lines or self-loops lines, you should first execute the following bash command.

awk '($1!=$2)' <in_file_name> | sort | uniq > <out_file_name>.txt

You can then use the newly generated file as input to the software.

How to use the software

We assume that the julia REPL has been started within the directory TSBProxy and the following command has been already executed (after having installed all the required packages).

include("src/ProxyingTSB.jl");

Reading the temporal graph

Assuming that the file workplace.txt is contained in the directory graphs included in the directory TSBProxy, the corresponding temporal network can be loaded by executing the following command (note that, in this case, the separator between the three elements of a temporal edge is the space character).

tg = load_temporal_graph("graphs/workplace.txt", " ");

We can obtain some basic statistics about the temporal network by executing the following command.

print_stats(tg, graph_name="Workplace network");

The result in this case should be the following output.

====================================================
Temporal network: Workplace network
====================================================
Number of nodes 92
Number temporal of edges 19654
Number of unique time stamps 7104
====================================================

Computing the temporal shortest betweenness

The values of the temporal shortest betweenness (in short TSB) of the graph tg can be computed by executing the following command.

tsb, t_tsb = temporal_shortest_betweenness(tg, 50, false);

The second parameter specifies after how many processed nodes a message has to be printed on the console (in order to verify the status of the computation). If this parameter is 0, then no ouptut is produced. The third parameter specifies whether the big integer implementation has to be used (in case there too many shortest paths). The execution of the above command should require less than a minute. The values returned are the array of the TSB values and the execution time.

The values of the temporal shortest betweenness and the execution time can be saved in the scores and the times directory, respectively, (included in the TSBProxy directory) as follows.

save_results("workplace", "tsb", tsb, t_tsb);

Computing the proxies

Similarly to the computation of the TSB values, we can compute and save the values of the proxies described in the paper, that is, the static betweenness (in short, BC), the prefix foremost betweenness (in short, PREFIX), the shortest ego-betweenness (in short, EGOTSB), the prefix foremost ego-betweenness (in short, EGOPREFIX), and the pass-through degree (in short, PTD).

bc, t = betweenness(tg, true);
save_results("workplace", "bc", bc, t);
prefix, t = temporal_prefix_foremost_betweenness(tg, 50);
save_results("workplace", "prefix", prefix, t);
egotsb, t = temporal_ego_betweenness_centrality(tg, t_tsb, 50);
save_results("workplace", "egotsb", egotsb, t);
egoprefix, t = temporal_ego_prefix_foremost_betweenness(tg, t_tsb, 50);
save_results("workplace", "egoprefix", egoprefix, t);
ptd, t = pass_through_degree(tg);
save_results("workplace", "ptd", ptd, t);

The second parameter of the function betweenness indicates whether feedback messages have to be printed (in order to verify the status of the computation), while the second parameter of the functions temporal_ego_betweenness_centrality and temporal_ego_prefix_foremost_betweenness denotes the maximum execution time, which can be usually set equal to the execution time for computing the TSB, and the third parameter specifies after how many processed nodes a message has to be printed on the console (in order to verify the status of the computation). In our example, the computation of the EGOTSB took more time than the computation of the TSB and, hence, no EGOTSB value file is created.

Computing the ONBRA approximation

In the case of ONBRA, we have to decide the size of the sample of node pairs to be used to apply the sampling approximation method. In the paper, we consider the following strategy. We first estimate the execution time avg_t of a BFS with input a pair of nodes, by computing the average execution time over a random sample of pairs od nodes (tipically of size between 100 and 500). We then divide the execution time for computing the TSB by avg_t: the resulting value max_ss is an estimate of the sample size which would cause the ONBRA algorithm executes as long as the computation of TSB.

max_ss = onbra_sample_size("workplace", 100, false, false);

The second parameter is the size of the sample to be used for computing the average time of a BFS, while the second and the third parameters specify whether the big integer data structure has to be used and the feedback messages have to be printed, respectively.

We then execute ONBRA for a sample sample size between 1 and max_ss. In the following code we use a sample size equal to the maximum one.

onbra("workplace", tg, max_ss, 50, false);

The centrality values for each pair in the sample are saved in one file within the directory scores and the statistics concerning the execution time are saved in another file within the directory times.

Analysing the quality of the proxies

We are now ready to evaluate the proxies in terms of their approximation of the TSB ranking. To this last aim, we use both the weighted Kendall tau correlation index between the proxie and the TSB rankings and the Jaccard index of the top k nodes in the proxie and the TSB rankings. For instance, in order to evaluate the static betweenness we can execute the following code.

tsb = read_centrality_values("scores/workplace/tsb.txt");
proxy = read_centrality_values("scores/workplace/bc.txt");
_, _, ktau = compute_correlations(tsb, proxy, false);
jproxy = jaccard(tsb, proxy, 75);
iproxy = intersection(tsb, proxy, 75);

(in the case of ONBRA, the second instruction has to be susbstituted by the following one

proxy = read_onbra_centrality_values("scores/workplace/onbra.txt", max_ss, length(tsb));

where max_ss has to be substituted by the used sample size). The weighted Kendal tau and the Jaccard index corresponding to the first 50 nodes can then be printed by executing the following instructions.

println(ktau[1], " ", jproxy[50], " ", iproxy[50]);

In this example, the Kendall tau is approximately equal to 0.89, the Jaccard index is approximately equal to 0.82, and the intersection of the top 50 nodes is equalt to 45. Note that, in the case of ONBRA, these values would be approximately 0.92, 0.79, and 44 (clearly, these values can change because of different samples of pairs of nodes).

A note on Kadabra

If you want to compare the KADABRA approximation of the static betweenness with the other proxies, you should first compute the KADABRA values (by using the implementation made available by the authors of KADABRA and by respecting the node indices) and save them in the kadabra.txt file in the scores directory.

About

Repository associated to paper "Proxying betweenness in Temporal Networks: A comparative Analysis"

License:GNU General Public License v3.0


Languages

Language:Julia 100.0%