Antonio-Cruciani / MHSEDataAnalysisTool

Python tool used to analyze the mhse algorithm results

README

This is a Python script designed to analyze the results of the MinHash Signature Estimation (MHSE) algorithm.

Description

This script allows you to analyze and fully reproduce our MHSE experiments.

MHSE

Short-Intro

MHSE is an algorithm that efficiently estimates, on very large graphs, the effective diameter and other distance metrics based on the neighborhood function, such as the exact diameter, the (effective) radius, or the average distance (more details). We have currently published two versions of the algorithm: the original one (MHSE) and a space-efficient one (SE-MHSE), which produces the same results as MHSE with lower space complexity. SE-MHSE lets you run the algorithm on machines with limited memory and parallelize it easily with any map-reduce framework. You can find our algorithm at the following link.
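To make these metrics concrete, here is a minimal sketch of how the effective diameter and the average distance can be read off a neighborhood function. It is not the MHSE implementation. It assumes that `neighborhood[h]` is the number of node pairs within distance `h` (starting at `h = 0`), that the effective diameter uses a 0.9 threshold with linear interpolation between hops (a common convention), and that the average distance is taken over pairs at distance at least 1. The function names are hypothetical.

```python
from typing import Sequence


def effective_diameter(neighborhood: Sequence[float], threshold: float = 0.9) -> float:
    """Interpolated hop count at which `threshold` of all reachable pairs are covered.

    Assumes neighborhood[h] is non-decreasing in h and 0 < threshold <= 1.
    """
    if not neighborhood:
        raise ValueError("neighborhood function must not be empty")
    target = threshold * neighborhood[-1]
    for h, pairs in enumerate(neighborhood):
        if pairs >= target:
            if h == 0:
                return 0.0
            prev = neighborhood[h - 1]
            # prev < target <= pairs here, so the denominator is positive.
            return (h - 1) + (target - prev) / (pairs - prev)
    # Only reachable if the assumptions above are violated.
    return float(len(neighborhood) - 1)


def average_distance(neighborhood: Sequence[float]) -> float:
    """Mean distance over pairs at distance >= 1 (assumed convention)."""
    total_pairs = neighborhood[-1] - neighborhood[0]
    if total_pairs <= 0:
        return 0.0
    # neighborhood[h] - neighborhood[h - 1] is the number of pairs at distance exactly h.
    weighted = sum(
        h * (neighborhood[h] - neighborhood[h - 1])
        for h in range(1, len(neighborhood))
    )
    return weighted / total_pairs


# Exact neighborhood function of a 4-node path graph (ordered pairs, self-pairs at h = 0).
path_graph = [4, 10, 14, 16]
print(effective_diameter(path_graph), average_distance(path_graph))
```

MHSE estimates the neighborhood function instead of computing it exactly, so the same formulas would then be applied to estimated pair counts.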

Algorithm-output

The algorithm outputs a JSON object with the following fields (values omitted here):

{
  "collisionsTable" :
  "minHashNodeIDs"  :
  "numSeeds" :
  "numNodes"  :
  "numArcs"  :
  "seedsTime" :  
  "lastHops" :
  "time" :
  "lowerBoundDiameter" :  
  "totalCouples" :
  "totalCouplePercentage" : 
  "avgDistance" : 
  "effectiveDiameter" : 
  "algorithmName" : 
  "maxMemoryUsed" : 
  "seedsList" : 
  "threshold" : 
  "direction" : 
  "hopTable" : 
}

If you execute the algorithm more than once, it will output a list of JSON objects:

[{
  "collisionsTable" :
  "minHashNodeIDs"  :
  .
  . 
  .
  "direction" : 
  "hopTable" : 
},{
  "collisionsTable" :
  "minHashNodeIDs"  :
  .
  . 
  .
  "direction" : 
  "hopTable" : 
},
  .
  . 
  .
]
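Since the output file can hold either a single JSON object or a list of them, it helps to normalize both shapes before any analysis. The following is a minimal sketch of that step; `normalize_runs` and `load_runs` are hypothetical helper names, not functions of this tool.

```python
import json
from typing import Any, Dict, List


def normalize_runs(parsed: Any) -> List[Dict[str, Any]]:
    """Return a list of run records, whether the input was one object or a list."""
    if isinstance(parsed, dict):
        return [parsed]  # a single execution
    if isinstance(parsed, list):
        return list(parsed)  # several executions
    raise ValueError("expected a JSON object or a list of JSON objects")


def load_runs(path: str) -> List[Dict[str, Any]]:
    """Read an MHSE output file and return its runs as a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return normalize_runs(json.load(handle))
```

With this in place, the rest of an analysis can always iterate over a list, for example `for run in load_runs("results.json")`, where `results.json` stands for your own output file.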

Analysis

USEFUL TIP

You can set the same output file for all your executions of MHSE and/or SE-MHSE to obtain a single list of JSON objects covering all the executions (you can also build this list as a second step, after all the experiments have finished). Given the JSON, the script automatically detects the different parameter settings and groups the executions accordingly to compute the statistics.
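The grouping step can be pictured with the sketch below. It is an illustration only, not the script's actual code: the choice of `algorithmName`, `numSeeds`, `direction` and `threshold` as grouping parameters, the choice of metrics, and the function names are all assumptions. It also assumes the grouping fields hold scalar values and the metric fields hold numbers.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Any, Dict, Iterable, List, Sequence, Tuple

# Assumed grouping parameters and metrics; the real script may use others.
GROUP_KEYS = ("algorithmName", "numSeeds", "direction", "threshold")
METRICS = ("avgDistance", "effectiveDiameter", "time")

Run = Dict[str, Any]


def group_runs(runs: Iterable[Run], keys: Sequence[str] = GROUP_KEYS) -> Dict[Tuple, List[Run]]:
    """Bucket runs that share the same values for every grouping key."""
    groups: Dict[Tuple, List[Run]] = defaultdict(list)
    for run in runs:
        groups[tuple(run.get(key) for key in keys)].append(run)
    return dict(groups)


def summarize(runs: Iterable[Run], metrics: Sequence[str] = METRICS) -> Dict[Tuple, Dict[str, Any]]:
    """Compute the mean and sample standard deviation of each metric per group."""
    summary: Dict[Tuple, Dict[str, Any]] = {}
    for key, members in group_runs(runs).items():
        stats: Dict[str, Any] = {"runs": len(members)}
        for metric in metrics:
            values = [float(m[metric]) for m in members if metric in m]
            if values:
                stats[metric] = {
                    "mean": mean(values),
                    # stdev needs at least two samples.
                    "std": stdev(values) if len(values) > 1 else 0.0,
                }
        summary[key] = stats
    return summary


# Made-up records for illustration; these are not real experiment results.
example_runs = [
    {"algorithmName": "MHSE", "numSeeds": 16, "direction": "out", "threshold": 0.9,
     "effectiveDiameter": 4.0},
    {"algorithmName": "MHSE", "numSeeds": 16, "direction": "out", "threshold": 0.9,
     "effectiveDiameter": 6.0},
    {"algorithmName": "SE-MHSE", "numSeeds": 16, "direction": "out", "threshold": 0.9,
     "effectiveDiameter": 5.0},
]
for group, stats in summarize(example_runs).items():
    print(group, stats)
```

Keying each group by a tuple of parameter values means a new parameter combination creates its own group without any configuration, which matches the idea of collecting every execution in one file.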

License:MIT License