GromitC / Rank-Correlation-Sketches

Sketching Algorithms For Approximating Kendall's Tau Rank Correlation


Sketching Algorithm for Kendall's Tau Rank Correlation

Kendall's Tau is a measure of rank correlation between two rank vectors. Computing it exactly takes O(n log(n)) time (the naive count over all pairs is O(n²)), which becomes slow when it must be evaluated for many pairs of vectors, as in clustering. The paper "Sketching Algorithms For Approximating Rank Correlations In Collaborative Filtering Systems" describes a way to approximate it in constant time for a given accuracy and confidence level. I have also included a write-up in the repo that explains the method in simpler terms.
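For reference, the exact computation that the sketch approximates can be written as follows. This is not code from the repo, and the function names are hypothetical. `kendall_tau_naive` sums +1 for each concordant pair and -1 for each discordant pair over all n(n-1)/2 pairs (tau-a, O(n²)). `kendall_tau_fast` assumes tie-free rank vectors and reaches O(n log(n)) by ordering the second vector by the first and counting inversions with merge sort, since every remaining inversion is a discordant pair.

```python
from typing import List, Sequence, Tuple


def _sign(v: float) -> int:
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_naive(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a by checking every pair: O(n^2) time."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
            s += _sign(x[i] - x[j]) * _sign(y[i] - y[j])
    return s / (n * (n - 1) / 2)


def _sort_and_count(a: List[float]) -> Tuple[List[float], int]:
    """Merge sort a, also counting pairs i < j with a[i] > a[j]."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = _sort_and_count(a[:mid])
    right, inv_right = _sort_and_count(a[mid:])
    merged: List[float] = []
    inv = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left
            merged.append(right[j])
            j += 1
            inv += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def kendall_tau_fast(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau for tie-free rank vectors: O(n log n) time."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this version assumes rank vectors without ties")
    # Reorder y by increasing x; each inversion left in y is a discordant pair.
    y_by_x = [yv for _, yv in sorted(zip(x, y))]
    _, discordant = _sort_and_count(y_by_x)
    pairs = n * (n - 1) // 2
    # concordant = pairs - discordant, so tau = (pairs - 2 * discordant) / pairs
    return (pairs - 2 * discordant) / pairs
```

For the two vectors in the usage example below, both functions return 0.4: of the 10 pairs, 7 are concordant and 3 are discordant.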

Usage

Clone the repository and copy sketch.py into your working directory.

from sketch import KTSketch

x1 = [1, 3, 2, 4, 5]
x2 = [3, 2, 1, 5, 4]

epsilon = 0.05   # accuracy: target abs. error <= epsilon
CI = 0.95        # confidence level: P(abs. error <= epsilon) >= CI
dimension = 5    # length of each rank vector

ktsketch = KTSketch(epsilon=epsilon, CI=CI, dim=dimension, seed=0)
print(ktsketch.correlation(x1, x2))
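The README does not describe how KTSketch works internally or how it turns epsilon and CI into a sketch size. A minimal sketch of one way to get a per-comparison cost independent of the vector length follows. It assumes a different, simpler technique than the paper may use: a shared random sample of m index pairs (fixed by the seed), each vector stored as the signs of its differences on those pairs, and m chosen from Hoeffding's inequality. Each sampled term lies in [-1, 1], so P(|estimate - tau| >= epsilon) <= 2·exp(-m·epsilon²/2), and requiring this to be at most 1 - CI gives m >= 2·ln(2/(1 - CI))/epsilon². The class `PairSignSketch` and its methods are hypothetical and only mirror the constructor parameters shown above.

```python
import math
import random
from typing import List, Sequence


class PairSignSketch:
    """Hypothetical sketch (not the repo's KTSketch): sample m index pairs once,
    then summarize each vector by the signs of its differences on those pairs."""

    def __init__(self, epsilon: float, CI: float, dim: int, seed: int = 0):
        if epsilon <= 0 or not 0 < CI < 1 or dim < 2:
            raise ValueError("need epsilon > 0, 0 < CI < 1 and dim >= 2")
        self.dim = dim
        # Hoeffding bound for m i.i.d. terms in [-1, 1]:
        # 2 * exp(-m * epsilon**2 / 2) <= 1 - CI  <=>  m >= 2 ln(2 / (1 - CI)) / epsilon**2
        self.m = math.ceil(2 * math.log(2 / (1 - CI)) / epsilon ** 2)
        rng = random.Random(seed)
        # Pairs are drawn independently (with replacement). The order inside a
        # pair does not matter: swapping i and j flips both signs of a product.
        self.pairs = [tuple(rng.sample(range(dim), 2)) for _ in range(self.m)]

    def sketch(self, x: Sequence[float]) -> List[int]:
        """O(m) summary of x: sign(x[i] - x[j]) for every sampled pair (i, j)."""
        if len(x) != self.dim:
            raise ValueError("vector length must equal dim")
        return [(x[i] > x[j]) - (x[i] < x[j]) for i, j in self.pairs]

    @staticmethod
    def compare(sx: Sequence[int], sy: Sequence[int]) -> float:
        """Estimate Kendall's tau-a from two sketches in O(m), independent of dim."""
        return sum(a * b for a, b in zip(sx, sy)) / len(sx)

    def correlation(self, x: Sequence[float], y: Sequence[float]) -> float:
        return self.compare(self.sketch(x), self.sketch(y))
```

With epsilon = 0.05 and CI = 0.95 this bound gives m = 2952 sampled pairs regardless of dim. That is more than the 10 pairs of a 5-element vector, so for short vectors the exact count is cheaper; the sketch pays off for long vectors compared many times (e.g. in clustering), where each vector is sketched once and every comparison then costs O(m).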

About


License: MIT License


Languages

Jupyter Notebook 94.0%, Python 6.0%