memoiry / data-analysis

data analysis functions

math and data analysis functions

shingling

  • k-shingles generation
  • minhashing
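
As an illustration of the two items above, here is a minimal sketch of character k-shingles and a MinHash signature. The names `k_shingles` and `minhash_signature` and the salted SHA-1 hash family are assumptions made for this example; the repository's own functions may be named and implemented differently.

```python
import hashlib


def k_shingles(text, k=3):
    """Return the set of all contiguous k-character substrings of text."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(shingles, num_hashes=16):
    """Return one minimum hash value per (hypothetical) salted hash function."""
    if not shingles:
        raise ValueError("cannot build a signature for an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        # Salting the input with the seed simulates a family of hash functions.
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{s}".encode("utf-8")).digest()[:8], "big"
            )
            for s in shingles
        ))
    return signature
```

Two documents can then be compared by the fraction of signature positions on which they agree, which approximates the Jaccard similarity of their shingle sets.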

jaccard similarity

  • jaccard similarity calculation
  • jaccard distance calculation
  • jaccard conditional comparison
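
A minimal sketch of the similarity and distance calculations, assuming the inputs are arbitrary iterables treated as sets. The function names and the convention that two empty sets have similarity 1.0 are assumptions for this example, not necessarily what the repository does.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Jaccard distance is one minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)
```

For example, `jaccard_similarity({1, 2, 3}, {2, 3, 4})` shares two of four distinct elements.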

adwords problem

  • greedy_adwords
  • balance_adwords
  • generalized_balance_adwords
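
The difference between the greedy and balance strategies can be sketched as follows, under the simplifying assumption that every bid is worth 1. The signatures (a `bidders` mapping from query to advertisers and a `budgets` mapping) are hypothetical and may not match the repository's `greedy_adwords` and `balance_adwords`.

```python
def greedy_adwords(queries, bidders, budgets):
    """Assign each query to the first bidder that still has budget."""
    remaining = dict(budgets)
    assignment = []
    for query in queries:
        chosen = None
        for advertiser in bidders.get(query, []):
            if remaining[advertiser] > 0:
                chosen = advertiser
                break
        if chosen is not None:
            remaining[chosen] -= 1
        assignment.append(chosen)
    return assignment


def balance_adwords(queries, bidders, budgets):
    """Assign each query to the bidder with the largest remaining budget."""
    remaining = dict(budgets)
    assignment = []
    for query in queries:
        candidates = [a for a in bidders.get(query, []) if remaining[a] > 0]
        chosen = None
        if candidates:
            # Ties are broken in favour of the first listed advertiser.
            chosen = max(candidates, key=lambda a: remaining[a])
            remaining[chosen] -= 1
        assignment.append(chosen)
    return assignment
```

With advertisers A (bids on x) and B (bids on x and y), budgets of 2 each, and the query stream x, x, y, y, greedy can spend all of B's budget on x and serve only two queries, while balance serves three.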

frequency problem

  • items frequency
  • the algorithm of Savasere, Omiecinski and Navathe (SON)

graph problem

  • graph construction
  • shortest_path
  • longest path
  • centrality
  • independent graphs detection
  • clustering_coef
  • dijkstra
  • dijkstra with heap
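
As a sketch of the heap-based variant, the following computes shortest distances from a source node using the standard library `heapq`. The graph representation (a dict mapping each node to a dict of neighbor weights) is an assumption for this example and may differ from the repository's graph construction.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source; weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

Nodes that cannot be reached from the source are simply absent from the returned dict.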

recommendation problem

  • hamming distance
  • euclidean distance
  • pearson correlation
  • tanimoto score
  • euclidean similarity
  • pearson similarity
  • tanimoto similarity
  • top similars
  • top similar with map reduce
  • user-filtered recommendation
  • item-filtered recommendation
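
Two of the similarity measures above can be sketched as follows, assuming each user's preferences are a dict mapping item to numeric rating and that only items rated by both users are compared. The function names and the `1 / (1 + distance)` scaling are assumptions for this example.

```python
from math import sqrt


def euclidean_similarity(p, q):
    """Map Euclidean distance over shared items into the range (0, 1]."""
    shared = set(p) & set(q)
    if not shared:
        return 0.0
    distance = sqrt(sum((p[k] - q[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)


def pearson_correlation(p, q):
    """Pearson correlation coefficient over the items both users rated."""
    shared = sorted(set(p) & set(q))
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [p[k] for k in shared]
    ys = [q[k] for k in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    # A constant rating vector has no defined correlation; return 0.0 then.
    return cov / denom if denom else 0.0
```

Pearson correlation is insensitive to one user rating consistently higher than another, which the Euclidean measure is not.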

Radix tree

  • insert
  • remove
  • search
  • longest prefix

Decision tree

  • Divide data
  • Gini impurity
  • Entropy
  • Variance
  • Build tree
  • Prune
  • Classify
  • Draw tree
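
The two classification impurity measures listed above can be sketched as follows, assuming the input is a plain list of class labels. The function names are hypothetical and the repository may operate on rows of a dataset instead.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Probability that two labels drawn with replacement differ."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    # sum of p * log2(1 / p) over the classes that actually occur
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())
```

Both measures are 0 for a pure set, so a tree builder would typically pick the split that reduces them the most.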

Page Rank

A very simple implementation of the page rank algorithm.

  • Page rank
  • Advanced version of page rank, topic sensitive
  • spam farms
  • trust rank
  • Hyperlink-induced topic search (HITS)
  • Map reduce to efficiently calculate the page rank
  • Jaccard similarity (to be found in the data analysis repo)
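
A minimal sketch of basic page rank by power iteration. The function name, the damping parameter `beta`, the fixed iteration count, and the choice to spread the rank of dead-end pages evenly over all pages are assumptions for this example, not a description of the repository's code.

```python
def page_rank(links, beta=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the teleport share (1 - beta) / n.
        new_rank = {page: (1.0 - beta) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = beta * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end: distribute its rank evenly so the total stays 1.
                share = beta * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

In a topic-sensitive variant, the teleport share would go only to pages in the chosen topic set rather than to all pages.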

Map-Reduce

Implementation of map reduce, and some examples.

  • Map Reduce class
  • Estimation of pi number
  • Calculation of frequency of Items from multiple files
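
The pi estimation example can be sketched in map-reduce style with only the standard library: each map task counts random points that fall inside the unit quarter circle, and the reduce step sums the counts. The names `count_hits`, `combine`, and `estimate_pi` are hypothetical, and this sketch uses the built-in `map` and `functools.reduce` rather than the repository's Map Reduce class.

```python
import random
from functools import reduce


def count_hits(task):
    """Map step: sample points in the unit square, count those in the circle."""
    seed, samples = task
    rng = random.Random(seed)  # per-task seed keeps the run reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def combine(left, right):
    """Reduce step: add up (hits, samples) pairs."""
    return left[0] + right[0], left[1] + right[1]


def estimate_pi(num_tasks=8, samples_per_task=20000):
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    hits, total = reduce(combine, map(count_hits, tasks), (0, 0))
    if total == 0:
        raise ValueError("at least one sample is required")
    # The quarter circle covers pi / 4 of the unit square.
    return 4.0 * hits / total
```

Because the map tasks are independent, they could be handed to separate worker processes without changing the reduce step.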

License: BSD 2-Clause "Simplified" License

Languages: Python 97.2%, C 2.8%