vaasugambhir / mining-large-datasets

Python implementation of the Apriori, PCY, Multistage and Multihash algorithms

Mining of massive datasets

A Python implementation of the Apriori, PCY, Multistage and Multihash algorithms for finding frequent itemsets.

To run a particular algorithm, cd into that algorithm's directory and run 'python index.py'. Each index.py collects all the passes for its algorithm and prints the result of each pass (e.g., the item index table, the frequent k-itemsets, etc.). The given sample dataset does not require more than 3 passes, so the run stops after checking the candidate triples (3-itemsets).
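
The pass structure described above can be sketched roughly as follows. This is a minimal illustration of plain Apriori, not the code in this repository's index.py: the function name `apriori`, its parameters `baskets`, `support` and `max_k`, and the sample baskets are all hypothetical and chosen only for the example. It stops after the third pass, mirroring the limit mentioned above.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, support, max_k=3):
    """Return {k: {itemset: count}} for frequent itemsets of size 1..max_k.

    Hypothetical helper for illustration; not the repository's API.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count individual items and keep those meeting the threshold.
    item_counts = Counter(item for b in baskets for item in b)
    frequent = {
        1: {frozenset([i]): c for i, c in item_counts.items() if c >= support}
    }

    # Passes 2..max_k: count only candidates whose (k-1)-subsets are all frequent.
    for k in range(2, max_k + 1):
        prev = frequent[k - 1]
        if not prev:
            break  # nothing frequent at size k-1, so nothing can be frequent at size k
        live_items = set().union(*prev)  # items still in some frequent (k-1)-itemset
        counts = Counter()
        for b in baskets:
            kept = sorted(b & live_items)
            for cand in combinations(kept, k):
                if all(frozenset(s) in prev for s in combinations(cand, k - 1)):
                    counts[frozenset(cand)] += 1
        frequent[k] = {s: c for s, c in counts.items() if c >= support}

    return frequent


# Made-up sample baskets (an assumption, not the repository's dataset).
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = apriori(baskets, support=3)

# Print the frequent itemsets found in each pass.
for k, itemsets in result.items():
    print(f"pass {k}:")
    for itemset, count in itemsets.items():
        print("  ", sorted(itemset), count)
```

PCY, Multistage and Multihash keep this same pass structure but additionally hash pairs into buckets during earlier passes to prune the candidate pairs counted later; that part is omitted here to keep the sketch short.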

Reference: "Mining of Massive Datasets" by Anand Rajaraman and Jeffrey D. Ullman
