cdancette / GMiner

GMiner: A fast GPU-based frequent itemset mining method for large-scale data

GMiner

Fork

This is a fork of GMiner with additional parameters. It is used in the project https://github.com/cdancette/detect-shortcuts

Introduction

GMiner is an algorithm for finding frequent itemsets using the computing power of GPUs.

GMiner has the following characteristics:

  • Scalable with respect to dataset size, including datasets that do not fit into GPU device memory
  • Scalable with respect to the number of GPUs, by distributing the work evenly across the GPUs
  • Fast and robust compared to state-of-the-art methods, especially when handling large-scale datasets

Research Paper

GMiner: A fast GPU-based frequent itemset mining method for large-scale data

Implementation

GMiner is implemented as C++ code that interacts with GPU servers and computes the frequent itemsets of an input dataset.

Installation

  • This version requires (1) g++ 4.8 or later, (2) the Boost C++ Libraries, and (3) CUDA 8, each installed on the system and set in PATH. GMiner is not tested on other configurations.
  • To compile, run ./build.sh; this produces the executable “GMiner” in the same directory
  • To remove the executable files, run ./clean.sh

Input File Format

In an input dataset, each transaction is stored on a single line (row). Within a transaction, items are non-negative integers separated by a space.
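As an illustration of this format, here is a minimal Python sketch that writes a small transaction file and reads it back. The helper names `write_transactions` and `read_transactions` are hypothetical and are not part of GMiner; the sketch only assumes the layout described above (one transaction per line, non-negative integer items, space-separated).

```python
from pathlib import Path


def write_transactions(path, transactions):
    """Write one transaction per line, items separated by a single space.

    Hypothetical helper, not part of GMiner.
    """
    lines = []
    for items in transactions:
        # The input format only allows non-negative integer item ids.
        if any((not isinstance(i, int)) or i < 0 for i in items):
            raise ValueError("items must be non-negative integers")
        lines.append(" ".join(str(i) for i in items))
    Path(path).write_text("\n".join(lines) + "\n")


def read_transactions(path):
    """Parse a transaction file back into a list of lists of ints."""
    text = Path(path).read_text()
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
```

For example, `write_transactions("toy.txt", [[1, 2, 3], [2, 3], [0]])` produces a three-line file whose first line is `1 2 3`.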

Output File Format

The output consists of a number of lines. Each line contains a single frequent itemset and its support ratio, in the range [0,1].
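To make the support ratio concrete, the following Python sketch computes it under the standard definition: the fraction of transactions that contain every item of the itemset. The brute-force enumeration is only a reference for tiny inputs and is not GMiner's GPU algorithm. The function names are hypothetical, and treating an itemset as frequent when its support is greater than or equal to the threshold is an assumption; the source does not state how GMiner handles the boundary.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def frequent_itemsets(transactions, min_sup):
    """Brute-force reference for tiny inputs (not GMiner's method).

    Assumption: an itemset is frequent when support >= min_sup.
    """
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            s = support(cand, transactions)
            if s >= min_sup:
                result[cand] = s
                found = True
        if not found:
            # No frequent k-itemset means no frequent (k+1)-itemset exists.
            break
    return result


transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 3]]
frequent = frequent_itemsets(transactions, min_sup=0.6)
```

With these four transactions, the itemset (1, 2) appears in three of them, so its support ratio is 0.75, while (1, 3) appears in two and falls below the 0.6 threshold.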

How to run

Command

./GMiner -i <input_path> -o <output_path> -s <min_sup> -w <is_write_output>

Parameters

  • input_path (-i): path of the input file (default: “webdocs” in the directory)
  • output_path (-o): path of the output file (default: “out”)
  • min_sup (-s): minimum support (in the range [0.0,1.0]; default: 0.1)
  • is_write_output (-w): whether to write the output (0: no, 1: yes; default: 0)
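When GMiner is called from a script, the command above can be assembled programmatically. The Python sketch below builds the argument list from the four documented parameters and their documented defaults. The wrapper names `gminer_command` and `run_gminer` are hypothetical, and the sketch assumes the executable sits at `./GMiner` after running build.sh. It does not cover the additional parameters of this fork, since they are not listed here.

```python
import subprocess


def gminer_command(input_path="webdocs", output_path="out", min_sup=0.1,
                   write_output=False, binary="./GMiner"):
    """Build the GMiner argument list from the documented flags.

    Hypothetical wrapper; defaults mirror the parameter list above.
    """
    if not 0.0 <= min_sup <= 1.0:
        raise ValueError("min_sup must be in [0.0, 1.0]")
    return [binary,
            "-i", str(input_path),
            "-o", str(output_path),
            "-s", str(min_sup),
            "-w", "1" if write_output else "0"]


def run_gminer(**kwargs):
    """Run GMiner and raise if it exits with a non-zero status."""
    return subprocess.run(gminer_command(**kwargs), check=True)
```

For example, `gminer_command("webdocs", "out", 0.2, write_output=True)` corresponds to `./GMiner -i webdocs -o out -s 0.2 -w 1`. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.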
