ScalaGBM

Overview

ScalaGBM is an efficient GPU-based GBDT system, which can handle high-dimensional and large-scale dataset and train fast. It is optimized based on ThunderGBM.

Prerequisites

cmake 2.8 or above
gcc 11.x for Linux
CUDA 11.0+

Introduction

Download

git clone https://github.com/BoruiXu/ScalaGBM.git

Build on Linux. Before building, it is necessary to set the architecture of the GPU on line 28 (-arch) in CMakeLists.txt. For example, when using Nvidia RTX A6000, -arch=compute_86.

cd ScalaGNM
mkdir build
cd build
cmake ..
make -j

Usage example

./bin/scalagbm-train data=dataset/datasetname objective=binary:logistic tree_method=hist n_trees=40 depth=6

Datasets

All test datasts can be downloaded through the script in dataset floader.

sh ./dataset/get_datasets.sh

Parameter and Test

The meaning of parameters is the same as that in ThunderGBM. At present, only histogram-based training method is supported. We provide a bash script (train_test.sh) to train datasets mentioned in our paper. Befor running this script, please copy this script into the build floder. If you want to test the real-sim dataset. Please run:

sh train_test.sh real-sim

NOTED: all datasets need to be stored in the dataset folder!

About

This repo implements an efficient GPU-based GBDT system. It can handle high-dimensional data on GPU and train fast.

Languages

Language:C++ 78.2%Language:Cuda 21.1%Language:CMake 0.4%Language:Shell 0.3%