hgbzzw / gcforest

gcforest source code from Zhi-Hua Zhou


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Description: A Python 2.7 implementation of gcForest proposed in [1].
% A demo implementation of the gcForest library, as well as some demo client scripts to demonstrate how to use the code.
% The implementation is flexible enough for modifying the model or fitting your own datasets.
%
% Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks.
%            In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2)
%
% Requirements: This package is developed with Python 2.7; please make sure all the dependencies
%               specified in requirements.txt are installed.
%
% ATTN: This package is free for academic usage. You can run it at your own risk.
%       For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@lamda.nju.edu.cn).
%
% ATTN2: This package was developed by Mr. Ji Feng (fengj@lamda.nju.edu.cn).
%        The readme file and demos roughly explain how to use the code.
%        For any problem concerning the code, please feel free to contact Mr. Feng.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Package official website: http://lamda.nju.edu.cn/code_gcForest.ashx
The official GitHub repo is maintained at https://github.com/kingfengji/gcforest

This package is provided "AS IS" and free for academic usage. You can run it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@lamda.nju.edu.cn).

Before running the demos, make sure all the dependencies are installed. For instance, you can run the following command to install them:

pip install -r requirements.txt

This package is developed in Python 2.7; higher versions of Python are not suggested for the current package.
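
If you want to double-check which interpreter will run the demos, a quick sanity check like the one below may help (a minimal sketch; the 2.7 requirement comes from the note above):

import sys

# gcForest is developed for Python 2.7 (see the note above); warn on anything else.
if sys.version_info[:2] != (2, 7):
    print("Warning: this package is developed for Python 2.7, "
          "you are running %d.%d" % sys.version_info[:2])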

=================================== Outline for README

  • Package Overview
  • Notes on Demo Scripts
  • Notes on Model Specification Files
  • Examples and Demos
  • Using Your Own Dataset

================================== Package Overview

  • lib/gcforest
    • code for the implementation of the gcForest library
  • tools/train_fg.py
    • the demo script used for training the fine-grained scanning layers
  • tools/train_cascade.py
    • the demo script used for training the cascade layers
  • models/
    • folder for the model specifications used by tools/train_fg.py and tools/train_cascade.py
    • the gcForest structure is saved in JSON format
  • logs
    • the folder logs/gcforest is used to save the log files produced by the demo scripts

============================ Notes on Demo Scripts

Below is a brief description of the arguments needed by the demo scripts.

%%%%%%%%%%%%%%%%%%%% tools/train_fg.py %%%%%%%%%%%%%%%%%%%%

  • --model: str
    • the config file path for the fine-grained model (in JSON format)
  • --save_outputs: bool
    • if true, the output predictions produced by the fine-grained model will be saved in the model_cache_dir specified in the model config; this output will be used when training the cascade layers
    • the default value is false

%%%%%%%%%%%%%%%%%%%%%% tools/train_cascade.py %%%%%%%%%%%%%%%%%%%%%%

  • --model: str
    • the model config file path for cascade training (in JSON format)
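
For example, a full fine-grained + cascade run follows a two-step pattern like the MNIST invocations below (these are the same commands used in the MNIST demo later in this README):

python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs
python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json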

%%%%%%%%%%%%%%%%%%%%%% Notes on Config Files %%%%%%%%%%%%%%%%%%%%%%

Below is a brief introduction to the model specification files, namely:

  • model specification for fine grained scanning structure.
  • model specification for cascade forests.

All the model specifications (JSON files) are saved in models/. For instance, all the model specification files needed for MNIST are stored in models/mnist/gcforest

  • ca is short for cascade structure specifications
  • fg is short for fine-grained structure specifications

You can define your own structure by writing similar json files.
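
If you want a starting point for your own specification, one option (a small sketch, not part of the package's own tooling) is to load one of the shipped JSON files and inspect its top-level sections before editing a copy:

import json

# Load an existing specification shipped in models/ (this path is used in the MNIST demo below).
with open("models/mnist/gcforest/fg-tree500-depth100-3folds.json") as f:
    spec = json.load(f)

# The sections are described in the rest of this README:
# dataset / train / net for fine-grained specs, dataset / cascade for cascade specs.
print(spec.keys())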

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (dataset) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • dataset.train, dataset.test: [dict]
    • corresponds to the particular datasets defined in lib/datasets
    • type [str]: see lib/datasets/__init__.py for a reference
    • You can use your own dataset by writing similar wrappers.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (train) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • train.keep_model_in_mem: [bool] default=0
    • if 0, the forests will not be kept in memory (they are freed from RAM after use)
  • train.data_cache: [dict]
    • corresponds to the DataCache in lib/dataset/data_cache.py
  • train.data_cache.cache_dir (str)
    • make sure to change "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas" to your own path

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (net) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • net.outputs: [list]
    • the list of data names output by this model
  • net.layers: [list of layers]
    • layer configs; see lib/gcforest/layers for a reference (a combined sketch of the dataset/train/net sections follows this list)
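
Putting the three sections above together, a fine-grained specification has roughly the shape sketched below (written as a Python dict that mirrors the JSON; only the keys described in this README are shown, and the values are placeholders rather than a working configuration):

fg_spec = {
    "dataset": {
        "train": {"type": "mnist"},   # dataset wrapper, see lib/datasets/__init__.py
        "test":  {"type": "mnist"},
    },
    "train": {
        "keep_model_in_mem": 0,       # if 0, the forests are not kept in memory
        "data_cache": {
            "cache_dir": "/path/to/your/own/cache/dir",
        },
    },
    "net": {
        "outputs": [],                # names of the data produced by this model
        "layers": [],                 # layer configs, see lib/gcforest/layers
    },
}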

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Cascade model's config (dataset) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Similar to the fine-grained model's config (dataset).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Cascade model's config (cascade) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

See the __init__ of lib/gcforest/cascade/cascade_classifier.py for a reference.

============================= Examples and Demos

Before running the scripts, make sure to change

  • train.data_cache.cache_dir in the fine-grained model config (e.g. models/xxx/fg-xxxx.json)
  • train.cascade.dataset.{train,test}.data_path in the fine-grained + cascade model config (e.g. models/xxx/fg-xxxx-ca.json)
  • train.cascade.cascade.data_save_dir in the cascade model configs (e.g. models/xxx/ca-xxxx.json and models/xxx/fg-xxxx-ca.json)

To train a gcForest (with fine-grained scanning), you need to run two scripts.

  • Fine Grained Scanning: 'tools/train_fg.py'
  • Cascade Training: 'tools/train_cascade.py'

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% UCI Letter %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Get the data: you need to download it yourself by running the following commands:
cd dataset/uci_letter
sh get_data.sh
  • Since fine-grained scanning is not needed here, we only train a cascade forest as follows:

    • python tools/train_cascade.py --model models/uci_letter/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_letter/ca
  • The Adult and YEAST datasets can be trained with a similar procedure.

%%%%%%%%%%%%%%%%%%%%% MNIST %%%%%%%%%%%%%%%%%%%%%

  • Get the data: the data will be automatically downloaded via lib/datasets/mnist.py; you do not need to do it yourself.
  • First Train the Fine Grained Forest:
    • Run python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs
    • This means:
    1. Train a fine-grained model for the MNIST dataset,
    2. using the structure defined in models/mnist/gcforest/fg-tree500-depth100-3folds.json,
    3. saving the log files in logs/gcforest/mnist/fg, and
    4. saving the fine-grained scanning predictions in train.data_cache.cache_dir.
  • Then, train the cascade forest (note: make sure you have run train_fg.py first)
    • run python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json
    • This means:
    1. Train a cascade structure on the fine-grained scanning results.
    2. The cascade model specification is defined in 'models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json'
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • Run python tools/train_cascade.py --model models/mnist/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/mnist/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% UCI sEMG %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Get Data
cd dataset/uci_semg
sh get_data.sh
  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/uci_semg/fg
  • Then, train the cascade forest (note: make sure you have run train_fg.py first)
    • python tools/train_cascade.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/uci_semg/gc
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • python tools/train_cascade.py --model models/uci_semg/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_semg/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% GTZAN %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Requirements (you need to install the following package): librosa

  • Get the data yourself by running the following commands:

cd dataset/gtzan
sh get_data.sh
cd ../..
python tools/audio/cache_feature.py --dataset gtzan --feature mfcc --split genre.trainval
  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/gtzan/fg
  • Then, train the cascade forest (note: make sure you have run train_fg.py first)
    • python tools/train_cascade.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/gtzan/gc
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • python tools/train_cascade.py --model models/gtzan/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/gtzan/ca --save_outputs

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% IMDB %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Cascade Forest:
    • python tools/train_cascade.py --model models/imdb/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/imdb/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CIFAR10 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds.json --save_outputs
  • Then, train the cascade forest (note: make sure you have run train_fg.py first)
    • python tools/train_cascade.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds-ca.json

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For Your Own Datasets %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Data format (a minimal wrapper sketch follows this list):
    0. Please refer to lib/datasets/mnist.py as an example
    1. the dataset should have attributes X and y representing the data and the labels
    2. y should be a 1-d array
    3. for fine-grained scanning, X should be a 4-d array (N x channel x H x W), e.g. cifar10 should be N x 3 x 32 x 32, mnist should be N x 1 x 28 x 28, uci_semg should be N x 1 x 3000 x 1
  • Model specifications:
    1. save the JSON file in models/$dataset_name (recommended)
    2. for a detailed description, see the section 'Notes on Config Files'
  • If you only need to train a cascade forest, run tools/train_cascade.py.
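
As an illustration of the data format described above, a minimal wrapper might look like the sketch below (the class name and the file names in the usage comment are hypothetical; only the X/y attributes and their dimensionality follow the requirements listed above):

import numpy as np

class MyDataset(object):
    """Minimal dataset wrapper: exposes X (data) and y (labels)."""
    def __init__(self, X, y):
        # For fine-grained scanning X must be 4-d: (N, channel, H, W),
        # e.g. (N, 1, 28, 28) for MNIST-like images; y must be a 1-d array.
        self.X = X.astype(np.float32)
        self.y = y.astype(np.int32)

# Hypothetical usage: load your own arrays and wrap them.
# X_train = np.load("my_train_images.npy").reshape(-1, 1, 28, 28)
# y_train = np.load("my_train_labels.npy")
# train_data = MyDataset(X_train, y_train)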

Happy Hacking.

Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2)


