hgbzzw / gcforest

gcforest source code from Zhihua Zhou

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Description: A python 2.7 implementation of gcForest proposed in [1].
A demo implementation of the gcForest library as well as some demo client scripts to demonstrate how to use the code.
The implementation is flexible enough for modifying the model or fitting your own datasets.

Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks.
In IJCAI-2017. (https://arxiv.org/abs/1702.08835v2 )

Requirements: This package is developed with Python 2.7. Please make sure all the dependencies are installed,
which are specified in requirements.txt

ATTN: This package is free for academic usage.
You can run it at your own risk.
For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@lamda.nju.edu.cn)

ATTN2: This package was developed by Mr. Ji Feng (fengj@lamda.nju.edu.cn).
The readme file and demo roughly explain how to use the code.
For any problem concerning the code, please feel free to contact Mr. Feng.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Package Official Website: http://lamda.nju.edu.cn/code_gcForest.ashx
The official GitHub repo is maintained at https://github.com/kingfengji/gcforest

This package is provided "AS IS" and free for academic usage. You can run it at your own risk. For other purposes, please contact Prof. Zhi-Hua Zhou (zhouzh@lamda.nju.edu.cn).

Before running the demo, make sure all the dependencies are installed. For instance, run the following command before running the code:
pip install -r requirements.txt
This package was developed in Python 2.7; higher versions of Python are not recommended for the current package.

=================================== Outline for README

  • Package Overview
  • Notes on Demo Scripts
  • Notes on Model Specification Files
  • Example and Demos
  • Using Own Dataset

================================== Package Overview

  • lib/gcforest
    • code for the implementations for gcforest
  • tools/train_fg.py
    • the demo script used for training Fine grained Layers
  • tools/train_cascade.py
    • the demo script used for training Cascade Layers
  • models/
    • folder to save models which can be used in tools/train_fg.py and tools/train_cascade.py
    • the gcForest structure is saved in json format
  • logs
    • folder logs/gcforest is used to save the logfiles produced by demo scripts

============================ Notes on Demo Scripts

Below is a brief description of the arguments needed by the demo scripts.

%%%%%%%%%%%%%%%%%%%% tools/train_fg.py

  • --model: str
    • The config filepath for Fine grained models (in json format)
  • --save_outputs: bool
    • If True, the output predictions produced by the Fine Grained Model are saved in model_cache_dir, which is specified in the Model Config. This output is used when training the Cascade Layer.
    • The default value is False.

%%%%%%%%%%%%%%%%%%%%%% tools/train_cascade.py

  • --model: str
    • The model config filepath for cascade training (in json format)
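
As a rough illustration of the arguments listed above, here is a minimal argparse sketch. It is not the code in tools/train_fg.py; the function name build_fg_parser is hypothetical, and --log_dir is included only because the demo commands later in this README pass it.

```python
from __future__ import print_function
import argparse


def build_fg_parser():
    """Hypothetical parser mirroring the flags documented for tools/train_fg.py."""
    parser = argparse.ArgumentParser(description="fine grained training (sketch)")
    parser.add_argument("--model", type=str, required=True,
                        help="config filepath for the model (json format)")
    # store_true makes the flag default to False, matching the documented default
    parser.add_argument("--save_outputs", action="store_true",
                        help="save the fine grained outputs for cascade training")
    # Assumption: --log_dir is not described above, but the demo commands use it
    parser.add_argument("--log_dir", type=str, default=None,
                        help="directory for log files")
    return parser


args = build_fg_parser().parse_args([
    "--model", "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
    "--save_outputs",
])
print(args.model, args.save_outputs, args.log_dir)
```

A parser for tools/train_cascade.py would, under the same assumptions, look the same without --save_outputs.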

%%%%%%%%%%%%%%%%%%%%%% Notes on Config Files %%%%%%%%%%%%%%%%%%%%%%

Below is a brief introduction on how to use model specification files, namely

  • model specification for fine grained scanning structure.
  • model specification for cascade forests.

All the model specifications (json files) are saved in models/. For instance, all the model specification files needed for MNIST are stored in models/mnist/gcforest

  • ca is short for cascade structure specifications
  • fg is short for fine-grained structure specifications

You can define your own structure by writing similar json files.
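
Since the specifications are plain json, they can be inspected with the standard library before training. The sketch below is only an illustration: load_spec is a hypothetical helper, and the demo file contains empty placeholder sections rather than a real gcForest structure.

```python
from __future__ import print_function
import json
import os
import tempfile


def load_spec(path):
    """Read a model specification file and return it as a dict."""
    with open(path) as f:
        return json.load(f)


# Throwaway demo file; the sections are empty placeholders, not a real schema.
demo_spec = {"dataset": {}, "train": {}, "net": {}}
demo_path = os.path.join(tempfile.mkdtemp(), "fg-demo.json")
with open(demo_path, "w") as f:
    json.dump(demo_spec, f, indent=2)

loaded = load_spec(demo_path)
print(sorted(loaded.keys()))
```

Pointing load_spec at one of the files under models/mnist/gcforest is a quick way to see what a complete specification looks like.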

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (dataset) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • dataset.train, dataset.test: [dict]
    • corresponds to the particular datasets defined in lib/datasets
    • type [str]: see lib/datasets/__init__.py for a reference
    • You can use your own dataset by writing similar wrappers.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (train) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • train.keep_model_in_mem: [bool] default=0
    • if 0, the forest is not kept in memory (it is freed from RAM)
  • train.data_cache : [dict]
    • corresponds to the DataCache in lib/dataset/data_cache.py
  • train.data_cache.cache_dir (str)
    • make sure to change "/mnt/raid/fengji/gcforest/cifar10/fg-tree500-depth100-3folds/datas" to your own path

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FineGrained model's config (net) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • net.outputs: [list]
    • List of the data names output by this model
  • net.layers: [List of Layers]
    • Layer's Config, see lib/gcforest/layers for a reference
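
The keys named in the FineGrained config notes above can be checked before a long training run. The following is a minimal sketch, assuming the dotted names map directly onto nested json objects; get_by_path, missing_keys and DOCUMENTED_FG_KEYS are hypothetical names, and the demo values are placeholders.

```python
from __future__ import print_function


def get_by_path(config, dotted):
    """Follow a dotted key such as 'train.data_cache.cache_dir'; None if absent."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Keys taken from this README; train.keep_model_in_mem is left out
# because it is documented as having a default.
DOCUMENTED_FG_KEYS = [
    "dataset.train",
    "dataset.test",
    "train.data_cache.cache_dir",
    "net.outputs",
    "net.layers",
]


def missing_keys(config):
    """Return the documented keys that the config does not define."""
    return [k for k in DOCUMENTED_FG_KEYS if get_by_path(config, k) is None]


# Placeholder config: values are illustrative only.
demo_config = {
    "dataset": {"train": {"type": "mnist"}, "test": {"type": "mnist"}},
    "train": {"data_cache": {}},
    "net": {"outputs": [], "layers": []},
}
print(missing_keys(demo_config))
```

Here only train.data_cache.cache_dir is reported, because the placeholder config omits it.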

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Cascade model's config (dataset) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Similar to the FineGrained model's config (dataset).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Cascade model's config (cascade) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

See __init__ in lib/gcforest/cascade/cascade_classifier.py for a reference.

============================= Examples and Demos

Before running the scripts, make sure to change

  • train.data_cache.cache_dir in the Finegrained Model Config (eg: models/xxx/fg-xxxx.json)
  • train.cascade.dataset.{train,test}.data_path in the Finegrained-Cascade Model Config (eg: models/xxx/fg-xxxx-ca.json)
  • train.cascade.cascade.data_save_dir in the Cascade Model Configs (eg: models/xxx/ca-xxxx.json and models/xxx/fg-xxxx-ca.json)
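
Instead of editing each json file by hand, the paths above could be rewritten with a small script. This is a sketch only: set_by_path is a hypothetical helper, the demo dict is a placeholder, and the new directory is an example value. It assumes each dotted name corresponds to nested json objects.

```python
from __future__ import print_function
import json


def set_by_path(config, dotted, value):
    """Set a nested value addressed by a dotted key, creating dicts as needed."""
    keys = dotted.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Placeholder config standing in for a loaded fg-xxxx.json file.
demo_config = {"train": {"data_cache": {"cache_dir": "/path/from/the/repo"}}}
set_by_path(demo_config, "train.data_cache.cache_dir", "/tmp/gcforest/mnist/datas")
print(json.dumps(demo_config, indent=2, sort_keys=True))
```

After the change, the dict can be written back with json.dump to the same file.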

To train a gcForest (with fine grained scanning), you need to run two scripts.

  • Fine Grained Scanning: 'tools/train_fg.py'
  • Cascade Training: 'tools/train_cascade.py'
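
The two stages can be chained from a small driver so that cascade training only starts if fine grained scanning succeeded. The sketch below is an assumption about how one might wrap the scripts, not part of the package; build_commands and run_pipeline are hypothetical names, and the fg/gc log folder names simply follow the demo commands in this README.

```python
from __future__ import print_function
import subprocess


def build_commands(fg_model, ca_model, log_root=None):
    """Return the two command lines: fine grained scanning, then cascade."""
    fg_cmd = ["python", "tools/train_fg.py", "--model", fg_model, "--save_outputs"]
    ca_cmd = ["python", "tools/train_cascade.py", "--model", ca_model]
    if log_root:
        fg_cmd += ["--log_dir", log_root + "/fg"]
        ca_cmd += ["--log_dir", log_root + "/gc"]
    return [fg_cmd, ca_cmd]


def run_pipeline(commands):
    """Run the commands in order; check_call raises if a stage fails."""
    for cmd in commands:
        subprocess.check_call(cmd)


commands = build_commands(
    "models/mnist/gcforest/fg-tree500-depth100-3folds.json",
    "models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json",
    log_root="logs/gcforest/mnist",
)
for cmd in commands:
    print(" ".join(cmd))
# run_pipeline(commands)  # uncomment inside the repository root to actually train
```

Keeping --save_outputs in the first command matters because the cascade stage reads the saved fine grained outputs.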

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% UCI Letter %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Get Data: download the data yourself by running the following commands:
cd dataset/uci_letter
sh get_data.sh
  • Since fine-grained scanning is not needed here, we only train a Cascade Forest as follows:

    • python tools/train_cascade.py --model models/uci_letter/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_letter/ca
  • Adult and YEAST can be trained with a similar procedure.

%%%%%%%%%%%%%%%%%%%%% MNIST %%%%%%%%%%%%%%%%%%%%%

  • Get the data: The data will be automatically downloaded via 'lib/datasets/mnist.py'; you do not need to do it yourself.
  • First Train the Fine Grained Forest:
    • Run python tools/train_fg.py --model models/mnist/gcforest/fg-tree500-depth100-3folds.json --log_dir logs/gcforest/mnist/fg --save_outputs
    • This means:
    1. Train a fine grained model for MNIST dataset,
    2. Using the structure defined in models/mnist/gcforest/fg-tree500-depth100-3folds.json
    3. save the log files in logs/gcforest/mnist/fg
    4. The output for the fine grained scanning predictions is saved in train.data_cache.cache_dir
  • Then, train the cascade forest (Note: make sure you run the train_fg.py first)
    • run python tools/train_cascade.py --model models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json
    • This means:
    1. Train a cascade structure on the fine grained scanning results.
    2. The cascade model specification is defined in 'models/mnist/gcforest/fg-tree500-depth100-3folds-ca.json'
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • Run python tools/train_cascade.py --model models/mnist/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/mnist/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% UCI sEMG %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Get Data
cd dataset/uci_semg
sh get_data.sh
  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/uci_semg/fg
  • Then, train the cascade forest (Note: make sure you run the train_fg.py first)
    • python tools/train_cascade.py --model models/uci_semg/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/uci_semg/gc
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • python tools/train_cascade.py --model models/uci_semg/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/uci_semg/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% GTZAN %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Requirements: you need to install the following package: librosa

  • Get the data yourself by running the following commands:

cd dataset/gtzan
sh get_data.sh
cd ../..
python tools/audio/cache_feature.py --dataset gtzan --feature mfcc --split genre.trainval
  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds.json --save_outputs --log_dir logs/gcforest/gtzan/fg
  • Then, train the cascade forest (Note: make sure you run the train_fg.py first)
    • python tools/train_cascade.py --model models/gtzan/gcforest/fg-tree500-depth100-3folds-ca.json --log_dir logs/gcforest/gtzan/gc
  • You could also train a Cascade Forest without fine-grained scanning (but the accuracy will be much lower):
    • python tools/train_cascade.py --model models/gtzan/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/gtzan/ca --save_outputs

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% IMDB %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Cascade Forest:
    • python tools/train_cascade.py --model models/imdb/gcforest/ca-tree500-n4x2-3folds.json --log_dir logs/gcforest/imdb/ca

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CIFAR10 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • First Train the Fine Grained Forest:
    • python tools/train_fg.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds.json --save_outputs
  • Then, train the cascade forest (Note: make sure you run the train_fg.py first)
    • python tools/train_cascade.py --model models/cifar10/gcforest/fg-tree500-depth100-3folds-ca.json

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% For Your Own Datasets %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  • Data Format:
    0. Please refer to lib/datasets/mnist.py as an example
    1. the dataset should have attributes X and y to represent the data and labels
    2. y should be a 1-d array
    3. For fine-grained scanning, X should be a 4-d array (N x channel x H x W). (e.g. cifar10 should be Nx3x32x32, mnist should be Nx1x28x28, uci_semg should be Nx1x3000x1)
  • Model Specifications:
    1. Save the json file in models/$dataset_name (recommended)
    2. for a detailed description, see the section 'Notes on Config Files'
  • If you only need to train a cascade forest, run tools/train_cascade.py.
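
The data format rules above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the interface of the classes in lib/datasets: ToyDataset is a hypothetical name, and the real wrappers may inherit from a base class and take different constructor arguments. It requires numpy.

```python
from __future__ import print_function
import numpy as np


class ToyDataset(object):
    """Hypothetical wrapper exposing X and y in the documented shapes."""

    def __init__(self, images, labels):
        X = np.asarray(images)
        if X.ndim == 3:
            # N x H x W input: add a single channel axis -> N x 1 x H x W
            X = X[:, np.newaxis, :, :]
        if X.ndim != 4:
            raise ValueError("X must be N x channel x H x W, got shape %s" % (X.shape,))
        y = np.asarray(labels).reshape(-1)  # flatten labels to a 1-d array
        if len(y) != len(X):
            raise ValueError("X has %d samples but y has %d" % (len(X), len(y)))
        self.X = X
        self.y = y


# Five fake 28x28 grayscale images with column-vector labels.
ds = ToyDataset(np.zeros((5, 28, 28)), [[0], [1], [2], [3], [4]])
print(ds.X.shape, ds.y.shape)
```

For a 1-d signal such as uci_semg, the same idea applies with the input reshaped to N x 1 x 3000 x 1 before it is passed in.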

Happy Hacking.

Reference: [1] Z.-H. Zhou and J. Feng. Deep Forest: Towards an Alternative to Deep Neural Networks. In IJCAI-2017.
(https://arxiv.org/abs/1702.08835v2 )

