olokevin / GraSP_ZO


Zeroth-order Training for Lottery-pruned Models & Tensor-compressed Models

This is a PyTorch implementation of zeroth-order training on the MNIST dataset.

Requirements:

  • Python >= 3.6
  • PyTorch >= 1.8.0
  • Tensorflow >= 2.5.0
  • pyutils >= 0.0.1. See pyutils for installation.
    • A tricky part: comment out line 32 of ./setup.py when installing pyutils; tensorflow-gpu is not currently supported.
  • NVIDIA GPUs and CUDA >= 10.2
  • Others are listed in requirements.txt

Usage:

MNIST

For GraSP-pruned FC layers:

# FO-benchmark
python -u main_prune_MNIST.py -config configs/MNIST/FC/FO.yml

# ZO-gradient estimator
python -u main_prune_MNIST.py -config configs/MNIST/FC/SGD.yml

# ZO-finite difference
python -u main_prune_MNIST.py -config configs/MNIST/FC/SCD_esti.yml
# ZO-coordinate descent
python -u main_prune_MNIST.py -config configs/MNIST/FC/SCD_batch.yml

For TTM layers:

# FO-benchmark
python -u main_prune_MNIST.py -config configs/MNIST/TTM/FO.yml

# ZO-gradient estimator
python main_prune_MNIST.py -config configs/MNIST/TTM/SGD.yml

# ZO-finite difference
python main_prune_MNIST.py -config configs/MNIST/TTM/SCD_esti.yml
# ZO-coordinate descent
python main_prune_MNIST.py -config configs/MNIST/TTM/SCD_batch.yml

2-layer Encoder:

  • Select one of the provided experiments in ./run_tensors.sh
  • Run:
./run_tensors.sh

Zeroth-order Optimizer

ZO_SGD_mask:

./optimizer/ZO_SGD_mask.py

Based on a stochastic gradient estimator (a minimal sketch is given after the constructor signature below):

  • Perturb all parameters with an i.i.d. Gaussian perturbation.
  • Evaluate the change of the loss function, which yields the directional derivative along the selected random direction.
  • Form a single-shot gradient estimate from that directional derivative.
  • The expectation of the gradient estimate is a bounded-bias estimate of the true gradient.


def __init__(
        self,
        model: nn.Module,              # model whose (masked) parameters are trained
        criterion: Callable,           # loss function used for the ZO loss evaluations
        masks,                         # pruning masks (e.g. from GraSP)
        lr: float = 0.01,              # learning rate
        sigma: float = 0.1,            # std of the Gaussian perturbation
        n_sample: int = 20,            # number of random directions per gradient estimate
        signSGD: bool = False,         # use only the sign of the estimated gradient
        layer_by_layer: bool = False,  # perturb/update one layer at a time
        opt_layers_strs: list = []     # layer types to optimize (see the list under ZO_SCD_mask)
    ):
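
The estimator above can be summarized in a few lines of PyTorch. The following is a minimal, self-contained sketch of the idea, not the repo's ZO_SGD_mask implementation; the function name rge_gradient, the toy model, and the toy mask are illustrative, while sigma and n_sample mirror the constructor arguments of the same names.

import torch
import torch.nn as nn

def rge_gradient(params, loss_fn, sigma=0.1, n_sample=20):
    # grad ≈ (1/n_sample) * sum_i [(L(theta + sigma*u_i) - L(theta)) / sigma] * u_i,  u_i ~ N(0, I)
    with torch.no_grad():
        base_loss = loss_fn()
        grads = [torch.zeros_like(p) for p in params]
        for _ in range(n_sample):
            us = [torch.randn_like(p) for p in params]     # one i.i.d. Gaussian direction
            for p, u in zip(params, us):
                p.add_(sigma * u)                          # perturb all parameters
            directional = (loss_fn() - base_loss) / sigma  # directional-derivative estimate
            for p, u, g in zip(params, us, grads):
                p.sub_(sigma * u)                          # restore original parameters
                g.add_(directional * u / n_sample)         # accumulate single-shot estimate
        return grads

# toy usage: one masked ZO-SGD step on a small linear layer
model = nn.Linear(4, 2)
mask = (torch.rand_like(model.weight) > 0.5).float()       # toy pruning mask
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
criterion = nn.CrossEntropyLoss()
loss_fn = lambda: criterion(model(x), y)

grads = rge_gradient(list(model.parameters()), loss_fn, sigma=0.1, n_sample=20)
with torch.no_grad():
    model.weight -= 0.01 * grads[0] * mask                 # masked update keeps pruned weights frozen
    model.bias -= 0.01 * grads[1]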

Related Work:

FLOPS: EFficient On-Chip Learning for OPtical Neural Networks Through Stochastic Zeroth-Order Optimization (IEEE Xplore)

ZO_SCD_mask:

./optimizer/ZO_SCD_mask.py

def __init__(
        self,
        model: nn.Module,             # model whose (masked) parameters are trained
        criterion: Callable,          # loss function used for the ZO loss evaluations
        masks,                        # pruning masks (e.g. from GraSP)
        lr: float = 0.1,              # learning rate
        grad_sparsity: float = 0.1,   # fraction of coordinates updated per step (see repo)
        h_smooth: float = 0.001,      # finite-difference / smoothing step size
        grad_estimator: str = 'sign', # update rule: 'sign' | 'batch' | 'esti' (see below)
        opt_layers_strs: list = [],   # layer types to optimize (see below)
        STP: bool = True,             # STP update option (see repo)
        momentum: float = 0,          # SGD momentum
        weight_decay: float = 0,      # L2 weight decay
        dampening: float = 0,         # dampening for momentum
        adam: bool = False,           # use Adam-style moment estimates instead of SGD
        beta_1: float = 0.9,          # Adam first-moment decay
        beta_2: float = 0.98,         # Adam second-moment decay
        eps: float = 1e-06            # Adam numerical-stability constant
    ):


grad_estimator: update rule (a minimal sketch of these rules is given after the list):

  • 'sign': ZO-det coordinate descent; parameters are updated one by one

  • 'batch': ZO-det coordinate descent; all parameters are updated at the end of the evaluation

  • 'esti': ZO finite difference; all parameters are updated at the end of the evaluation
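
To make these rules concrete, below is a minimal, self-contained sketch of the 'sign' and 'esti' updates on a single tensor; it is illustrative only and omits what the repo's ZO_SCD_mask adds on top (pruning masks, grad_sparsity, STP, momentum, Adam). The 'batch' rule follows the same coordinate-descent scheme as 'sign' but defers all updates to the end of the sweep.

import torch

def zo_coordinate_step(param, loss_fn, lr=0.1, h=1e-3, rule='esti'):
    # One coordinate-wise zeroth-order sweep over a single tensor (illustrative only).
    # 'sign': update each coordinate immediately using the sign of the finite difference.
    # 'esti': collect finite-difference estimates and update all coordinates at the end.
    flat = param.data.view(-1)
    fd_grad = torch.zeros_like(flat)
    with torch.no_grad():
        for i in range(flat.numel()):
            old = float(flat[i])
            flat[i] = old + h
            loss_plus = float(loss_fn())
            flat[i] = old
            diff = (loss_plus - float(loss_fn())) / h      # forward finite difference
            if rule == 'sign':
                step = 0.0 if diff == 0 else (1.0 if diff > 0 else -1.0)
                flat[i] = old - lr * step                  # immediate coordinate update
            else:
                fd_grad[i] = diff                          # defer the update
        if rule == 'esti':
            flat -= lr * fd_grad                           # update all coordinates at once

# toy usage: drive a 3-element weight vector toward 1 by minimizing ||w - 1||^2
w = torch.zeros(3)
loss_fn = lambda: ((w - 1.0) ** 2).sum()
for _ in range(50):
    zo_coordinate_step(w, loss_fn, lr=0.1, h=1e-3, rule='esti')
print(w)   # close to tensor([1., 1., 1.])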

opt_layers_strs: the layer types to be trained (a construction example follows the list). Currently supported:

  • 'nn.Linear': nn.Linear,
  • 'nn.Conv2d': nn.Conv2d,
  • 'TensorizedLinear': TensorizedLinear,
  • 'TensorizedLinear_module': TensorizedLinear_module,
  • 'TensorizedLinear_module_tonn': TensorizedLinear_module_tonn
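
As a rough construction example (the import path is assumed from the file location given above; the exact structure of masks and the optimizer's training-loop API are repo-specific and not shown here):

import torch.nn as nn
from optimizer.ZO_SCD_mask import ZO_SCD_mask   # assumed import path

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
criterion = nn.CrossEntropyLoss()
masks = None   # GraSP pruning masks; placeholder, the exact format is defined in the repo

optimizer = ZO_SCD_mask(
    model=model,
    criterion=criterion,
    masks=masks,
    lr=0.1,
    grad_estimator='esti',            # 'sign' | 'batch' | 'esti' (see the update rules above)
    opt_layers_strs=['nn.Linear'],    # only optimize the nn.Linear layers
)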

Related Work:

https://ojs.aaai.org/index.php/AAAI/article/view/16928

License

MIT License

