This is a PyTorch implementation of zeroth-order (ZO) training on the MNIST dataset.
- Sparse training: Prune at initialization (GraSP https://arxiv.org/abs/2002.07376)
- Tensor-compressed models: TT-format and TTM-format
- Python >= 3.6
- PyTorch >= 1.8.0
- TensorFlow >= 2.5.0
- pyutils >= 0.0.1. See pyutils for installation.
- Note: comment out line 32 of ./setup.py when installing pyutils, because tensorflow-gpu is currently not supported.
- NVIDIA GPUs and CUDA >= 10.2
- Others are listed in requirements.txt
For GraSP-pruned FC layers:
# FO-benchmark
python -u main_prune_MNIST.py -config configs/MNIST/FC/FO.yml
# ZO-gradient estimator
python main_prune_MNIST.py -config configs/MNIST/FC/SGD.yml
# ZO-finite difference
python -u main_prune_MNIST.py -config configs/MNIST/FC/SCD_esti.yml
# ZO-coordinate descent
python -u main_prune_MNIST.py -config configs/MNIST/FC/SCD_batch.yml
For TTM layers:
# FO-benchmark
python -u main_prune_MNIST.py -config configs/MNIST/TTM/FO.yml
# ZO-gradient estimator
python main_prune_MNIST.py -config configs/MNIST/TTM/SGD.yml
# ZO-finite difference
python main_prune_MNIST.py -config configs/MNIST/TTM/SCD_esti.yml
# ZO-coordinate descent
python main_prune_MNIST.py -config configs/MNIST/TTM/SCD_batch.yml
- Select one of the provided experiments in ./run_tensors.sh
- Run:
./run_tensors.sh
./optimizer/ZO_SGD_mask.py
Based on a stochastic gradient estimator:
- perturb all parameters with i.i.d. Gaussian perturbations
- evaluate the change in the loss function, which approximates the directional derivative along the selected random direction
- obtain a single-shot gradient estimate
- the expectation of the gradient estimate is a biased estimate of the true gradient, with bounded bias
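In symbols, the steps above correspond to a Gaussian-smoothing estimator. The form below is a common textbook version and an assumption about this repository: the code in ./optimizer/ZO_SGD_mask.py may scale the estimate differently or use a two-sided difference. Here $\theta$ denotes the trainable parameters, $L$ the loss, $\sigma$ the perturbation scale (`sigma`), and $N$ the number of random directions (`n_sample`).

```latex
\hat{g} \;=\; \frac{1}{N} \sum_{i=1}^{N}
    \frac{L(\theta + \sigma u_i) - L(\theta)}{\sigma}\, u_i ,
\qquad u_i \sim \mathcal{N}(0, I)

\mathbb{E}\big[\hat{g}\big] \;=\; \nabla L_\sigma(\theta),
\qquad
L_\sigma(\theta) \;=\; \mathbb{E}_{u \sim \mathcal{N}(0, I)}\big[ L(\theta + \sigma u) \big]
```

The estimate is therefore unbiased for the gradient of the smoothed loss $L_\sigma$, not for $\nabla L$ itself. For a smooth loss the gap between the two shrinks as $\sigma$ decreases, which is the sense in which the bias is bounded.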
def __init__(
    self,
    model: nn.Module,
    criterion: Callable,
    masks,
    lr: float = 0.01,
    sigma: float = 0.1,
    n_sample: int = 20,
    signSGD: bool = False,
    layer_by_layer: bool = False,
    opt_layers_strs: list = []
):
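To make the roles of `masks`, `lr`, `sigma`, `n_sample`, and `signSGD` concrete, here is a minimal, dependency-free sketch of one masked ZO-SGD step on a flat list of parameters. It is not the code in ./optimizer/ZO_SGD_mask.py: the function names `zo_gradient_estimate` and `zo_sgd_step`, the flat-list representation, and the one-sided difference are all assumptions made for illustration.

```python
import random


def zo_gradient_estimate(loss_fn, params, mask, sigma=0.1, n_sample=20, rng=None):
    """Hypothetical helper: averaged one-sided ZO gradient estimate.

    loss_fn maps a list of floats to a scalar loss.
    mask holds 0/1 flags; coordinates with 0 (pruned) are never perturbed.
    """
    rng = rng if rng is not None else random
    base_loss = loss_fn(params)
    grad = [0.0] * len(params)
    for _ in range(n_sample):
        # i.i.d. Gaussian direction, zeroed on pruned coordinates.
        u = [rng.gauss(0.0, 1.0) * m for m in mask]
        perturbed = [p + sigma * ui for p, ui in zip(params, u)]
        # The loss change over sigma approximates the directional derivative along u.
        slope = (loss_fn(perturbed) - base_loss) / sigma
        for i, ui in enumerate(u):
            grad[i] += slope * ui / n_sample
    return grad


def zo_sgd_step(loss_fn, params, mask, lr=0.01, sigma=0.1, n_sample=20,
                sign_sgd=False, rng=None):
    """Hypothetical helper: one descent step using the estimate above."""
    grad = zo_gradient_estimate(loss_fn, params, mask, sigma, n_sample, rng)
    if sign_sgd:
        # Assumed meaning of signSGD: keep only the sign of each entry.
        grad = [(g > 0) - (g < 0) for g in grad]
    return [p - lr * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    def quadratic(p):
        return sum(x * x for x in p)

    theta = [1.0, -2.0, 0.5]
    mask = [1, 1, 0]  # the last coordinate is treated as pruned
    rng = random.Random(0)
    for _ in range(200):
        theta = zo_sgd_step(quadratic, theta, mask, lr=0.05, rng=rng)
    print(theta, quadratic(theta))
```

Each step costs `n_sample + 1` loss evaluations and no backward pass. Because the perturbation is multiplied by the mask, pruned coordinates receive an exactly zero estimate and stay fixed.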
Related Work:
./optimizer/ZO_SCD_mask.py
def __init__(
    self,
    model: nn.Module,
    criterion: Callable,
    masks,
    lr: float = 0.1,
    grad_sparsity: float = 0.1,
    h_smooth: float = 0.001,
    grad_estimator: str = 'sign',
    opt_layers_strs: list = [],
    STP: bool = True,
    momentum: float = 0,
    weight_decay: float = 0,
    dampening: float = 0,
    adam: bool = False,
    beta_1: float = 0.9,
    beta_2: float = 0.98,
    eps: float = 1e-06
):
grad_estimator: update rule
- 'sign': ZO-det Coordinate Descent, update the parameters one by one
- 'batch': ZO-det Coordinate Descent, update all parameters at the end of evaluation
- 'esti': ZO-finite difference, update all parameters at the end of evaluation
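The difference between the three update rules can be sketched as follows. This is one plausible reading of the descriptions above, not the repository's implementation: the function name `zo_coordinate_step`, the flat-list parameters, the use of `h_smooth` as the probe step, and the exact sign-based update are assumptions.

```python
def zo_coordinate_step(loss_fn, params, mask, lr=0.1, h_smooth=0.001,
                       grad_estimator="sign"):
    """Hypothetical helper: one sweep over all unmasked coordinates.

    'sign'  : probe a coordinate, step by lr against the sign of the loss
              change, and apply the update immediately.
    'batch' : same sign rule, but all updates are applied after the sweep.
    'esti'  : forward finite-difference slope per coordinate, applied
              after the sweep.
    """
    if grad_estimator not in ("sign", "batch", "esti"):
        raise ValueError(f"unknown grad_estimator: {grad_estimator!r}")

    params = list(params)
    base_loss = loss_fn(params)
    pending = []  # (index, direction) pairs deferred to the end of the sweep

    for i, keep in enumerate(mask):
        if not keep:
            continue  # pruned coordinate: never probed, never updated
        probe = list(params)
        probe[i] += h_smooth
        diff = loss_fn(probe) - base_loss

        if grad_estimator == "esti":
            direction = diff / h_smooth          # finite-difference slope
        else:
            direction = (diff > 0) - (diff < 0)  # only the sign is used

        if grad_estimator == "sign":
            params[i] -= lr * direction
            base_loss = loss_fn(params)  # later coordinates see this update
        else:
            pending.append((i, direction))

    for i, direction in pending:
        params[i] -= lr * direction
    return params


if __name__ == "__main__":
    def quadratic(p):
        return sum(x * x for x in p)

    start = [1.0, -2.0, 0.5]
    mask = [1, 1, 0]
    for mode in ("sign", "batch", "esti"):
        print(mode, zo_coordinate_step(quadratic, start, mask, grad_estimator=mode))
```

In this sketch the 'sign' rule pays one extra loss evaluation per coordinate to refresh the baseline, while 'batch' and 'esti' probe every coordinate against the same starting point.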
opt_layers_strs: layers to be trained. Currently supported:
- 'nn.Linear': nn.Linear,
- 'nn.Conv2d': nn.Conv2d,
- 'TensorizedLinear': TensorizedLinear,
- 'TensorizedLinear_module': TensorizedLinear_module,
- 'TensorizedLinear_module_tonn': TensorizedLinear_module_tonn
Related Work: