ChenChengKuan / svrg_project

Overview

Group project for the Columbia course COMS4995-004 Optimization for Machine Learning (webpage). The goal of the project is to implement and understand the optimization methods from the paper "Stochastic Variance Reduction for Nonconvex Optimization".

Scripts

svrg.py

Runs SVRG optimization with an MLP on the MNIST dataset. The script performs several epochs of plain SGD as a warmup phase before switching to SVRG updates, and can optionally write the run's metrics to a JSON file and plot the loss curve. A sketch of the update scheme follows the usage example below.

Example Usage:

python svrg.py --num_outer_epochs 10 --num_inner_epochs 2 --device cuda --metrics_path "runs/svrg_output.json" --plot
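
For orientation, here is a minimal, self-contained sketch of the SVRG scheme the script implements (warmup SGD, then outer epochs that each take a full-gradient snapshot and run variance-reduced inner steps). It uses NumPy on a toy least-squares problem rather than the repo's actual MLP/MNIST code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 1000, 5, 0.01
A, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
b = A @ w_true + 0.1 * rng.normal(size=n)

def grad_i(w, i):
    # Per-example gradient of the loss 0.5 * (a_i @ w - b_i)^2.
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
for _ in range(2 * n):                    # warmup: ~2 epochs of plain SGD
    w -= lr * grad_i(w, rng.integers(n))

for _ in range(10):                       # outer epochs
    snapshot = w.copy()
    full_grad = A.T @ (A @ snapshot - b) / n   # full gradient at the snapshot
    for _ in range(n // 10):              # inner loop, m = n / 10
        i = rng.integers(n)
        w -= lr * (grad_i(w, i) - grad_i(snapshot, i) + full_grad)

print(np.linalg.norm(w - w_true))         # should be close to 0
```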

Group Members

  • Ryan Goldenberg
  • Liyi (Leo) Zhang
  • Chengkuan (Kuan) Chen

Experiment Setting

Reproducing the paper's results

MNIST SVRG

  • learning rate: To be tuned
  • num_warmup_epoch: 10
  • num_outer_epoch: 242 (solved from s(n + 2m)/n = 290 with m = n/10; see the worked computation below)
  • inner_epoch_fraction: 0.1 (from m = n/10)
  • batch_size: 10

MNIST SGD

  • learning rate: To be tuned
  • num_epoch: 300
  • batch_size: 10
  • l2-regularization: 1e-3
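
The outer-epoch counts above come from matching SVRG's gradient-evaluation budget, normalized by the dataset size n, to the SGD epoch count on the plot's x-axis: each outer epoch costs n evaluations for the snapshot's full gradient plus 2m for the inner loop. A minimal sketch of the arithmetic:

```python
def outer_epochs_for_budget(budget, inner_fraction=0.1):
    # One outer epoch costs n + 2m gradient evaluations, i.e.
    # 1 + 2 * (m / n) normalized epochs, so solve
    # s * (1 + 2 * inner_fraction) = budget for s.
    return round(budget / (1 + 2 * inner_fraction))

print(outer_epochs_for_budget(290))  # 242 (MNIST, STL)
print(outer_epochs_for_budget(490))  # 408 (CIFAR)
```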

CIFAR SVRG

  • learning rate: To be tuned
  • num_warmup_epoch: 10
  • num_outer_epoch: 408 (solved from s(n + 2m)/n = 490 with m = n/10; we assume the rightmost value of the x-axis is 500)
  • inner_epoch_fraction: 0.1 (from m = n/10)
  • batch_size: 10

CIFAR SGD

  • learning rate: To be tuned
  • num_epoch: 500
  • batch_size: 10
  • l2-regularization: 1e-3

STL SVRG

  • learning rate: To be tuned
  • num_warmup_epoch: 20
  • num_outer_epoch: 242 (solved from s(n + 2m)/n = 290 with m = n/10)
  • inner_epoch_fraction: 0.1 (from m = n/10)
  • batch_size: 10

STL SGD

  • learning rate: To be tuned
  • num_epoch: 300
  • batch_size: 10
  • l2-regularization: 1e-4

Further evaluation work

Deeper Neural Net

  • Compared methods: SGD vs. SVRG
  • dataset: MNIST
  • network: 3-layer MLP with 100 hidden units per layer and ReLU activations (sketched below)
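
A minimal sketch of this network, assuming PyTorch (the repo is Python) and the reading that "3-layer" means three linear layers, i.e. two ReLU hidden layers of 100 units each:

```python
import torch.nn as nn

class MLP3(nn.Module):
    # Three linear layers: 784 -> 100 -> 100 -> 10, with ReLU between them.
    def __init__(self, in_dim=784, hidden=100, out_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x.flatten(1))  # flatten 28x28 MNIST images
```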

Rosenbrock function

  • Goal: test whether SVRG beats SGD when the objective function has many saddle points
  • Compared methods: SGD vs. SVRG
  • dataset: Rosenbrock data (size = 60000)
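
For reference, the standard two-dimensional Rosenbrock function (a = 1, b = 100), whose global minimum at (a, a^2) sits in a long, narrow curved valley:

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # f(x, y) = (a - x)^2 + b * (y - x^2)^2
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

print(rosenbrock(1.0, 1.0))  # 0.0 at the global minimum (a, a**2)
```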

Eggholder function

  • Goal: test whether SVRG beats SGD when the objective function has many local minima
  • Compared methods: SGD vs. SVRG
  • dataset: Eggholder data (size = 60000)
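
For reference, the standard two-dimensional Eggholder function, usually evaluated on [-512, 512]^2; its surface is covered with local minima:

```python
import math

def eggholder(x, y):
    # f(x, y) = -(y + 47) * sin(sqrt(|x/2 + y + 47|)) - x * sin(sqrt(|x - (y + 47)|))
    return (-(y + 47.0) * math.sin(math.sqrt(abs(x / 2.0 + y + 47.0)))
            - x * math.sin(math.sqrt(abs(x - (y + 47.0)))))

print(eggholder(512.0, 404.2319))  # about -959.64, the known global minimum
```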

References

  • Stochastic Variance Reduction for Nonconvex Optimization (arXiv)
  • Accelerating Stochastic Gradient Descent using Predictive Variance Reduction (arXiv)
