Authors: Kyle Gilman, Nathan Louis, Alex Ritchie
Instructions - How to run code:
Datasets:
- MNIST : Download directly through PyTorch
- CIFAR : Download directly through PyTorch (a minimal download snippet for both is shown after this list)
- CT Dataset: https://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis
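For the MNIST and CIFAR entries above, the data can be fetched automatically through torchvision. The lines below are only an illustrative download sketch; the './data' root directory and the CIFAR-10 variant are assumptions, and the test scripts may use their own paths and transforms:

    from torchvision import datasets, transforms

    # Downloads the raw data into ./data on first use
    to_tensor = transforms.ToTensor()
    mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=to_tensor)
    cifar_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=to_tensor)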
Files to run:
ct_test.py
: Use default settings to run a multi-layer feed-forward network on the CT dataset (regression). The optimization is performed using GN, GN Half-sketch, GN Sketch, SGD, and ADAM. Optionally run on the GPU with the 'cuda' parameter.
mnist_cnn_test.py
: Use default settings to run a convolution + 2 fc layer network on MNIST for 10-class digit classification. The optimization is performed using GN, GN Half-sketch, GN Sketch, SGD, and ADAM. Recommended to run on CPU only; a 12 GB GPU runs out of memory.
sketched_GN_networks.py
: Runs the MNIST binary regression experiment using the Python implementation in Autograd (not PyTorch). Use default settings to run a 2 fc layer network on MNIST for 2-digit (0, 1) classification.
Other important files:
GN_solver.py
: Contains the Python implementation of the Gauss-Newton Sketch solver.
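For intuition, the GN Sketch and GN Half-sketch updates used in the experiments can be read as a damped Gauss-Newton step applied to a randomly sketched Jacobian. The toy NumPy function below is only an illustrative sketch under that reading; the function name, the Gaussian sketch choice, and the damping parameter lam are assumptions, not the GN_solver.py API:

    import numpy as np

    def gn_sketch_step(J, r, sketch_dim, half_sketch=False, lam=1e-3):
        # J: (n, p) Jacobian of the residuals at the current parameters
        # r: (n,) residual vector; the objective is 0.5 * ||r(theta)||^2
        n, p = J.shape
        # Gaussian sketching matrix S of size (sketch_dim, n)
        S = np.random.randn(sketch_dim, n) / np.sqrt(sketch_dim)
        SJ = S @ J
        if half_sketch:
            # Half-sketch: sketched curvature J^T S^T S J, but the exact gradient J^T r
            g = J.T @ r
        else:
            # Full sketch: both curvature and gradient come from the sketched system
            g = SJ.T @ (S @ r)
        H = SJ.T @ SJ + lam * np.eye(p)   # damped (sketched) Gauss-Newton matrix
        return -np.linalg.solve(H, g)     # parameter update direction

Taking S to be the identity recovers the plain Gauss-Newton step; GN_solver.py contains the actual solver used by the scripts above.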
Requirements:
- Python3
- PyTorch v1.0.1
- Torchvision v0.2.1
- HIPS Autograd v1.2
Paper Sources:
Sketching
- Iterative Hessian Sketch: Fast and Accurate Solution Approximation for Constrained Least-Squares. M. Pilanci, M. J. Wainwright, JMLR 2016
- Information-Theoretic Methods in Data Science: Information-theoretic bounds on sketching. M. Pilanci, Stanford
Second-Order optimization methods
- Training Feedforward Networks with the Marquardt Algorithm. M. T. Hagan, M. B. Menhaj, IEEE Transactions on Neural Networks 1994
- Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent. N. N. Schraudolph, Neural Computation 2002
- Practical Gauss-Newton Optimisation for Deep Learning. A. Botev, H. Ritter, D. Barber, ICML 2017
- Deep learning via Hessian-free optimization. J. Martens, ICML 2010
- First- and second-order methods for learning: between steepest descent and Newton's method. R. Battiti, Neural Computation 1992