
shared-memory-sgd

C++ framework for implementing shared-memory parallel SGD for Deep Neural Network training

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Framework for implementing parallel shared-memory Artificial Neural Network (ANN) training in C++ with SGD, supporting various synchronization mechanisms and degrees of consistency. The code builds upon the MiniDNN implementation and relies on Eigen and OpenMP. In particular, the project includes the implementation of LEASHED, which guarantees both consistency and lock-freedom.

For technical details of LEASHED please see the original paper:

Bäckström, K., Walulya, I., Papatriantafilou, M., & Tsigas, P. (2021, February). Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. In Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS), to appear.

The following shared-memory parallel SGD methods are implemented:

  • Lock-based consistent asynchronous SGD
  • LEASHED - Lock-free implementation of consistent asynchronous SGD
  • Hogwild! - Lock-free asynchronous SGD without consistency
  • Synchronous parallel SGD
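
To make the consistency trade-off concrete, here is a minimal, hedged C++/OpenMP sketch (not the framework's actual API; parameter vector, gradients, and thread count are illustrative) contrasting a lock-based consistent SGD step with a Hogwild!-style lock-free step on a shared parameter vector:

// Hedged sketch, not this framework's API: contrasts a lock-based
// consistent SGD step with a Hogwild!-style lock-free step.
// Gradients are dummies; a real trainer would compute them from a
// mini-batch via backpropagation.
#include <mutex>
#include <vector>

static std::vector<double> params(1000, 0.0); // shared model parameters
static std::mutex model_lock;                 // serializes consistent updates

// Lock-based consistent step: the whole update is atomic with respect
// to other workers, at the cost of serialization at the lock.
void locked_step(const std::vector<double>& grad, double lr) {
    std::lock_guard<std::mutex> guard(model_lock);
    for (std::size_t i = 0; i < params.size(); ++i)
        params[i] -= lr * grad[i];
}

// Hogwild!-style step: each component is updated without any lock, so
// concurrent workers interleave freely; per-component writes are atomic
// but the model as a whole is read and written without consistency.
void hogwild_step(const std::vector<double>& grad, double lr) {
    for (std::size_t i = 0; i < params.size(); ++i) {
        #pragma omp atomic
        params[i] -= lr * grad[i];
    }
}

int main() {
    const double lr = 0.005;
    #pragma omp parallel num_threads(8)
    {
        std::vector<double> grad(params.size(), 1.0); // dummy gradient
        hogwild_step(grad, lr); // swap in locked_step for consistent SGD
    }
    return 0;
}

Neither simple scheme above gives both properties at once: the lock gives consistency but not lock-freedom, and Hogwild! gives lock-freedom but not consistency. LEASHED is designed to provide both; see the paper cited above for how.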

Getting Started

To get a local copy up and running, follow these steps.

Installation

  1. Clone the repo
    git clone https://github.com/dcs-chalmers/shared-memory-sgd.git
  2. Build project
    bash build.sh
  3. Compile
    bash compile.sh

Usage

Input

Arguments and options - reference list:

Flag  Meaning                    Values
-a    algorithm                  ASYNC, HOG, LSH, or SYNC
-n    number of threads          integer
-A    architecture               MLP or CNN
-L    number of hidden layers    integer (MLP only)
-U    hidden neurons per layer   integer (MLP only)
-B    persistence bound          integer (LEASHED only)
-e    number of epochs           integer
-r    rounds per epoch           integer
-b    mini-batch size            integer
-l    step size                  float

To see all options:

./cmake-build-debug/mininn --help

Output

Output is a JSON object containing the following data:

Field           Meaning
epoch_loss      loss value recorded for each epoch
epoch_time      wall-clock time measured upon completing the corresponding epoch
staleness_dist  distribution of gradient staleness
numtriesdist    distribution of the number of CAS attempts (LSH only)
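
For instance, a short run might emit an object shaped roughly as follows (field names are from the table above; all values, and the exact shape of the distribution fields, are illustrative assumptions, not output from a real run):

{
  "epoch_loss": [2.31, 1.87, 1.52],
  "epoch_time": [1.4, 2.8, 4.2],
  "staleness_dist": [120, 64, 31, 9],
  "numtriesdist": [180, 38, 6]
}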

Examples

Multi-layer perceptron (MLP) training for 5 epochs, with mini-batch size 512 and step size 0.005, using 8 threads and LEASHED-SGD:

./cmake-build-debug/mininn -a LSH -n 8 -A MLP -L 3 -U 128 -e 5 -r 469 -b 512 -l 0.005

Multi-layer perceptron (MLP) training with 8 threads using Hogwild!:

./cmake-build-debug/mininn -a HOG -n 8 -A MLP -L 3 -U 128 -e 5 -r 469 -b 512 -l 0.005

Convolutional neural network (CNN) training with 8 threads using LEASHED-SGD:

./cmake-build-debug/mininn -a LSH -n 8 -A CNN -e 5 -r 469 -b 512 -l 0.005
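
The persistence bound of LEASHED can also be set explicitly with the -B flag from the reference list above; for example (the bound value 16 here is purely illustrative, not a tuned recommendation):

./cmake-build-debug/mininn -a LSH -n 8 -A MLP -L 3 -U 128 -B 16 -e 5 -r 469 -b 512 -l 0.005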

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the AGPL-3.0 License. See LICENSE for more information.

Contact

Karl Bäckström - bakarl@chalmers.se

Project Link: https://github.com/dcs-chalmers/shared-memory-sgd

Acknowledgements

A big thanks to the Wallenberg AI, Autonomous Systems and Software Program (WASP) for funding this work.
