DDP (Distributed data parallel)

Distributed data parallel for neural network training

This is a basic repository demonstrating how to take advantage of multiple machines for distributed data parallel (DDP) training of a neural network.

The implementation is based on PyTorch, specifically the torch.nn.parallel.DistributedDataParallel module:

"This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged."
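To make the description above concrete, here is a minimal sketch of how the module is typically used. It runs as a single CPU process with the gloo backend purely for illustration; in real multi-machine training a launcher such as torchrun starts one process per device, and the model, data, and hyperparameters below are placeholders, not the ones used in this repository:

```python
import os
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Single-process setup for illustration only; a real launch sets the
    # rendezvous address and rank per process (e.g. via torchrun).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Linear(10, 1)   # toy model, stands in for the real network
    ddp_model = DDP(model)           # module is replicated on each process

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(8, 10)      # each rank processes its own batch shard
    targets = torch.randn(8, 1)

    loss = F.mse_loss(ddp_model(inputs), targets)
    loss.backward()                  # gradients are averaged across ranks here
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()

loss_value = train_step()
print(f"loss after one step: {loss_value:.4f}")
```

With more than one process, each rank would construct the same model, and the backward pass would all-reduce (average) gradients so every replica takes the same optimizer step.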

Frontend

First of all, you need to install Node.js, which can be found at https://nodejs.org/en/

Second, in the terminal, change directory to the frontend folder and run:

npm install

to install all dependencies, and then run

npm start
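Assuming the folder is named frontend as described above, the whole sequence looks like this (setup commands, shown here only for convenience):

```shell
# Install Node.js first (https://nodejs.org/en/), then:
cd frontend
npm install   # install all dependencies from package.json
npm start     # launch the frontend development server
```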

Backend

For the backend, you need to install MongoDB, which can be found at https://www.mongodb.com/try/download/community

Then, in the terminal, change directory to the backend folder and run:

npm install

to install all dependencies, and then run

npm start

to start the server. You can also inspect the database via MongoDB Compass, the GUI included in the downloaded MongoDB package.
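As with the frontend, the backend setup condenses to a few commands (assuming the folder is named backend and MongoDB is already running locally):

```shell
# Requires a local MongoDB instance (default port 27017 is assumed here).
cd backend
npm install   # install all dependencies
npm start     # start the backend server
```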

Model

Please read the README in the model directory for more information.

Project status and Roadmap

TODO list:

  • Create a GUI for choosing which model will be used
  • Make it work over a network
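For the multi-machine item above, one common approach (a sketch only, not this repository's launch script) is PyTorch's torchrun launcher; the IP address, port, and script name below are assumptions for illustration:

```shell
# Run once on each of two machines, changing --node_rank to 0 and 1.
torchrun --nnodes=2 --nproc_per_node=1 \
         --node_rank=0 \
         --master_addr=192.168.1.10 --master_port=29500 \
         train.py   # hypothetical training entry point
```

torchrun sets the rank and rendezvous environment variables for each spawned process, so the training script only needs to call init_process_group.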

Author

Current author:

References
