hinofafa/torch_accelerator

Experiments to accelerate PyTorch training on GPU devices


Todo

  1. Automatic mixed precision (AMP) with FP16
  2. Enable Tensor Cores
  3. Visualize parameter distributions in TensorBoard

(All three items are sketched in the code example below.)
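A minimal sketch covering the three Todo items, assuming a synthetic model and random batches rather than the repo's actual notebook code (the model, sizes, and log directory are placeholders): `torch.cuda.amp` provides FP16 autocasting and loss scaling for item 1, FP16 GEMMs (plus the TF32 flags) dispatch to Tensor Cores for item 2, and `SummaryWriter.add_histogram` logs parameter distributions for item 3.

```python
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

device = torch.device("cuda")

# Item 2: on Ampere and newer GPUs, TF32 lets FP32 matmuls and
# convolutions run on Tensor Cores as well.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Placeholder model/optimizer; the repo's notebooks would define their own.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()            # item 1: loss scaling for FP16
writer = SummaryWriter("runs/amp_experiment")   # item 3: TensorBoard logging

for step in range(100):
    # Synthetic batch; dimensions that are multiples of 8 keep FP16 GEMMs
    # on Tensor Cores (see the NVIDIA performance guides in the references).
    x = torch.randn(64, 512, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():             # item 1: mixed FP16/FP32 forward
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()               # scale loss to avoid FP16 underflow
    scaler.step(optimizer)                      # unscale grads, then step
    scaler.update()

    if step % 10 == 0:
        # Item 3: log each parameter's distribution as a histogram.
        for name, param in model.named_parameters():
            writer.add_histogram(name, param, step)

writer.close()
```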

References

  1. NVIDIA Tensor Cores: https://www.nvidia.com/en-us/data-center/tensor-cores/

  2. NVIDIA Turing Architecture: https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/

  3. Tips for Optimizing GPU Performance Using Tensor Cores (blog): https://developer.nvidia.com/blog/optimizing-gpu-performance-tensor-cores/

  4. Memory-Limited Layers: https://docs.nvidia.com/deeplearning/performance/dl-performance-memory-limited/index.html

  5. Automatic Mixed Precision for Deep Learning: https://developer.nvidia.com/automatic-mixed-precision

  6. Mixed Precision Training: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html

  7. Introducing Native PyTorch AMP for Faster Training on NVIDIA GPUs: https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/

  8. Training Neural Networks with Tensor Cores (slides): https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf

  9. NVIDIA Deep Learning: https://developer.nvidia.com/deep-learning

  10. NVIDIA Deep Learning Examples: https://developer.nvidia.com/deep-learning-examples

  11. Using Nsight Compute or nvprof to Show Mixed Precision Use in Deep Learning Models: https://developer.nvidia.com/blog/using-nsight-compute-nvprof-mixed-precision-deep-learning-models/ (a torch.profiler analogue is sketched after this list)

  12. CUDA Pro Tip: nvprof Is Your Handy Universal GPU Profiler: https://developer.nvidia.com/blog/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/

  13. PyTorch Recipe: Automatic Mixed Precision: https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html
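References 11 and 12 use Nsight Compute and nvprof to check whether Tensor Core kernels are actually being launched. As a Python-side complement (not something those references themselves use), here is a minimal sketch with PyTorch's built-in `torch.profiler` that surfaces the same kernel names; the model and tensor sizes are placeholders.

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = torch.device("cuda")

# Placeholder FP16 layer and input; dimensions that are multiples of 8
# make the GEMM eligible for Tensor Cores.
model = torch.nn.Linear(1024, 1024).to(device).half()
x = torch.randn(1024, 1024, device=device, dtype=torch.half)

with torch.no_grad(), profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(x)

# Per reference 11, GEMM kernels with "884" (Volta) or "1688" (Turing)
# in their names are running on Tensor Cores.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```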


License: MIT


Languages

Jupyter Notebook 100.0%