zhiqi-0 / RDMA-MXNet-ps-lite

RDMA Optimization on MXNet

RDMA optimization on MXNet/ps-lite

  • This optimization is based on MXNet 0.10.
  • It was carried out on a 10-node Sugon cluster running CentOS Linux.
  • It requires an InfiniBand card and the mlx4 library (-lmlx4).

Attention

  • The source code in ps-lite-rdma-final/ differs from the source code in report/final-submit/, but both run smoothly.
  • The differences between the two code bases are:
    • ps-lite-rdma-final was written entirely by Lin Zhiqi (the owner of this repository); final-submit was written by Song Xiaoniu.
    • The major difference is the underlying model of RDMA queue pairs (QPs) and completion queues (CQs):
      • ps-lite-rdma-final uses one send CQ shared by all QPs (a shared CQ, not an SRQ). Each QP has its own recv CQ.
      • final-submit gives each QP its own send CQ and its own recv CQ.
      • Features that ps-lite-rdma-final has and final-submit lacks:
        • Parallel memcpy (achieved by releasing the locks of the RDMA send operation early).
        • Multiple posted recv requests (several recv requests are posted repeatedly at the end of connection setup, which gives higher performance when n workers talk to 1 server).
  • The two versions perform similarly. Because final-submit has more sample test results, it is the version used for the final report.
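
The two QP/CQ layouts described above can be compared with a small hardware-free model. This is only a sketch: `CompletionQueue`, `QueuePair`, `build_shared_send_cq`, `build_per_qp_cqs` and `count_distinct_cqs` are hypothetical names for illustration. They are not the libibverbs API and are not taken from either code base.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-ins for verbs objects; NOT the libibverbs types.
struct CompletionQueue {};

struct QueuePair {
    std::shared_ptr<CompletionQueue> send_cq;
    std::shared_ptr<CompletionQueue> recv_cq;
};

// Layout described for ps-lite-rdma-final: one send CQ shared by every QP,
// plus a private recv CQ per QP.
std::vector<QueuePair> build_shared_send_cq(std::size_t num_qps) {
    auto shared_send = std::make_shared<CompletionQueue>();
    std::vector<QueuePair> qps;
    qps.reserve(num_qps);
    for (std::size_t i = 0; i < num_qps; ++i) {
        qps.push_back({shared_send, std::make_shared<CompletionQueue>()});
    }
    return qps;
}

// Layout described for final-submit: every QP owns both of its CQs.
std::vector<QueuePair> build_per_qp_cqs(std::size_t num_qps) {
    std::vector<QueuePair> qps;
    qps.reserve(num_qps);
    for (std::size_t i = 0; i < num_qps; ++i) {
        qps.push_back({std::make_shared<CompletionQueue>(),
                       std::make_shared<CompletionQueue>()});
    }
    return qps;
}

// Count how many distinct CQ objects a set of QPs refers to.
std::size_t count_distinct_cqs(const std::vector<QueuePair>& qps) {
    std::set<const CompletionQueue*> seen;
    for (const auto& qp : qps) {
        assert(qp.send_cq && qp.recv_cq);
        seen.insert(qp.send_cq.get());
        seen.insert(qp.recv_cq.get());
    }
    return seen.size();
}
```

In this model, n QPs need n + 1 CQs with the shared send CQ layout and 2n CQs with the per-QP layout, so a single poller can drain all send completions in the first case. The model says nothing about measured performance.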
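
The README does not say how the send-path locks were restructured to allow parallel memcpy. One common pattern, shown here purely as an assumption, is to hold the lock only while reserving a slot in the send buffer and to copy the payload after the lock is released. `SendBuffer` and `stage` are hypothetical names, and a plain `std::vector<char>` stands in for a registered RDMA memory region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical staging buffer standing in for a registered RDMA send region.
class SendBuffer {
public:
    explicit SendBuffer(std::size_t capacity) : data_(capacity), next_(0) {}

    // Copy len bytes from src into the buffer.
    // Returns the offset used, or capacity() if there is not enough room.
    std::size_t stage(const void* src, std::size_t len) {
        std::size_t offset;
        {
            // The lock covers only the bookkeeping that reserves a range.
            std::lock_guard<std::mutex> guard(mu_);
            if (len > data_.size() - next_) {
                return data_.size();
            }
            offset = next_;
            next_ += len;
        }  // Lock released here, before the copy.

        // Reserved ranges are disjoint, so concurrent callers can run
        // their memcpy calls in parallel without racing on the bytes.
        assert(offset + len <= data_.size());
        std::memcpy(data_.data() + offset, src, len);
        return offset;
    }

    const char* data() const { return data_.data(); }
    std::size_t capacity() const { return data_.size(); }

private:
    std::vector<char> data_;
    std::size_t next_;
    std::mutex mu_;
};
```

A real send path would go on to post a send work request for the staged range. That step is omitted here because it depends on the verbs setup.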

USTC 2017 RDMA Team
