tulvgengenr / Warmup


2023 Fall Distributed Machine Learning Lab 1: Warmup

Todo List

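  • Implement the forward pass of a custom LayerNorm operator in CUDA
  • Implement the backward pass of the custom LayerNorm operator in CUDA
  • Compare correctness and performance against torch.nn.LayerNorm
  • Analyze operator performance with the PyTorch profiler
  • Complete the lab report
  • [ ] Optimize the operator with reference to Triton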

Custom LayerNorm Operator
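
For orientation, here is a minimal pure-Python sketch of the math that a LayerNorm forward and backward pass computes for a single row. It is not the repository's CUDA kernel. The function names are hypothetical, and the biased variance and eps=1e-5 are assumptions taken from the defaults of torch.nn.LayerNorm.

```python
import math


def layer_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize one row x (a list of floats) over its last dimension.

    Returns the output y plus x_hat and rstd, which the backward pass reuses.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased variance (divide by n), matching torch.nn.LayerNorm.
    var = sum((v - mean) ** 2 for v in x) / n
    rstd = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * rstd for v in x]
    y = [g * h + b for g, h, b in zip(gamma, x_hat, beta)]
    return y, x_hat, rstd


def layer_norm_backward(dy, x_hat, rstd, gamma):
    """Gradients for one row, given the upstream gradient dy.

    For a batch, dgamma and dbeta would be summed over all rows.
    """
    n = len(dy)
    dgamma = [d * h for d, h in zip(dy, x_hat)]
    dbeta = list(dy)
    dx_hat = [d * g for d, g in zip(dy, gamma)]
    mean_g = sum(dx_hat) / n
    mean_gh = sum(d * h for d, h in zip(dx_hat, x_hat)) / n
    # dx = rstd * (dx_hat - mean(dx_hat) - x_hat * mean(dx_hat * x_hat))
    dx = [rstd * (d - mean_g - h * mean_gh) for d, h in zip(dx_hat, x_hat)]
    return dx, dgamma, dbeta
```

Both per-row means in the backward pass are reductions over the normalized dimension. A CUDA implementation typically maps these to block-level or warp-level reductions.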

Build and Install

Software environment: PyTorch 2.1.0 + CUDA 12.1

cd layerNorm_cuda_extension
python setup.py install --user

If the build fails with the following error:

/usr/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type
>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/usr/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/usr/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
/usr/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/usr/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
   45 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();

Workaround: make the following change in /usr/include/pybind11/cast.h:

-    return caster.operator typename make_caster<T>::template cast_op_type<T>();
+    return caster;

Correctness and Performance Tests

cd ..
python custom_layerNorm.py

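If no assertion fails, the custom operator's results match the reference within the error tolerance. The script then prints a comparison of the forward and backward run times.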
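
This README does not show the tolerance or the timing method that custom_layerNorm.py uses. As a rough sketch of what such a check usually involves, the helpers below apply the same element-wise criterion as torch.allclose (|a - e| <= atol + rtol * |e|) and average the wall-clock time over several calls. The names all_close and time_fn, the tolerances, and the iteration counts are all assumptions.

```python
import time


def all_close(actual, expected, rtol=1e-5, atol=1e-8):
    """True if |a - e| <= atol + rtol * |e| holds for every element pair."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


def time_fn(fn, *args, warmup=3, iters=20, sync=None):
    """Average wall-clock seconds per call of fn(*args).

    sync is an optional callable run before each clock read. For GPU code
    one would pass torch.cuda.synchronize, because kernel launches are
    asynchronous and would otherwise be timed before they finish.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn(*args)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iters
```

The warm-up iterations keep one-time costs such as lazy initialization out of the measured average.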

Running

python custom_layerNorm.py --profile

prints the PyTorch profiler results and saves the complete results as JSON in the profile folder. The saved trace can be visualized in the Chrome browser at chrome://tracing/.
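
Besides the Chrome viewer, the saved trace can be summarized with a few lines of Python. The sketch below assumes the file follows the Chrome Trace Event Format: either a JSON object with a "traceEvents" list or a bare list of events, with complete events marked "ph": "X" and carrying a "dur" field in microseconds. The function names and the file name in the usage comment are hypothetical.

```python
import json
from collections import defaultdict


def summarize_trace(trace, top=10):
    """Return [(name, total_dur_us, count)] sorted by total duration, descending."""
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(lambda: [0.0, 0])
    for ev in events:
        if ev.get("ph") != "X":  # only complete events carry a duration
            continue
        entry = totals[ev.get("name", "<unnamed>")]
        entry[0] += ev.get("dur", 0)
        entry[1] += 1
    rows = [(name, dur, cnt) for name, (dur, cnt) in totals.items()]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]


def summarize_trace_file(path, top=10):
    """Load a trace JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_trace(json.load(f), top)


# Usage (the file name is hypothetical; use whatever the script wrote to profile/):
# for name, dur_us, count in summarize_trace_file("profile/trace.json"):
#     print(f"{name}: {dur_us:.0f} us over {count} calls")
```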

Languages

Python 66.5%, Cuda 23.1%, C++ 10.4%