ziyu huang's starred repositories
Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on NVIDIA and AMD GPUs
RoIAlign.pytorch
RoIAlign & crop_and_resize for PyTorch
BERT-pytorch
Google AI 2018 BERT pytorch implementation
YHs_Sample
Yinghan's Code Sample
IT5007_Project_Spark-Tok
This is the repository containing the source code of our IT5007 Project - Spark Tok
acrotensor
A C++ library for computing large scale tensor contractions.
how-to-optimize-gemm
row-major matmul optimization
code-samples
Source code examples from the Parallel Forall Blog
FasterTransformer
Transformer related optimization, including BERT, GPT
MSplitGEMM
Large matrix multiplication in CUDA
SGEMM-Implementation-and-Optimization
Some source code about matrix multiplication implementation on CUDA
matrix-cuda
matrix multiplication in CUDA
optimizing-matrix-multiplication-examples
Examples of optimizing matrix multiplication.
NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
CUDA-Programming-with-Python
Python versions of the sample code from the book CUDA Programming, using the pycuda module
CUDA-Programming
Sample codes for my CUDA programming book
extension-cpp
C++ extensions in PyTorch
pytorch-extension
An example of a CUDA extension for PyTorch using CuPy that computes the Hadamard product of two tensors
cuda_accelerate
Example of accelerating a neural network with C++ and CUDA (implements matrix addition and matrix multiplication)
tensorly-notebooks
Tensor methods in Python with TensorLy