liujuncheng

Juncheng's repositories

nccl

Optimized primitives for collective multi-GPU communication

Language: Cuda | License: NOASSERTION | 1 20

test

AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Language: Python | License: Apache-2.0 | 010

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | 010

EnergonAI

Large-scale model inference.

Language: Python | License: Apache-2.0 | 010

FastFold

Optimizing Protein Structure Prediction Model Training and Inference on GPU Clusters

Language: Cuda | License: Apache-2.0 | 010

onnx

Open standard for machine learning interoperability

Language: C++ | License: Apache-2.0 | 010

openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

Language: Python | License: Apache-2.0 | 010

Uni-Core

An efficient distributed PyTorch framework

Language: Python | License: MIT | 010