alexngng

Alibaba Cloud

Beijing

Alex Ng's repositories

CUDA-Learn-Note

🎉CUDA notes / collection of frequently asked interview questions / C++ notes; personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | 6 0 0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Language: Python | License: Apache-2.0 | 0 0 0

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

Language: Python | License: NOASSERTION | 0 0 0

FasterTransformer

Transformer-related optimization, including BERT and GPT

Language: C++ | License: Apache-2.0 | 0 0 0

llama

Inference code for LLaMA models

Language: Python | License: GPL-3.0 | 0 0 0

tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference

Language: Python | License: MIT | 0 0 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 0 0 0