whutbd's repositories

cuda-learn-note

🎉 CUDA notes / high-frequency interview questions / C++ notes; personal notes, updated at whim: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 2 | Issues: 0

Cpp-Templates-2ed

C++11/14/17/20 templates and generic programming: among the most complex and difficult technical details of C++, and indispensable for building infrastructure libraries.

Language: C++ | License: Apache-2.0 | Stargazers: 1 | Issues: 0

byteps

A high-performance, generic framework for distributed DNN training

License: NOASSERTION | Stargazers: 0 | Issues: 0

ByteTransformer

Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

License: Apache-2.0 | Stargazers: 0 | Issues: 0

cmake-demo

Source code accompanying 《CMake入门实战》 (a hands-on introductory CMake book)

Stargazers: 0 | Issues: 0

CMakeTutorial

A hands-on CMake tutorial in Chinese

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0

core

The core library and APIs implementing the Triton Inference Server.

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

CTranslate2

Fast inference engine for Transformer models

License: MIT | Stargazers: 0 | Issues: 0

FasterTransformer

Transformer related optimization, including BERT, GPT

License: Apache-2.0 | Stargazers: 0 | Issues: 0

fastllm

A pure C++ LLM acceleration library for all platforms, callable from Python; ChatGLM-6B-class models can exceed 10000 tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices

License: Apache-2.0 | Stargazers: 0 | Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

License: Apache-2.0 | Stargazers: 0 | Issues: 0

fun-rec

An introductory tutorial on recommender systems; read online at https://datawhalechina.github.io/fun-rec/

License: NOASSERTION | Stargazers: 0 | Issues: 0

graph-learn

An Industrial Graph Neural Network Framework

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

How_to_optimize_in_GPU

A series of GPU optimization topics introducing in detail how to optimize CUDA kernels. Covers several basic kernel optimizations (elementwise, reduce, sgemv, sgemm, etc.) whose performance is at or near the theoretical limit.

License: Apache-2.0 | Stargazers: 0 | Issues: 0
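The reduce optimization covered in the repository above is typically built on warp-level shuffle intrinsics. As a rough illustration only (the kernel name and layout here are hypothetical, not taken from the repository; assumes a full 32-thread warp and compute capability 3.0+):

```cuda
#include <cstdio>

// Sum 32 values within a warp using shuffle intrinsics: each step
// halves the number of lanes still carrying distinct partial sums.
__inline__ __device__ float warp_reduce_sum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 ends up holding the warp's total
}

// Hypothetical demo kernel: one warp reduces 32 input floats.
__global__ void reduce_demo(const float* in, float* out) {
    float v = in[threadIdx.x];
    v = warp_reduce_sum(v);
    if (threadIdx.x == 0) *out = v;
}
```

Shuffle-based reductions avoid shared-memory round trips for the intra-warp phase; block-level reduces usually combine this with one shared-memory step across warps.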

llm.c

LLM training in simple, raw C/CUDA

License: MIT | Stargazers: 0 | Issues: 0

onnx-modifier

A tool to modify ONNX models visually, based on Netron and Flask.

License: MIT | Stargazers: 0 | Issues: 0

onnxruntime

ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator

License: MIT | Stargazers: 0 | Issues: 0

PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices

License: Apache-2.0 | Stargazers: 0 | Issues: 0

pytorch-diffusion

A PyTorch reimplementation of Stable Diffusion

Stargazers: 0 | Issues: 0

pytorch-transformer

A PyTorch reimplementation of the Transformer

Stargazers: 0 | Issues: 0

PytorchOCR

A PyTorch-based OCR toolkit supporting common text detection and recognition algorithms

Stargazers: 0 | Issues: 0

rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

License: NOASSERTION | Stargazers: 0 | Issues: 0

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock-free techniques

License: Unlicense | Stargazers: 0 | Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License: Apache-2.0 | Stargazers: 0 | Issues: 0