whutbd's repositories
Cpp-Templates-2ed
C++11/14/17/20 templates and generic programming, the most complex and difficult technical details of C++, indispensable in building infrastructure libraries.
apollo
An open autonomous driving platform
byteps
A high performance and generic framework for distributed DNN training
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Cpp-Concurrency-in-Action-2ed
C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.
CppTemplateTutorial
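A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in its own right, to help readers gain a thorough grasp of metaprogramming. (Work in progress)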
CUDA-Programming
Sample code for my CUDA programming book
CUDALibrarySamples
CUDA Library Samples
DeepCTR-Torch
[PyTorch] Easy-to-use, modular, and extensible package of deep-learning-based CTR models.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
fastText
Library for fast text representation and classification.
flashinfer
FlashInfer: Kernel Library for LLM Serving
fun-rec
An introductory tutorial on recommender systems. Read online: https://datawhalechina.github.io/fun-rec/
graph-learn
An Industrial Graph Neural Network Framework
nann
A flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.
oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
PaddleOCR
Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)
PaddleRec
A large-scale recommendation algorithm library covering classic and recent recommender system algorithms, including LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, Bert4Rec, DeepWalk, SSR, AITM, DSIN, SIGN, IPREC, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, ESCMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, DMR, GateNet, NAML, DIFM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, Fibinet, ListWise, DeepRec, ENSFM, TiSAS, AutoFIS, and others.
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
serving
A flexible, high-performance serving system for machine learning models
Serving-1
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
simdtutor
A tutorial series on x86-64 SIMD vector optimization
SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock-free techniques
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs