Beast code in Giters

whutbd's repositories

Cpp-Templates-2ed

C++11/14/17/20 templates and generic programming, the most complex and difficult technical details of C++, indispensable in building infrastructure libraries.

Language: C++ | Apache-2.0 | 1 0 0

apollo

An open autonomous driving platform

Language: C++ | Apache-2.0 | 0 0 0

byteps

A high performance and generic framework for distributed DNN training

Language: Python | NOASSERTION | 0 0 0

ByteTransformer

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

Language: C++ | Apache-2.0 | 0 0 0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.

Language: C++ | Apache-2.0 | 0 0 0

CppTemplateTutorial

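A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to help readers gain a thorough command of metaprogramming. (Work in progress)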

0 0 0

CUDA-Programming

Sample codes for my CUDA programming book

GPL-3.0 | 0 0 0

DeepCTR-Torch

[PyTorch] Easy-to-use, modular, and extensible package of deep-learning-based CTR models.

Language: Python | Apache-2.0 | 0 0 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Apache-2.0 | 0 0 0

fastText

Library for fast text representation and classification.

Language: HTML | MIT | 0 0 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda | Apache-2.0 | 0 0 0

fun-rec

An introductory tutorial on recommender systems. Read it online at https://datawhalechina.github.io/fun-rec/

NOASSERTION | 0 0 0

graph-learn

An Industrial Graph Neural Network Framework

Language: C++ | Apache-2.0 | 0 0 0

nann

A flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.

Apache-2.0 | 0 0 0

oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

Apache-2.0 | 0 0 0

onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

MIT | 0 0 0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)

Language: Python | Apache-2.0 | 0 0 0

PaddleRec

Recommendation Algorithm: a large-scale recommendation algorithm library covering classic and recent recommender-system algorithms, including LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, Bert4Rec, DeepWalk, SSR, AITM, DSIN, SIGN, IPREC, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, ESCMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, DMR, GateNet, NAML, DIFM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, Fibinet, ListWise, DeepRec, ENSFM, TiSAS, AutoFIS, and others.

Apache-2.0 | 0 0 0

ppl.kernel.cpu

Language: C++ | 0 0 0

ppl.llm.kernel.cuda

Language: C++ | Apache-2.0 | 0 0 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: C | NOASSERTION | 0 0 0

serving

A flexible, high-performance serving system for machine learning models

Apache-2.0 | 0 0 0

Serving-1

A flexible, high-performance carrier for machine learning models (the PaddlePaddle serving deployment framework)

Apache-2.0 | 0 0 0

simdtutor

A tutorial series on x86-64 SIMD vectorization

Language: Python | 0 0 0

SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques

Unlicense | 0 0 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | Apache-2.0 | 0 0 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | Apache-2.0 | 0 0 0

CUDALibrarySamples

occl

OneshotAllreduceExample