whutbd's repositories

Cpp-Templates-2ed

C++11/14/17/20 templates and generic programming: among the most complex and difficult technical areas of C++, and indispensable for building infrastructure libraries.

Language: C++ · License: Apache-2.0 · Stargazers: 1 · Issues: 0

apollo

An open autonomous driving platform

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

byteps

A high-performance, generic framework for distributed DNN training

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

ByteTransformer

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

Cpp-Concurrency-in-Action-2ed

C++11/14/17/20 multithreading, covering operating-system principles and concurrent-programming techniques.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

CppTemplateTutorial

A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this tutorial series teaches C++ templates as a Turing-complete language in their own right, aiming to help readers master meta-programming. (Work in progress.)

Stargazers: 0 · Issues: 0

CUDA-Programming

Sample code for my CUDA programming book

License: GPL-3.0 · Stargazers: 0 · Issues: 0

CUDALibrarySamples

CUDA Library Samples

License: NOASSERTION · Stargazers: 0 · Issues: 0

DeepCTR-Torch

[PyTorch] An easy-to-use, modular, and extensible package of deep-learning-based CTR models.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

fastText

Library for fast text representation and classification.

Language: HTML · License: MIT · Stargazers: 0 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

fun-rec

An introductory tutorial on recommender systems; read it online at https://datawhalechina.github.io/fun-rec/

License: NOASSERTION · Stargazers: 0 · Issues: 0

graph-learn

An Industrial Graph Neural Network Framework

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

nann

A flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

License: MIT · Stargazers: 0 · Issues: 0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

PaddleRec

A large-scale recommendation algorithm library containing classic and state-of-the-art recommender-system algorithms: LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, Bert4Rec, DeepWalk, SSR, AITM, DSIN, SIGN, IPREC, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, ESCMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, DMR, GateNet, NAML, DIFM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, Fibinet, ListWise, DeepRec, ENSFM, TiSAS, AutoFIS, and more.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: C · License: NOASSERTION · Stargazers: 0 · Issues: 0

serving

A flexible, high-performance serving system for machine learning models

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Serving-1

A flexible, high-performance carrier for machine learning models (the PaddlePaddle model-serving and deployment framework).

License: Apache-2.0 · Stargazers: 0 · Issues: 0

simdtutor

A tutorial series on x86-64 SIMD vector optimization.

Language: Python · Stargazers: 0 · Issues: 0

SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock-free techniques

License: Unlicense · Stargazers: 0 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0