Xuweijia-buaa's repositories
alphaFM
Multi-threaded implementation of Factorization Machines with FTRL for binary classification problems.
cutlass
CUDA Templates for Linear Algebra Subroutines
deep-learning-framework-needle
A Torch-like deep learning framework that can train CNN, LSTM, and other networks.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
dgl
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
flash-attention
Fast and memory-efficient exact attention
How_to_optimize_in_GPU-qiqizi
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
KB2E
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. It is under the umbrella of the DMTK(http://github.com/microsoft/dmtk) project of Microsoft.
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
PaddleRec
Recommendation Algorithm: a large-scale recommendation algorithm library covering classic and recent recommender-system algorithms, including LR, Wide&Deep, DSSM, TDM, MIND, Word2Vec, Bert4Rec, DeepWalk, SSR, AITM, DSIN, SIGN, IPREC, GRU4Rec, Youtube_dnn, NCF, GNN, FM, FFM, DeepFM, DCN, DIN, DIEN, DLRM, MMOE, PLE, ESMM, ESCMM, MAML, xDeepFM, DeepFEFM, NFM, AFM, RALM, DMR, GateNet, NAML, DIFM, Deep Crossing, PNN, BST, AutoInt, FGCNN, FLEN, Fibinet, ListWise, DeepRec, ENSFM, TiSAS, AutoFIS, and others.
parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
pytorch-extension-cpp
C++ extensions in PyTorch
sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
tensorRT-learn
tensorRT-learn start-from-trt-competition
torchrec
PyTorch domain library for recommendation systems
trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
VisCPM
Chinese and English Multimodal Large Model Series (Chat and Paint), based on the CPM foundation models
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow