Zhang Jun's repositories
zhangjun.github.io
https://zhangjun.github.io
stable_diffusion_compile
Compile Stable Diffusion to run faster
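The description doesn't say which technique the repo uses; as one common approach to this problem, here is a hedged sketch of compiling a Diffusers pipeline's UNet with torch.compile (model name is just an example):

```python
# Illustrative only: speeding up Stable Diffusion via torch.compile on the UNet,
# a common technique for this kind of repo (not necessarily what this repo does).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# The first call is slow (graph compilation); later calls run on the compiled graph.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```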
TensorRT-Server
TensorRT Server
Paddle-Lite
Multi-platform, high-performance deep learning inference engine for PaddlePaddle (飞桨)
BaiduPCS-Go
Based on the original iikira/BaiduPCS-Go, with added support for saving shared links and rapid-upload (秒传) links to your own account
community
PaddlePaddle Developer Community
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
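For reference, the advertised single-line usage looks roughly like this (the import path follows the project's README and may differ across versions; the model name is just an example):

```python
# Sketch of kernl's one-line optimization entry point; requires a recent NVIDIA GPU.
import torch
from transformers import AutoModel, AutoTokenizer
from kernl.model_optimization import optimize_model  # path per the README; may vary

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()

optimize_model(model)  # the "single line of code"

inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(**inputs)
```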
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
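A minimal sketch of LMDeploy's pipeline API (the model name is just an example; any supported Hugging Face model ID or local path works):

```python
# Minimal sketch of LMDeploy's high-level pipeline API for local inference.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")  # example model; pick any supported one
responses = pipe(["Hi, please introduce yourself."])
print(responses[0].text)
```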
Megatron-LM
Ongoing research training transformer models at scale
oneflow-diffusers
OneFlow backend for 🤗 Diffusers and ComfyUI
PaddleFleetX
Paddle distributed training examples (ResNet, BERT, GPT, MoE, Wide&Deep), covering DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, ZeRO sharding, Recompute, GradientMerge, Offload, AMP, DGC, and LocalSGD
PaddleNLP
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, and 🖼 Diffusion AIGC systems, etc.
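A minimal sketch of the out-of-the-box Taskflow API (the sentiment-analysis task is just one example of the available tasks; its default model targets Chinese text):

```python
# Minimal sketch of PaddleNLP's Taskflow API for off-the-shelf inference.
from paddlenlp import Taskflow

senta = Taskflow("sentiment_analysis")  # default model is trained on Chinese text
print(senta("这个产品用起来真的很流畅，我非常喜欢"))
```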
stable-diffusion-webui-docker
Stable Diffusion web UI in Docker
stable-fast
An ultra-lightweight inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
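A minimal sketch of compiling a Diffusers pipeline with stable-fast; the sfast import path and config names below are assumptions based on the project's README and may differ across versions:

```python
# Sketch only: compiling a Diffusers pipeline with stable-fast.
# The sfast import path and CompilationConfig usage are assumptions that may
# differ between versions; check the project's README for the exact API.
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.diffusion_pipeline_compiler import compile, CompilationConfig

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

config = CompilationConfig.Default()
pipe = compile(pipe, config)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```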
StableTriton
The first open-source Triton inference engine for Stable Diffusion, specifically SDXL
Taipy-Chatbot-Demo
A template for creating LLM inference web apps using Python only
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
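Recent releases expose a high-level LLM API whose usage mirrors vLLM's; a minimal sketch, assuming that API is available in the installed version (the model name is only an example, and the TensorRT engine is built on first load):

```python
# Sketch of TensorRT-LLM's high-level Python LLM API (recent releases only;
# the exact import surface may differ between versions).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```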
transformer_framework
A framework for plug-and-play training of various transformer models (vision and NLP) with FSDP
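The repository's own wrappers aren't described here; as a generic illustration of the underlying pattern, plain PyTorch FSDP wraps a Hugging Face transformer like this (launched with torchrun):

```python
# Illustrative only: generic PyTorch FSDP wrapping of a Hugging Face transformer,
# the pattern this repo builds on (not the repo's own API).
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_example.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForSequenceClassification

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").cuda()
    model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

    input_ids = torch.randint(0, 30000, (8, 128), device="cuda")
    labels = torch.zeros(8, dtype=torch.long, device="cuda")
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```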
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
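A minimal sketch of the PyTorch API: a drop-in te module run under the fp8_autocast context, which enables FP8 compute on supported GPUs:

```python
# Minimal sketch of Transformer Engine's PyTorch API: a drop-in te.Linear run
# under fp8_autocast, which enables FP8 compute on supported (Hopper+) GPUs.
import torch
import transformer_engine.pytorch as te

layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(16, 768, device="cuda")

with te.fp8_autocast(enabled=True):
    y = layer(x)
print(y.shape)
```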
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
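A minimal sketch of offline batch generation through vLLM's LLM entry point (the model name is just an example):

```python
# Minimal sketch of offline batch inference with vLLM on a CUDA GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```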