zhaoxiandong's starred repositories
sparse_gpu_operator
GPU operators for sparse tensor operations
LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs
EfficientPaperList
Papers about pruning, quantization, and efficient inference/training.
neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
DeepSpeedExamples
Example models using DeepSpeed
LLM-Finetuning
LLM Finetuning with peft
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
SparseFinetuning
Repository for sparse fine-tuning of LLMs via a modified version of MosaicML's llmfoundry
text-generation-inference
Large Language Model Text Generation Inference
FasterTransformer
Transformer-related optimization, including BERT and GPT
ChatGPT-Academic-Prompt
Use ChatGPT for academic writing
chatgpt-prompts-for-academic-writing
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
Llama-Chinese
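Llama Chinese community. The Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3. The goal is to build the best Chinese Llama large model, fully open source and available for commercial use.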
GPU-Puzzles
Solve puzzles. Learn CUDA.
Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
PaperListTemplate
This template makes it easy for you to manage papers.