hustzxd

AMD

Beijing

https://joyeeo.github.io/about

zhaoxiandong's starred repositories

sparse_gpu_operator

GPU operators for sparse tensor operations

Language: Python · 2000

LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs

Language: Python · License: Apache-2.0 · 2281200

EfficientPaperList

Papers on pruning, quantization, and efficient inference/training.

Language: Python · 300

my-tv

My TV (我的电视): TV live-streaming software, ready to use right after installation

Language: C · License: Apache-2.0 · 2671000

FLAP

[AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models

Language: Python · License: Apache-2.0 · 2600

PiPPy

Pipeline Parallelism for PyTorch

Language: Python · License: BSD-3-Clause · 63700

neurips_llm_efficiency_challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Language: Python · 23800

DeepSpeedExamples

Example models using DeepSpeed

Language: Python · License: Apache-2.0 · 574300

LLM-Finetuning

LLM fine-tuning with PEFT

Language: Jupyter Notebook · 166900

litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Language: Python · License: Apache-2.0 · 688300

gpu_poor

Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

Language: JavaScript · 65700

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · 688400

neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Language: Python · License: Apache-2.0 · 201300

composer

Supercharge Your Model Training

Language: Python · License: Apache-2.0 · 502900

SparseFinetuning

Repository for sparse fine-tuning of LLMs via a modified version of MosaicML's llm-foundry

Language: Python · License: Apache-2.0 · 3600

text-generation-inference

Large Language Model Text Generation Inference

Language: Python · License: Apache-2.0 · 809000

FasterTransformer

Transformer-related optimizations, including BERT and GPT

Language: C++ · License: Apache-2.0 · 552900

ChatGPT-Academic-Prompt

Use ChatGPT for academic writing

License: MIT · 39700

chatgpt-prompts-for-academic-writing

This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.

Decentralized_FM_alpha

Language: Python · 1900

DejaVu

Language: Python · 23100

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · 1143700

Llama-Chinese

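Llama Chinese Community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3. The goal is to build the best Chinese Llama large model, fully open source and commercially usable.

Language: Python · 1220900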

GPU-Puzzles

Solve puzzles. Learn CUDA.

Language: Jupyter Notebook · License: MIT · 513200

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language: Python · 85200

pdftitle

A utility to extract the title from a PDF file

Language: Python · License: GPL-3.0 · 12900

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Language: Python · 909200

PaperListTemplate

This template makes it easy for you to manage papers.

Language: Python · 200

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · 192400

wanda

A simple and effective LLM pruning approach.

Language: Python · License: MIT · 53800