Tiantian Han's repositories
ColossalAI
Making large AI models cheaper, faster and more accessible
AISystem
AISystem covers the full AI system stack, including AI chips, AI compilers, and AI inference and training frameworks
bitsandbytes
8-bit CUDA functions for PyTorch
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Efficient-LLMs-Survey
Efficient Large Language Models: A Survey
float8_experimental
This repository contains the experimental PyTorch native float8 training UX
fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
ggml
Tensor library for machine learning
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
llm_interview_note
LLM interview questions and answers; standard LLM interview prep material
lm-evaluation-harness
A framework for few-shot evaluation of language models.
LSQuantization
A PyTorch implementation of Learned Step Size Quantization (LSQ) from ICLR 2020 (unofficial)
Megatron-LM
Ongoing research training transformer models at scale
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
onnx2torch
Convert ONNX models to PyTorch.
peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
qa-lora
Official PyTorch implementation of QA-LoRA
QAQ-KVCacheQuantization
QAQ: Quality Adaptive Quantization for LLM KV Cache
QuaRot
Code for QuaRot, end-to-end 4-bit inference for large language models.
serverchan-demo
Multi-language examples for calling the ServerChan (Server酱) API
tiny-asic-4bit-matrix-mul
Tiny matrix multiplication ASIC with 4-bit math
UltraEval
An open source framework for evaluating foundation models.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
what-is
Important concepts in numerical linear algebra and related areas