Wei Huang's starred repositories

InternLM

Official release of the InternLM2.5 base and chat models, with 1M-token context support.

Language: Python · License: Apache-2.0 · Stargazers: 6217 · Issues: 0

arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 413 · Issues: 0

TransformerCompression

Code for transformer compression methods, accompanying the maintainers' publications.

Language: Python · License: MIT · Stargazers: 355 · Issues: 0

FlagAttention

A collection of memory-efficient attention operators implemented in the Triton language.

Language: Python · License: NOASSERTION · Stargazers: 203 · Issues: 0

storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Language: Python · License: MIT · Stargazers: 10115 · Issues: 0

RAG-Retrieval

Unified efficient fine-tuning of RAG retrieval models, including embedding, ColBERT, and cross-encoder models.

Language: Python · License: MIT · Stargazers: 424 · Issues: 0

FasterTransformer

Transformer-related optimizations, including BERT and GPT.

Language: C++ · License: Apache-2.0 · Stargazers: 5769 · Issues: 0

opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python · License: Apache-2.0 · Stargazers: 3722 · Issues: 0

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python · License: Apache-2.0 · Stargazers: 394 · Issues: 0

calm

CUDA/Metal accelerated language model inference

Language: C · License: MIT · Stargazers: 363 · Issues: 0

dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.

Language: Python · License: Apache-2.0 · Stargazers: 13351 · Issues: 0
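
As a quick illustration of the API, a minimal sketch of building a graph and applying one graph-convolution layer (feature sizes are arbitrary; assumes the PyTorch backend):

    import torch
    import dgl
    from dgl.nn import GraphConv

    # A tiny directed graph with 3 nodes and edges 0->1, 1->2.
    g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
    g.ndata["feat"] = torch.randn(3, 8)   # 8-dim feature per node

    # One graph convolution; node 0 has no in-edges, hence the flag.
    conv = GraphConv(8, 4, allow_zero_in_degree=True)
    h = conv(g, g.ndata["feat"])          # -> tensor of shape (3, 4)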

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python · License: BSD-2-Clause · Stargazers: 2870 · Issues: 0
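
A minimal usage sketch, following the API shown in the project's README (requires a CUDA GPU; the model name is illustrative):

    from transformers import AutoModelForCausalLM
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    # Monkey-patches Llama's RMSNorm, RoPE, SwiGLU, etc. with Triton kernels;
    # must run before the model is instantiated.
    apply_liger_kernel_to_llama()
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")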

EQ-Bench

A benchmark for emotional intelligence in large language models

Language: Python · License: MIT · Stargazers: 175 · Issues: 0

OmniQuant

[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Language: Python · License: MIT · Stargazers: 668 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 8146 · Issues: 0
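
A minimal sketch of the high-level Python API described above, modeled on the project's quick-start; the import path and model name are assumptions that may vary across releases:

    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads an engine
    params = SamplingParams(temperature=0.8, top_p=0.95)

    for output in llm.generate(["What does TensorRT-LLM do?"], params):
        print(output.outputs[0].text)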

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 1125 · Issues: 0

QuaRot

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Language: Python · License: Apache-2.0 · Stargazers: 247 · Issues: 0
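
QuaRot's starting point is computational invariance: multiplying a layer's weights by an orthogonal matrix Q and its incoming activations by Q^T leaves the output unchanged, while the rotation spreads activation outliers across channels and makes 4-bit quantization much easier. A minimal sketch of the invariance itself (not the paper's full Hadamard-based pipeline):

    import torch

    d = 64
    W = torch.randn(128, d)                    # linear-layer weight
    x = torch.randn(d)                         # incoming activation

    Q, _ = torch.linalg.qr(torch.randn(d, d))  # random orthogonal matrix

    y_plain = W @ x
    y_rot = (W @ Q) @ (Q.T @ x)                # rotated weights and activations
    print(torch.allclose(y_plain, y_rot, atol=1e-4))  # True: W Q Q^T x == W x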

any-precision-llm

[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Language: Python · License: MIT · Stargazers: 69 · Issues: 0

T-MAC

Low-bit LLM inference on CPU with lookup tables.

Language: C++ · License: MIT · Stargazers: 415 · Issues: 0
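
The lookup-table idea: rather than multiplying by each low-bit weight, precompute every possible partial sum of a small group of activations once and index the table with the packed weight bits, so one lookup replaces several multiply-adds. A minimal pure-Python sketch for 1-bit ({-1, +1}) weights in groups of 4 (the real kernels build each table once and reuse it across many weight rows):

    import itertools

    def lut_dot(acts, weight_bits, g=4):
        """Dot product of activations with 1-bit weights (0 -> -1, 1 -> +1)."""
        total = 0.0
        for i in range(0, len(acts), g):
            group = acts[i:i + g]
            # All 2**g signed sums of this activation group, keyed by bit pattern.
            table = {bits: sum(a if b else -a for a, b in zip(group, bits))
                     for bits in itertools.product((0, 1), repeat=g)}
            total += table[tuple(weight_bits[i:i + g])]  # one lookup per group
        return total

    print(lut_dot([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))  # 1 - 2 + 3 + 4 = 6.0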

TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

Language: C++ · License: MIT · Stargazers: 694 · Issues: 0

marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Language: Python · License: Apache-2.0 · Stargazers: 555 · Issues: 0

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda · Stargazers: 256 · Issues: 0

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python · License: MIT · Stargazers: 1174 · Issues: 0
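
SmoothQuant's core step migrates quantization difficulty from activations to weights with a per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha): dividing activations by s and multiplying weights by s leaves the matmul unchanged while flattening activation outliers. A minimal sketch of that equivalence (alpha = 0.5, random data):

    import torch

    alpha = 0.5
    X = torch.randn(16, 64) * 10          # activations with large magnitudes
    W = torch.randn(64, 32)               # weights (in_features x out_features)

    # Per-input-channel smoothing factors, as in the paper.
    s = X.abs().amax(dim=0).pow(alpha) / W.abs().amax(dim=1).pow(1 - alpha)

    X_hat = X / s                         # smoother activations, easier to quantize
    W_hat = W * s.unsqueeze(1)            # difficulty migrated into the weights

    print(torch.allclose(X @ W, X_hat @ W_hat, atol=1e-3))  # True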

GPTQ-triton

GPTQ inference Triton kernel

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 272 · Issues: 0

Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Language: Shell · Stargazers: 7371 · Issues: 0

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python · License: MIT · Stargazers: 1621 · Issues: 0
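
A minimal quantization sketch following the API shown in the project's README (the model path and quantization config are illustrative):

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
    quant_config = {"zero_point": True, "q_group_size": 128,
                    "w_bit": 4, "version": "GEMM"}

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model.quantize(tokenizer, quant_config=quant_config)  # runs AWQ calibration
    model.save_quantized("mistral-7b-instruct-awq")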

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 2325 · Issues: 0