Yuhong Li's starred repositories
text-generation-inference
Large Language Model Text Generation Inference
LLMTest_NeedleInAHaystack
Doing simple retrieval from LLMs at various context lengths to measure accuracy
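The needle-in-a-haystack idea is to bury one known fact ("the needle") at a chosen depth inside long filler context and then check whether the model can retrieve it. A minimal prompt-construction sketch, assuming made-up filler text, needle, and question (the repo's actual harness drives a real model and sweeps depths and context lengths):

```python
def build_haystack_prompt(needle, depth_fraction, filler_sentence, n_sentences=100):
    """Insert `needle` at a relative depth inside repeated filler context.

    depth_fraction: 0.0 places the needle at the start, 1.0 at the end.
    """
    filler = [filler_sentence] * n_sentences
    pos = int(depth_fraction * n_sentences)
    haystack = filler[:pos] + [needle] + filler[pos:]
    context = " ".join(haystack)
    return f"{context}\n\nQuestion: what is the secret number?"

# Hypothetical example: needle buried halfway into the context.
prompt = build_haystack_prompt(
    needle="The secret number is 7421.",
    depth_fraction=0.5,
    filler_sentence="The sky was clear over the bay that morning.",
)
```

Scoring then amounts to sending `prompt` to the model under test and checking whether its answer contains the needle's fact.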
Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
fstattention
Memory bandwidth efficient sparse tree attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
mlx-examples
Examples in the MLX framework
PPO-PyTorch
Minimal implementation of clipped-objective Proximal Policy Optimization (PPO) in PyTorch
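The clipped surrogate objective that this repo implements can be sketched in a few lines. This is a per-sample, dependency-free illustration, not the repo's PyTorch code; `eps` and the sample numbers are illustrative assumptions:

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate loss (negated objective, to be minimized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Take the pessimistic (smaller) objective; negate for gradient descent.
    return -min(unclipped, clipped)

# With a positive advantage, a ratio above 1 + eps is clipped at 1 + eps:
loss = ppo_clipped_loss(logp_new=0.5, logp_old=0.0, advantage=1.0, eps=0.2)
```

The clipping removes the incentive to move the new policy far from the old one in a single update, which is the core stabilizing trick of PPO.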
mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
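At its core, the SSM that mamba-minimal implements is a linear recurrence h_t = A·h_{t-1} + B·x_t with readout y_t = C·h_t, scanned over the sequence. A scalar-state Python sketch of that scan, assuming made-up coefficients (Mamba's real parameters are learned, input-dependent, and vector-valued):

```python
def ssm_scan(x, a, b, c, h0=0.0):
    """Sequential scan of a 1-D state-space model.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h, ys = h0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

# An impulse input shows the state decaying geometrically with factor a:
ys = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
# -> [2.0, 1.0, 0.5]
```

Mamba's contribution is making (A, B, C) functions of the input and computing this scan efficiently on GPU; the recurrence itself stays this simple.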
lm-evaluation-harness
A framework for few-shot evaluation of language models.