Mikasa (jsw-zorro)

Company: University of Michigan

Home Page: shuoweijin.com

Twitter: @shuoweijin

Mikasa's starred repositories

CitationMap

A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.

Language: Python · License: NOASSERTION · Stargazers: 223 · Issues: 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python · License: Apache-2.0 · Stargazers: 3585 · Issues: 0
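
As a rough illustration of SGLang's frontend language, the sketch below assumes an SGLang server already running locally at http://localhost:30000; the prompt text and parameter values are made up for illustration, not taken from the repo.

    import sglang as sgl

    # Point the frontend at a running SGLang server (the address is an assumption).
    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

    @sgl.function
    def answer_question(s, question):
        # Build a chat-style prompt and generate a bounded-length answer.
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=128))

    state = answer_question.run(question="What is speculative decoding?")
    print(state["answer"])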

ServerlessLLM

Cost-efficient and fast multi-LLM serving.

Language: Python · Stargazers: 118 · Issues: 0

how-to-learn-deep-learning-framework

how to learn PyTorch and OneFlow

License: Apache-2.0 · Stargazers: 297 · Issues: 0

pytorch-cppcuda-tutorial

Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF)

Language: Cuda · Stargazers: 359 · Issues: 0
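
The general workflow such a tutorial covers can be sketched with PyTorch's JIT extension loader; the source file names and the bound function name below are hypothetical placeholders, not files from the repo.

    import torch
    from torch.utils.cpp_extension import load

    # JIT-compile a hypothetical C++/CUDA extension; the file names are placeholders.
    ext = load(
        name="trilinear_interp",
        sources=["trilinear_interp.cpp", "trilinear_interp_kernel.cu"],
        verbose=True,
    )

    feats = torch.rand(8, 256, 3, device="cuda")
    points = torch.rand(8, 256, 3, device="cuda") * 2 - 1
    # Call whatever function the (hypothetical) pybind11 binding exposes.
    out = ext.trilinear_interpolation(feats, points)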

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2001 · Issues: 0

vidur

A large-scale simulation framework for LLM inference

Language: Python · License: MIT · Stargazers: 164 · Issues: 0

sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 118 · Issues: 0

calculate-flops.pytorch

calflops is designed to calculate FLOPs, MACs, and parameter counts for a wide range of neural networks, such as linear layers, CNNs, RNNs, GCNs, and Transformers (BERT, LLaMA, and other large language models)

Language: Python · License: MIT · Stargazers: 427 · Issues: 0
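
A minimal usage sketch for calflops, following the pattern its documentation describes; the exact keyword arguments may differ between versions, so treat this as an approximation.

    from calflops import calculate_flops
    from torchvision import models

    model = models.resnet18()
    batch_size = 1
    input_shape = (batch_size, 3, 224, 224)

    # Returns human-readable FLOPs, MACs, and parameter counts for one forward pass.
    flops, macs, params = calculate_flops(
        model=model,
        input_shape=input_shape,
        output_as_string=True,
    )
    print(flops, macs, params)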

DistServe

Disaggregated serving system for Large Language Models (LLMs).

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 211 · Issues: 0

torchtitan

A native PyTorch Library for large model training

Language: Python · License: BSD-3-Clause · Stargazers: 1377 · Issues: 0

long-context-attention

Sequence Parallel Attention for Long-Context LLM Training and Inference

Language: Python · Stargazers: 241 · Issues: 0

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python · License: Apache-2.0 · Stargazers: 360 · Issues: 0
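
The "W4" part of W4A8KV4 refers to 4-bit weight quantization. Below is a generic per-channel symmetric int4 quantize/dequantize sketch for intuition only; it is not QServe's actual kernels or API.

    import torch

    def quantize_w4_per_channel(w: torch.Tensor):
        """Symmetric 4-bit quantization per output channel (illustrative only)."""
        qmax = 7  # symmetric int4 range is [-8, 7]
        scale = w.abs().amax(dim=1, keepdim=True) / qmax
        q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # int8 container
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale

    w = torch.randn(4096, 4096)
    q, scale = quantize_w4_per_channel(w)
    w_hat = dequantize(q, scale)
    print((w - w_hat).abs().mean())  # average quantization error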

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook · License: MIT · Stargazers: 11601 · Issues: 0
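
In the spirit of the entry above ("one matrix multiplication at a time"), here is a generic single-head causal attention written as explicit matmuls; the shapes and names are illustrative and are not taken from the repo's code.

    import numpy as np

    def attention(x, Wq, Wk, Wv):
        """Single-head scaled dot-product attention as explicit matrix multiplications."""
        q = x @ Wq                                   # (seq, d_head)
        k = x @ Wk
        v = x @ Wv
        scores = q @ k.T / np.sqrt(q.shape[-1])      # (seq, seq)
        # Causal mask: each position attends only to itself and earlier positions.
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                           # (seq, d_head)

    seq, d_model, d_head = 8, 32, 16
    rng = np.random.default_rng(0)
    x = rng.standard_normal((seq, d_model))
    out = attention(x, *(rng.standard_normal((d_model, d_head)) for _ in range(3)))
    print(out.shape)  # (8, 16)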

guidance

A guidance language for controlling large language models.

Language: Jupyter Notebook · License: MIT · Stargazers: 18339 · Issues: 0

S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Language: Python · License: Apache-2.0 · Stargazers: 1654 · Issues: 0
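
For context on what an "adapter" is here: LoRA replaces a dense update of a weight W with a low-rank delta B·A, so many adapters can share one frozen base weight. The sketch below is a generic illustration of that idea, not S-LoRA's batching or paging machinery.

    import torch

    d, r = 4096, 16                      # hidden size and LoRA rank (illustrative)
    W = torch.randn(d, d)                # frozen base weight, shared by all adapters

    # Each adapter is just a pair of small matrices (A, B); storing thousands is cheap.
    adapters = {
        "adapter_math": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
        "adapter_code": (torch.randn(r, d) * 0.01, torch.zeros(d, r)),
    }

    def lora_forward(x: torch.Tensor, adapter_name: str) -> torch.Tensor:
        A, B = adapters[adapter_name]
        # Base projection plus the adapter's low-rank correction: x W^T + (x A^T) B^T
        return x @ W.T + (x @ A.T) @ B.T

    x = torch.randn(2, d)
    print(lora_forward(x, "adapter_math").shape)  # (2, 4096)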

prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

Language: Python · License: Apache-2.0 · Stargazers: 706 · Issues: 0

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python · License: MIT · Stargazers: 425 · Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · Stargazers: 24966 · Issues: 0

torchtune

A Native-PyTorch Library for LLM Fine-tuning

Language: Python · License: BSD-3-Clause · Stargazers: 3693 · Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT · Stargazers: 3894 · Issues: 0

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stargazers: 22393 · Issues: 0

Primo

Primo: Practical Learning-Augmented Systems with Interpretable Models

Language: JavaScript · License: Apache-2.0 · Stargazers: 17 · Issues: 0

outlines

Structured Text Generation

Language: Python · License: Apache-2.0 · Stargazers: 7414 · Issues: 0
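
"Structured text generation" here means constraining decoding so the output matches a schema or regex. The toy sketch below shows only the core idea of masking invalid next tokens; it is not Outlines' actual API, and the vocabulary and scores are invented for illustration.

    import re

    VOCAB = ["yes", "no", "maybe", "42", "banana"]

    def constrained_pick(scores: dict, pattern: str) -> str:
        """Pick the highest-scoring token whose text matches the pattern (toy example)."""
        allowed = [t for t in VOCAB if re.fullmatch(pattern, t)]
        if not allowed:
            raise ValueError("no token satisfies the constraint")
        # Mask out everything else, then take the argmax over what remains.
        return max(allowed, key=lambda t: scores.get(t, float("-inf")))

    # Pretend these are model scores for the next token.
    scores = {"yes": 1.2, "no": 0.8, "maybe": 2.0, "42": 0.1, "banana": 3.0}
    print(constrained_pick(scores, r"yes|no"))  # -> "yes"; "banana" is masked out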

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.

Language: Python · License: MIT · Stargazers: 12155 · Issues: 0

scattermoe

Triton-based implementation of Sparse Mixture of Experts.

Language: Python · License: Apache-2.0 · Stargazers: 152 · Issues: 0

LLM-Blender

[ACL2023] We introduce LLM-Blender, an ensembling framework that attains consistently superior performance by leveraging the diverse strengths of multiple open-source LLMs. LLM-Blender mitigates weaknesses through ranking and integrates strengths through generation fusion to enhance the capability of LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 836 · Issues: 0

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language: Python · Stargazers: 1004 · Issues: 0