Weili17's starred repositories

ray-llm

RayLLM - LLMs on Ray

Language: Python · License: Apache-2.0 · Stars: 1208

gpustack

Manage GPU clusters for running LLMs

Language: Python · License: Apache-2.0 · Stars: 164

giantpandacv.com

www.giantpandacv.com

Language: Python · License: NOASSERTION · Stars: 147

ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Language: Python · License: Apache-2.0 · Stars: 32469
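
Ray's core abstraction turns ordinary Python functions into distributed tasks. A minimal sketch of that pattern (the function here is illustrative):

```python
import ray

ray.init()  # starts a local Ray runtime; connects to a cluster if one is configured

@ray.remote
def square(x):
    return x * x

# .remote() schedules tasks asynchronously and returns futures;
# ray.get() blocks until the results are available.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```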

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stars: 11150
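
A minimal sketch of loading one of NeMo's pretrained speech models, assuming the nemo_toolkit package is installed; the checkpoint name and audio path are placeholders:

```python
import nemo.collections.asr as nemo_asr

# Download a pretrained ASR checkpoint and transcribe a local audio file.
asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_fastconformer_ctc_large")
print(asr_model.transcribe(["sample.wav"]))  # sample.wav is a placeholder path
```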

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization, with about a 2x speedup during inference.

Language: Python · License: MIT · Stars: 1523
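
A minimal quantization sketch following the project's documented pattern, assuming the awq and transformers packages; the checkpoint, output directory, and quant_config values are illustrative:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
quant_path = "mistral-7b-awq"             # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Calibrate, pack weights to 4-bit, and save the quantized model.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```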

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stars: 4189
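
A minimal sketch of the documented quantize-and-save flow, assuming the auto_gptq and transformers packages; the checkpoint and calibration text are illustrative:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

pretrained = "facebook/opt-125m"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(pretrained)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)

# GPTQ calibrates on a small set of tokenized examples.
examples = [tokenizer("AutoGPTQ is an easy-to-use LLM quantization package.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")  # placeholder output directory
```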

smoothquant

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Language: Python · License: MIT · Stars: 1131
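
The paper's core trick is to migrate quantization difficulty from activations to weights via a per-channel smoothing factor, leaving the layer's output mathematically unchanged:

```latex
Y = \left(X \operatorname{diag}(s)^{-1}\right)\left(\operatorname{diag}(s)\, W\right) = \hat{X}\hat{W},
\qquad
s_j = \frac{\max\left(\lvert X_j \rvert\right)^{\alpha}}{\max\left(\lvert W_j \rvert\right)^{1-\alpha}}
```

Here the hyperparameter alpha controls how much difficulty is shifted from activations to weights; the paper uses 0.5 as its default.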

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Language: Python · License: Apache-2.0 · Stars: 3781
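
A minimal sketch of SGLang's frontend language, assuming a running local SGLang server; the endpoint and max_tokens are illustrative:

```python
import sglang as sgl

# The runtime fills in each gen() call during execution.
@sgl.function
def qa(s, question):
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64)

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))  # placeholder endpoint
state = qa.run(question="What is the capital of France?")
print(state["answer"])
```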

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stars: 1699
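
A minimal FP8 sketch following the library's quickstart pattern, assuming a GPU with FP8 support (Hopper or Ada) and the transformer_engine package; the layer sizes are illustrative:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for torch.nn.Linear that can run its GEMMs in FP8.
layer = te.Linear(768, 768, device="cuda")
x = torch.randn(16, 768, device="cuda")

fp8_recipe = recipe.DelayedScaling()  # default delayed-scaling FP8 recipe
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```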

llama-cpp-python

Python bindings for llama.cpp

Language: Python · License: MIT · Stars: 7350
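
A minimal completion sketch, assuming the llama_cpp package and a local GGUF model file; the model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")  # placeholder path
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```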

llama.cpp

Llama 2 inference

Language: C · License: MIT · Stars: 31

ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Language: Go · License: MIT · Stars: 83097
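
Ollama serves a local REST API (port 11434 by default), so any HTTP client works. A minimal sketch, assuming the model has already been pulled; the model name is illustrative:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```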

step_into_llm

MindSpore online courses: Step into LLM

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 389

CUDA_Programming

Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice)

Language: Cuda · Stars: 73

CUDA-Programming

Sample codes for my CUDA programming book

Language: Cuda · License: GPL-3.0 · Stars: 1480

KuiperLLama

A hands-on, from-scratch implementation of an LLM inference framework

Language: C++ · Stars: 101

cuda-samples

Samples for CUDA developers demonstrating features in the CUDA Toolkit

Language: C · License: NOASSERTION · Stars: 5887

lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

Language: C++ · License: NOASSERTION · Stars: 3145

Learn-CUDA-Programming

Learn CUDA Programming, published by Packt

Language: Cuda · License: MIT · Stars: 960

lectures

Material for cuda-mode lectures

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 2023

baby-llama2-chinese_cybertron

Train an LLM from scratch using a single 24 GB GPU

Language: Python · License: MIT · Stars: 43

llama.cpp

LLM inference in C/C++

Language: C++ · License: MIT · Stars: 62932

DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Language: Python · License: Apache-2.0 · Stars: 1796
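
A minimal non-persistent pipeline sketch following the project's README pattern, assuming the deepspeed-mii package and a GPU; the model name is illustrative:

```python
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")  # placeholder checkpoint
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0])
```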

batch-prompting

[EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.

Language: Python · Stars: 63

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 · Stars: 291

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Language: Python · Stars: 244

SwiftSage

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Language: Python · Stars: 232

TurboTransformers

A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Language: C++ · License: NOASSERTION · Stars: 1459