Subject_No_i's starred repositories
stable-diffusion-webui
Stable Diffusion web UI
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
flashinfer
FlashInfer: Kernel Library for LLM Serving
unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
CompilerGym
Reinforcement learning environments for compiler and program optimization tasks
How_to_optimize_in_GPU
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
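As a taste of the elementwise topic, here is a minimal sketch (not code from the repo; the kernel name and launch configuration are illustrative) of the grid-stride elementwise kernel such tutorials typically start from:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread processes multiple elements, so a single
// launch configuration covers any problem size with coalesced accesses.
__global__ void elementwise_add(const float* a, const float* b, float* c, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    elementwise_add<<<256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Elementwise kernels are memory-bound, which is why the grid-stride pattern with coalesced loads is enough to approach the bandwidth limit the description refers to.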
how-to-optimize-gemm
Row-major matmul optimization
kokkos-tutorials
Tutorials for the Kokkos C++ Performance Portability Programming Ecosystem
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
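For context on the WMMA API mentioned above, here is a minimal sketch (assuming one warp, a single 16x16 output tile, row-major half-precision operands, and sm_70 or newer; not the repo's actual kernel):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B with Tensor Cores.
// A is 16xK row-major, B is Kx16 row-major, C is 16x16 row-major.
__global__ void wmma_hgemm_tile(const half* A, const half* B, float* C, int K) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in steps of 16, accumulating partial products in registers.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + k, K);        // leading dimension of A is K
        wmma::load_matrix_sync(b_frag, B + k * 16, 16);  // leading dimension of B is 16
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    }
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

Launched with a single warp (<<<1, 32>>>), all 32 threads cooperate on the fragment operations; real HGEMM kernels like those in this repo tile C across many warps and stage A/B through shared memory.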
dnnweaver2
Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.
dsa-framework
Release of the stream-specialization software/hardware stack.
RayTracingToInfinity
A feature-packed raytracer built with C++
tvm_gpu_gemm
Playing with GEMM in TVM
brainstorm
Compiler for Dynamic Neural Networks