Subject_No_i (SubjectNoi)

Company: ReArch Group, Dept. of CSE, SJTU

Location: Shanghai

Home Page: http://subjectnoi.github.io/about/


Organizations
SJTU-ReArch-Group

Subject_No_i's starred repositories

stable-diffusion-webui

Stable Diffusion web UI

Language: Python, License: AGPL-3.0, Stargazers: 142177, Issues: 1084, Issues: 7672

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python, License: Apache-2.0, Stargazers: 29515, Issues: 244, Issues: 5099

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python, License: Apache-2.0, Stargazers: 22112, Issues: 187, Issues: 501

mlx

MLX: An array framework for Apple silicon

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python, License: NOASSERTION, Stargazers: 8582, Issues: 76, Issues: 546

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++, License: Apache-2.0, Stargazers: 8548, Issues: 95, Issues: 1915

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog, Stargazers: 7055, Issues: 68, Issues: 24

FasterTransformer

Transformer related optimization, including BERT, GPT

Language: C++, License: Apache-2.0, Stargazers: 5857, Issues: 62, Issues: 625

hnswlib

Header-only C++/python library for fast approximate nearest neighbors

Language: C++, License: Apache-2.0, Stargazers: 4348, Issues: 65, Issues: 372

chain-of-thought-hub

Benchmarking large language models' complex reasoning ability with chain-of-thought prompting

Language: Jupyter Notebook, License: MIT, Stargazers: 2565, Issues: 38, Issues: 34

ispc

Intel® Implicit SPMD Program Compiler

Language: C++, License: BSD-3-Clause, Stargazers: 2510, Issues: 94, Issues: 1270

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python, License: MIT, Stargazers: 2486, Issues: 24, Issues: 177

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda, License: Apache-2.0, Stargazers: 1376, Issues: 16, Issues: 114

openqasm

Quantum assembly language for extended quantum circuits

Language: Python, License: Apache-2.0, Stargazers: 1229, Issues: 86, Issues: 230

unlimiformer

Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"

Language: Python, License: MIT, Stargazers: 1052, Issues: 23, Issues: 60

CompilerGym

Reinforcement learning environments for compiler and program optimization tasks

Language: Python, License: MIT, Stargazers: 910, Issues: 34, Issues: 287

How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

Language: Cuda, License: Apache-2.0, Stargazers: 820, Issues: 13, Issues: 15

how-to-optimize-gemm

row-major matmul optimization

Language: C++, License: GPL-3.0, Stargazers: 590, Issues: 16, Issues: 13

glake

GLake: optimizing GPU memory management and IO transmission.

Language: Python, License: Apache-2.0, Stargazers: 371, Issues: 7, Issues: 22

kokkos-tutorials

Tutorials for the Kokkos C++ Performance Portability Programming Ecosystem

Language: C++, License: NOASSERTION, Stargazers: 293, Issues: 52, Issues: 42

cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Language: Cuda, License: MIT, Stargazers: 288, Issues: 4, Issues: 12

dnnweaver2

Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.

Language: Jupyter Notebook, License: Apache-2.0, Stargazers: 206, Issues: 16, Issues: 16

TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

Language: C++, License: MIT, Stargazers: 143, Issues: 3, Issues: 59

dsa-framework

Release of stream-specialization software/hardware stack.

Language: Python, License: NOASSERTION, Stargazers: 116, Issues: 5, Issues: 12

bitfusion

Simulator for BitFusion

RayTracingToInfinity

A feature packed raytracer built with C++

Language: C++, Stargazers: 84, Issues: 3, Issues: 0

tvm_gpu_gemm

play gemm with tvm

brainstorm

Compiler for Dynamic Neural Networks

Language: C++, License: Apache-2.0, Stargazers: 17, Issues: 4, Issues: 0
Language: Python, License: NOASSERTION, Stargazers: 14, Issues: 3, Issues: 0