ZZK (MARD1NO)


Company: SiliconFlow

Location: Neverland

Home Page: https://mard1no.github.io/


ZZK's repositories


flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 1 · Issues: 0

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.

Language: Python · License: BSD-3-Clause · Stargazers: 1 · Issues: 0

cutlass_master

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

APPy

APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

auto-round

SOTA Weight-only Quantization Algorithm for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0
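AutoRound's contribution is optimizing the rounding decision itself; the simplest weight-only baseline it improves on is plain round-to-nearest (RTN). The sketch below illustrates what "weight-only quantization" means in that baseline form, assuming symmetric per-tensor int4 — the function names are illustrative, not AutoRound's API.

```python
# Minimal round-to-nearest (RTN) weight-only int4 quantization sketch.
# This is the naive baseline, NOT AutoRound's learned-rounding algorithm.

def quantize_int4(weights):
    """Symmetric per-tensor quantization to 4-bit integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0  # map max magnitude to 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int4 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.7, 0.33, 0.06, -0.21]
q, scale = quantize_int4(weights)
approx = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
assert q == [1, -7, 3, 1, -2]
assert max_err < scale / 2  # RTN error is bounded by half a quantization step
```

Each weight is stored as a 4-bit code plus one shared fp scale, roughly a 4x memory saving over fp16; methods like AutoRound then tune the rounding direction per weight to cut the accuracy loss.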

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

cccl

CUDA C++ Core Libraries

Language: C++ · License: NOASSERTION · Stargazers: 0 · Issues: 0

cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it.

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

EETQ

Easy and Efficient Quantization for Transformers

Language: C++ · Stargazers: 0 · Issues: 0

float8_experimental

This repository contains the experimental PyTorch-native float8 training UX.

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

fp6_llm

Efficient GPU support for LLM inference with 6-bit quantization (FP6).

License: Apache-2.0 · Stargazers: 0 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

GPUSorting

OneSweep, implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.

Language: HLSL · License: NOASSERTION · Stargazers: 0 · Issues: 0

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python · License: MIT · Stargazers: 0 · Issues: 0
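"Asymmetric" here means the quantization grid is shifted by a zero-point so it covers the actual min-max range of the values rather than being symmetric about zero. A minimal sketch of that idea at 2 bits — this illustrates the general min-max affine scheme, not KIVI's actual per-channel/per-token algorithm:

```python
# Minimal asymmetric (min-max affine) quantization sketch at 2 bits.
# Generic illustration only; KIVI's real method quantizes keys per-channel
# and values per-token.

def quantize_asym(xs, bits=2):
    """Map floats onto 2**bits uniform levels spanning [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    levels = 2 ** bits - 1                  # 3 levels above zero for 2 bits
    scale = (hi - lo) / levels or 1.0       # guard against constant input
    q = [max(0, min(levels, round((x - lo) / scale))) for x in xs]
    return q, scale, lo

def dequantize_asym(q, scale, lo):
    return [v * scale + lo for v in q]

kv = [0.1, 0.42, 0.23, 0.55]               # toy stand-in for KV-cache entries
q, scale, lo = quantize_asym(kv)
approx = dequantize_asym(q, scale, lo)
assert q == [0, 2, 1, 3]                    # each entry fits in 2 bits
assert max(abs(a - b) for a, b in zip(kv, approx)) <= scale / 2
```

Storing each cached key/value element as a 2-bit code (plus a scale and offset per group) is what makes an 8x KV-cache memory reduction possible relative to fp16.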

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python · Stargazers: 0 · Issues: 0

lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

LLMRoofline

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Language: Python · Stargazers: 0 · Issues: 0
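The roofline model this kind of comparison rests on reduces to a single min(): attainable throughput is capped by either peak compute or memory bandwidth times arithmetic intensity, whichever binds first. A minimal sketch, with placeholder hardware numbers that are not any particular GPU's specifications:

```python
# Roofline sketch: attainable throughput = min(peak compute,
# memory bandwidth * arithmetic intensity). Hardware numbers below are
# illustrative placeholders, not real GPU specs.

def attainable_tflops(peak_tflops, bw_tbps, flops, bytes_moved):
    intensity = flops / bytes_moved        # FLOPs per byte of traffic
    return min(peak_tflops, bw_tbps * intensity)

n = 4096
flops = 2 * n * n                          # one GEMV on an n x n matrix
bytes_moved = 2 * n * n                    # fp16 weight reads dominate traffic

# Batch-1 decode: intensity = 1 FLOP/byte -> memory-bound at bandwidth.
assert attainable_tflops(100.0, 1.0, flops, bytes_moved) == 1.0

# Batch 512 amortizes the weight reads: intensity = 512 -> compute-bound.
assert attainable_tflops(100.0, 1.0, flops * 512, bytes_moved) == 100.0
```

This is why LLM decode is bandwidth-limited on most hardware while prefill and large-batch serving approach peak FLOPs, and why comparing platforms requires both axes, not peak TFLOPs alone.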

open-gpu-kernel-modules

NVIDIA Linux open GPU kernel modules with P2P support

Language: C · License: NOASSERTION · Stargazers: 0 · Issues: 0

qllm-eval

Code repository for the paper "Evaluating Quantized Large Language Models".

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

quanto

A PyTorch quantization toolkit.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

QUICK

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

Language: SystemVerilog · Stargazers: 0 · Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

Triton-Puzzles

Puzzles for learning Triton

License: Apache-2.0 · Stargazers: 0 · Issues: 0