ZZK's repositories
Awesome-GPU
Awesome resources for GPUs
cmake-examples
Useful CMake Examples
oneflow
OneFlow is a performance-centered and open-source deep learning framework.
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Awesome-System-for-Machine-Learning
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
CacheLib
Pluggable in-process caching engine to build and scale high-performance services
Cpp-Concurrency-in-Action-2ed
C++11/14/17/20 multithreading, covering operating-system principles and concurrent programming techniques.
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
CV-CUDA
CV-CUDA™ is an open-source, graphics processing unit (GPU)-accelerated library for cloud-scale image processing and computer vision.
data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
DI-engine
OpenDILab Decision AI Engine
fast_io
Significantly faster input/output for C++20
FasterTransformer
Transformer-related optimizations, including BERT and GPT
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
FBTT-Embedding
This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing systems. We showed that this library can reduce the total model size of Facebook's open-sourced DLRM model by up to 100x while achieving the same model quality, and our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress whole embedding tables on the fly, so they provide no memory reduction during training; our library decompresses only the requested rows and can therefore reduce the memory footprint per embedding table by up to 10,000x. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
free-programming-books
:books: Freely available programming books
GPT2
An implementation of training for GPT-2 that supports TPUs
matxscript
A framework for model pre-processing and post-processing
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
QSync
Official repository for "QSync: Adaptive Mixed-Precision for Training Synchronization".
taichi-hackathon-akinasan
The Akinasan team's (秋名山车队) codebase for the 0th Taichi Hackathon.
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
YHs_Sample
Yinghan's Code Sample