Sheng Qin's starred repositories

Language: Python | Stargazers: 37 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 850 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 4459 | Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 21112 | Issues: 0

latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Jupyter Notebook | License: MIT | Stargazers: 11266 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5832 | Issues: 0

edm

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)

Language: Python | License: NOASSERTION | Stargazers: 1237 | Issues: 0

msccl

Microsoft Collective Communication Library

Language: C++ | License: NOASSERTION | Stargazers: 287 | Issues: 0

DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

License: MIT | Stargazers: 3214 | Issues: 0

awesome-rdma

A curated list of awesome RDMA resources

Stargazers: 1 | Issues: 0

MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

Language: C++ | License: Apache-2.0 | Stargazers: 235 | Issues: 0
Language: Jupyter Notebook | Stargazers: 3 | Issues: 0

LLM-FP4

The official implementation of the EMNLP 2023 paper LLM-FP4

Language: Python | License: MIT | Stargazers: 149 | Issues: 0

apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Language: Python | License: BSD-3-Clause | Stargazers: 8234 | Issues: 0

Tensile

Stretching GPU performance for GEMMs and tensor contractions.

Language: Python | License: MIT | Stargazers: 205 | Issues: 0

out-of-the-box-fp8-training

Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.

Language: Jupyter Notebook | License: MIT | Stargazers: 32 | Issues: 0

veScale

A PyTorch Native LLM Training Framework

Language: Python | License: Apache-2.0 | Stargazers: 524 | Issues: 0

Model-References

Reference models for Intel(R) Gaudi(R) AI Accelerator

Language: Python | Stargazers: 147 | Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 12188 | Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 25304 | Issues: 0

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python | License: MIT | Stargazers: 494 | Issues: 0

NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

Language: C | License: Apache-2.0 | Stargazers: 267 | Issues: 0

microxcaling

PyTorch emulation library for Microscaling (MX)-compatible data formats

Language: Python | License: MIT | Stargazers: 130 | Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | Stargazers: 1710 | Issues: 0

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Language: Cuda | Stargazers: 1307 | Issues: 0

mini-rv32ima

A tiny C header-only RISC-V emulator.

Language: C | License: MIT | Stargazers: 1591 | Issues: 0

cpufp

A CPU tool for benchmarking peak floating-point performance

Language: Assembly | License: GPL-3.0 | Stargazers: 457 | Issues: 0
Language: C++ | Stargazers: 467 | Issues: 0

WebGL-Fluid-Simulation

Play with fluids in your browser (works even on mobile)

Language: JavaScript | License: MIT | Stargazers: 14484 | Issues: 0