Haicheng Wu (hwu36)

Company:@nvidia

Haicheng Wu's starred repositories

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 138940 | Issues: 1071 | Issues: 7621

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION | Stargazers: 81777 | Issues: 1742 | Issues: 44542

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Language: Python | License: MIT | Stargazers: 38257 | Issues: 444 | Issues: 302

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34563 | Issues: 343 | Issues: 2700

Paddle

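PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)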

Language: C++ | License: Apache-2.0 | Stargazers: 22032 | Issues: 716 | Issues: 18247

onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | Stargazers: 8295 | Issues: 76 | Issues: 513

oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

Language: C++ | License: Apache-2.0 | Stargazers: 5851 | Issues: 145 | Issues: 966

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 5256 | Issues: 104 | Issues: 1021

AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Language: Python | License: Apache-2.0 | Stargazers: 4520 | Issues: 82 | Issues: 242

spack

A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Language: Python | License: NOASSERTION | Stargazers: 4203 | Issues: 99 | Issues: 8408

jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Language: Python | License: Apache-2.0 | Stargazers: 3060 | Issues: 63 | Issues: 346

xla

A machine learning compiler for GPUs, CPUs, and ML accelerators

Language: C++ | License: Apache-2.0 | Stargazers: 2545 | Issues: 40 | Issues: 318

FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

Language: C++ | License: Apache-2.0 | Stargazers: 1634 | Issues: 33 | Issues: 639

voltaML-fast-stable-diffusion

Beautiful and Easy to use Stable Diffusion WebUI

Language: Python | License: GPL-3.0 | Stargazers: 965 | Issues: 24 | Issues: 78

nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

Language: C++ | License: MIT | Stargazers: 947 | Issues: 43 | Issues: 206

BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

Language: C++ | License: Apache-2.0 | Stargazers: 788 | Issues: 36 | Issues: 234

Language: Python | License: Apache-2.0 | Stargazers: 765 | Issues: 12 | Issues: 34

hidet

An open-source, efficient deep learning framework/compiler, written in Python.

Language: Python | License: Apache-2.0 | Stargazers: 644 | Issues: 17 | Issues: 84

x-stable-diffusion

Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 552 | Issues: 14 | Issues: 21

byteir

A model compilation solution for various hardware

Language: MLIR | License: Apache-2.0 | Stargazers: 355 | Issues: 11 | Issues: 16

Language: C++ | License: Apache-2.0 | Stargazers: 141 | Issues: 14 | Issues: 18

gpuocelot

GPUOcelot: A dynamic compilation framework for PTX

Language: C++ | License: BSD-3-Clause | Stargazers: 132 | Issues: 4 | Issues: 15

Language: Python | License: MIT | Stargazers: 127 | Issues: 10 | Issues: 10

Language: HTML | License: MIT | Stargazers: 72 | Issues: 4 | Issues: 9

Cutlass_EX

study of cutlass

Language: Cuda | License: MIT | Stargazers: 18 | Issues: 1 | Issues: 1

SPARTA

SParse AcceleRation on Tensor Architecture

SeRe

Code for project "NeRF4SeRe: Neural Radiance Fields for Scene Reconstruction". (2022-2023-1) AI3604@SJTU: Computer Vision.

Language: Python | License: MIT | Stargazers: 9 | Issues: 3 | Issues: 0