zhouleidcc's repositories
CenterPoint_deploy
Export CenterPoint PointPillars ONNX Model For TensorRT
cuda-beginner-course-cpp-version
Companion code for the Bilibili video course "Introduction to CUDA 12.1 Parallel Programming (C++ Edition)"
cugraph
cuGraph - RAPIDS Graph Analytics Library
cutlass-b2bgemm
An extension to the CUTLASS half-precision B2B GEMM example
Cutlass_EX
A study of CUTLASS
cutlass_performance_profiling
Exploration of GEMM Performance Improvement with CUTLASS
google-research
Google Research
gpu-toolkit
🦚 🧰 Collection of basic GPU algorithms implemented in CUDA C++.
LKCompiler
A small compiler
llm.c
LLM training in simple, raw C/CUDA
mlir-hello
MLIR Sample dialect
mlir-tutorial_cn
Hands-On Practical MLIR Tutorial
muda
μ-Cuda, yet another painless CUDA programming paradigm, featuring an IntelliSense-friendly design, structured launch, and automatic CUDA graph generation and updating.
MV2D
Code for "Object as Query: Lifting any 2D Object Detector to 3D Detection"
onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.
pymlir
Python interface for MLIR - the Multi-Level Intermediate Representation
resource-stream
CUDA-related news and material links
SHARK-Turbine
Unified compiler/runtime for interfacing with PyTorch Dynamo.
SST
Code for "Fully Sparse 3D Object Detection" & "Embracing Single Stride 3D Object Detector with Sparse Transformer"
TensorRT
NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
torch-xla-SPMD
PyTorch/XLA SPMD test code on Google TPU
torchsparse
[MLSys'22] TorchSparse: Efficient Point Cloud Inference Engine