zhouleidcc's repositories

CenterPoint_deploy

Export CenterPoint PointPillars ONNX Model for TensorRT

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

cuda-beginner-course-cpp-version

Companion code for the bilibili video course "CUDA 12.1 Parallel Programming Primer (C++ Edition)"

License: MIT | Stargazers: 0 | Issues: 0

cugraph

cuGraph - RAPIDS Graph Analytics Library

License: Apache-2.0 | Stargazers: 0 | Issues: 0

cutlass-b2bgemm

An extension to the CUTLASS half-precision back-to-back (b2b) GEMM example

Stargazers: 0 | Issues: 0

Cutlass_EX

A study of CUTLASS

License: MIT | Stargazers: 0 | Issues: 0

cutlass_performance_profiling

Exploration of GEMM Performance Improvement with CUTLASS

Stargazers: 0 | Issues: 0

google-research

Google Research

License: Apache-2.0 | Stargazers: 0 | Issues: 0

gpu-toolkit

🦚 🧰 Collection of basic GPU algorithms implemented in CUDA C++.

License: MIT | Stargazers: 0 | Issues: 0

LKCompiler

A small compiler

License: NOASSERTION | Stargazers: 0 | Issues: 0

llm.c

LLM training in simple, raw C/CUDA

License: MIT | Stargazers: 0 | Issues: 0

mlir-hello

MLIR Sample dialect

Stargazers: 0 | Issues: 0

mlir-tutorial_cn

Hands-On Practical MLIR Tutorial

License: Apache-2.0 | Stargazers: 0 | Issues: 0

muda

μ-Cuda, yet another painless CUDA programming paradigm. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0

MV2D

Code for "Object as Query: Lifting any 2D Object Detector to 3D Detection"

Stargazers: 0 | Issues: 0

onnx-modifier

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

License: MIT | Stargazers: 0 | Issues: 0

pymlir

Python interface for MLIR - the Multi-Level Intermediate Representation

License: BSD-3-Clause | Stargazers: 0 | Issues: 0

resource-stream

CUDA-related news and material links

License: MIT | Stargazers: 0 | Issues: 0

SHARK-Turbine

Unified compiler/runtime for interfacing with PyTorch Dynamo.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

SST

Code for "Fully Sparse 3D Object Detection" and "Embracing Single Stride 3D Object Detector with Sparse Transformer"

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

torch-xla-SPMD

PyTorch/XLA SPMD test code for Google TPU

License: MIT | Stargazers: 0 | Issues: 0

torchsparse

[MLSys'22] TorchSparse: Efficient Point Cloud Inference Engine

License: MIT | Stargazers: 0 | Issues: 0