Repositories under the triton topic:
Efficient Triton Kernels for LLM Training
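For readers new to Triton, here is a minimal, illustrative sketch of what a Triton kernel looks like (not code from this repository): a fused elementwise add + ReLU with a thin PyTorch wrapper. The kernel name, block size, and wrapper are chosen for the example.

```python
# Illustrative only -- a minimal Triton kernel, not code from the repository
# above. Fuses an elementwise add with a ReLU in a single pass over memory.
import torch
import triton
import triton.language as tl


@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements,
                          BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)                       # one program per block of elements
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements                     # guard the tail block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, tl.maximum(x + y, 0.0), mask=mask)


def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Assumes x and y are same-shaped float CUDA tensors.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```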
🎉 Modern CUDA learning notes with PyTorch: CUDA cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, RoPE, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
A service for autodiscovery and configuration of applications running in containers
Playing with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
LLVM based static binary analysis framework
A performance library for machine learning applications.
Ozoz's dotfiles for bspwm and i3wm
ClearML - Model-Serving Orchestration and Repository Solution
Resources About Dynamic Binary Instrumentation and Dynamic Binary Analysis
NVIDIA-accelerated, deep learned model support for image space object detection
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 platforms with a CUDA-capable GPU
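For context, here is a hedged sketch of querying a Triton Inference Server directly over HTTP with the tritonclient Python package. This is generic server usage rather than the ROS 2 node interface of these packages, and the model name and tensor names are placeholders.

```python
# Hedged, generic sketch of calling a Triton Inference Server over HTTP with
# the tritonclient package. The model name "detector" and the tensor names
# "input__0"/"output__0" are placeholders for whatever the deployed model uses.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.zeros((1, 3, 640, 640), dtype=np.float32)   # dummy input batch
inp = httpclient.InferInput("input__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("output__0")

result = client.infer(model_name="detector", inputs=[inp], outputs=[out])
print(result.as_numpy("output__0").shape)
```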
Deploy DL/ML inference pipelines with minimal extra code.
Triton implementation of FlashAttention2 with support for custom masks.
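The core of a custom mask is forcing excluded attention scores to -inf before normalization. Below is a simplified, illustrative Triton kernel showing that step as a standalone row-wise masked softmax; it is not the repository's FlashAttention2 kernel, and the tensor layout and names are assumptions for the example.

```python
# Simplified sketch of the custom-mask idea (not the repository's
# FlashAttention2 kernel): a row-wise softmax over attention scores where a
# user-supplied 0/1 mask knocks excluded positions down to -inf first.
import torch
import triton
import triton.language as tl


@triton.jit
def masked_softmax_kernel(scores_ptr, mask_ptr, out_ptr, n_cols,
                          BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(0)                      # one program per row
    cols = tl.arange(0, BLOCK_SIZE)
    offs = row * n_cols + cols
    valid = cols < n_cols                       # guard padding past the row end

    s = tl.load(scores_ptr + offs, mask=valid, other=float("-inf"))
    m = tl.load(mask_ptr + offs, mask=valid, other=0)

    s = tl.where(m != 0, s, float("-inf"))      # apply the custom mask
    s = s - tl.max(s, axis=0)                   # numerical stability
    num = tl.exp(s)
    tl.store(out_ptr + offs, num / tl.sum(num, axis=0), mask=valid)


def masked_softmax(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # scores: (n_rows, n_cols) float CUDA tensor; mask: same shape, 0/1 values.
    out = torch.empty_like(scores)
    n_rows, n_cols = scores.shape
    masked_softmax_kernel[(n_rows,)](scores, mask.to(torch.int32), out, n_cols,
                                     BLOCK_SIZE=triton.next_power_of_2(n_cols))
    return out
```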
Three examples of recommendation system pipelines with NVIDIA Merlin and Redis
:whale: Scripps Whale Acoustics Lab :earth_americas: Scripps Acoustic Ecology Lab - Triton with remoras in development
A step-by-step guide to setting up Nvidia GPUs with CUDA support running on Docker (and Compose) containers on NixOS host
⚡ Blazing fast audio augmentation in Python, powered by GPU for high-efficiency processing in machine learning and audio analysis tasks.
COIN Attacks: on Insecurity of Enclave Untrusted Interfaces in SGX - ASPLOS 2020
Symbolic debugging tool using JonathanSalwan/Triton
Adaptive Callsite-sensitive Control Flow Integrity - EuroS&P'19
Increase the inference speed of the model
Standalone static version of Triton's x86/x64 translator
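For context, a hedged sketch using the upstream JonathanSalwan/Triton Python bindings rather than this standalone translator: processing a single x86-64 instruction and printing the symbolic expressions produced for it. The instruction bytes are just an example.

```python
# Hedged sketch of the upstream JonathanSalwan/Triton Python bindings (not this
# standalone translator): lift one x86-64 instruction and print the symbolic
# expressions Triton builds for it.
from triton import ARCH, Instruction, TritonContext

ctx = TritonContext()
ctx.setArchitecture(ARCH.X86_64)

inst = Instruction(b"\x48\x01\xd8")   # add rax, rbx (raw opcode bytes)
ctx.processing(inst)                  # disassemble and build the semantics

print(inst.getDisassembly())
for expr in inst.getSymbolicExpressions():
    print(expr)
```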
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.