DefTruth

Company: @PaddlePaddle

Location: Shenzhen, China

Home Page: https://github.com/DefTruth

DefTruth's repositories

lite.ai.toolkit

🛠 A lite C++ toolkit of awesome AI models supporting the ONNXRuntime and MNN engines. Contains YOLOv5, YOLOv6, YOLOX, YOLOR, FaceDet, HeadSeg, HeadPose, Matting, etc.

Language: C++ | License: GPL-3.0 | Stargazers: 3433 | Issues: 71 | Issues: 257

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

CUDA-Learn-Note

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 435 | Issues: 6 | Issues: 0

statistic-learning-R-note

📒 A 200-page PDF of study notes for Li Hang's "Statistical Learning Methods" (《统计学习方法》), with detailed hand-derived formula walkthroughs, a complete table of contents, and R implementations. Best read alongside the book; suitable for machine learning and deep learning beginners.

torchlm

💎 A high-level pipeline for facial landmark detection. It supports training, evaluation, exporting, inference (Python/C++), and 100+ data augmentations, and can be installed easily via pip.

Language: Python | License: MIT | Stargazers: 217 | Issues: 9 | Issues: 24

flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Language: Cuda | License: Apache-2.0 | Stargazers: 7 | Issues: 0 | Issues: 0
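For reference, the computation such a minimal kernel implements is plain scaled dot-product attention. A naive PyTorch equivalent (the kind of baseline a ~100-line CUDA forward pass would be checked against; the tensor layout here is an assumption) might look like:

```python
import math
import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, nheads, seqlen, headdim) -- assumed layout
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (b, h, s, s)
    return torch.softmax(scores, dim=-1) @ v                  # (b, h, s, d)
```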

Awesome-SD-Inference

Awesome Stable Diffusion Inference

torch-tensorrt

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Language: Python | License: BSD-3-Clause | Stargazers: 3 | Issues: 1 | Issues: 0
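A rough sketch of the ahead-of-time compilation workflow (the model and input shape are placeholders):

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18(weights=None).eval().cuda()  # placeholder model
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # static input shape
    enabled_precisions={torch.half},                  # allow FP16 kernels
)
out = trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
```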

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0
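Recent releases expose a high-level Python LLM API; a minimal sketch (the model id and sampling settings are illustrative, and the exact API surface varies by version):

```python
from tensorrt_llm import LLM, SamplingParams  # high-level API in recent releases

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TRT engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```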

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0
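The offline-inference entry point, sketched with a placeholder model id:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```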

CLIP

CLIP (Contrastive Language-Image Pretraining): given an image, predict the most relevant text snippet.

License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0
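The upstream README's zero-shot usage pattern, roughly (the image path and candidate labels are placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cat.png")).unsqueeze(0).to(device)  # placeholder image
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # similarity over the candidate captions
print(probs)
```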

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ | License: NOASSERTION | Stargazers: 1 | Issues: 1 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 1 | Issues: 1 | Issues: 0
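Typical usage of the fused kernel (shapes are illustrative; FP16/BF16 CUDA tensors are required):

```python
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim), fp16/bf16, on GPU
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # same shape as q
```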

LLM-Viewer

Analyze the inference of large language models (LLMs), covering computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.

License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

DeepCache

DeepCache: Accelerating Diffusion Models for Free

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

flash-linear-attention

Fast implementations of causal linear attention for autoregressive language modeling (PyTorch).

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
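A minimal sketch of the frontend language (the endpoint URL and prompt are placeholders, and a running SGLang server is assumed):

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

# assumes an SGLang server is already serving on this port
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = qa.run(question="What is the capital of France?")
print(state["answer"])
```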

stable-fast

A high-performance inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

TensorRT

NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
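A rough sketch of building an engine from an ONNX file with the Python API (TensorRT 8.x-style; the file path is a placeholder):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder ONNX model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)  # serialized engine
```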

tensorrtllm_backend

The Triton TensorRT-LLM Backend

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0
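The released checkpoints load through standard Hugging Face transformers; a sketch using one of the published chat checkpoints:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("The TinyLlama project is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```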

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
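The canonical starting point is a JIT-compiled vector-add kernel, along the lines of the upstream tutorial:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```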

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0
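Its main entry point is the fused memory-efficient attention op; shapes below are illustrative:

```python
import torch
from xformers.ops import memory_efficient_attention

# (batch, seqlen, nheads, headdim) on GPU
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = memory_efficient_attention(q, k, v)  # picks the best available kernel
```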