flash-attention-2

Repositories under the flash-attention-2 topic:

DefTruth / Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
awesome-llm deepseek flash-attention flash-attention-2 llm llm-inference llms open-sora paged-attention sora streaming-llm tensorrt-llm vllm
1513
DefTruth / CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
block-reduce cuda cuda-kernels cuda-programming elementwise flash-attention flash-attention-2 gemm gemv layernorm rmsnorm softmax warp-reduce
Language:Cuda 606
arihanv / Shush
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
huggingface-transformers machine-learning modal transcription whisper shadcn-ui flash-attention-2
Language:TypeScript 135
BBC-Esq / WhisperS2T-transcriber
Uses the powerful WhisperS2T and Ctranslate2 libraries to batch transcribe multiple files
audio-recorder audio-recording audio-transcribing audio-transcription ctranslate2 flash-attention-2 transcr transcriber transcription whispers2t
Language:Python 4
graphcore-research / flash-attention-ipu
Poplar implementation of FlashAttention for IPU
deep-learning flash-attention flash-attention-2 graphcore poplar transformers ipu pytorch
Language:C++ 2
