ljss's repositories
flash-attention
Fast and memory-efficient exact attention
Language: Python
apex
A PyTorch extension: tools for easy mixed precision and distributed training in PyTorch
Language: Python · License: BSD-3-Clause
fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
Language: Python · License: BSD-3-Clause
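The repository wraps a CUDA kernel, but the underlying algorithm is the classic O(n log n) butterfly recursion for the Walsh–Hadamard transform. As a point of reference, here is a minimal NumPy sketch of that recursion; the function name and signature are illustrative only and are not the repository's actual PyTorch API.

```python
import numpy as np

def hadamard_transform(x):
    """In-place-style butterfly Walsh-Hadamard transform of a
    length-2^k vector (illustrative sketch, not the repo's API)."""
    n = x.shape[-1]
    assert n & (n - 1) == 0, "length must be a power of two"
    h = np.array(x, dtype=np.float64)
    step = 1
    while step < n:
        # Butterfly stage: combine adjacent blocks of size `step`
        # into their sum and difference.
        for i in range(0, n, 2 * step):
            a = h[..., i:i + step].copy()
            b = h[..., i + step:i + 2 * step].copy()
            h[..., i:i + step] = a + b
            h[..., i + step:i + 2 * step] = a - b
        step *= 2
    return h
```

Since the Hadamard matrix is its own inverse up to a factor of n, applying the transform twice returns n times the input, which makes a convenient sanity check.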
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · License: Apache-2.0