Ayush Shridhar's starred repositories
open-interpreter
A natural language interface for computers
llama_index
LlamaIndex is a data framework for your LLM applications
search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
matmulfreellm
Implementation for MatMul-free LM.
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial