66RING's repositories
tiny-flash-attention
A flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
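
The core trick the tutorial builds up to is the online (streaming) softmax, which lets attention be computed tile by tile without materializing the full n×n score matrix. A minimal NumPy sketch of that recurrence (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def flash_attention(q, k, v, block=64):
    """Tiled attention with an online softmax: per-tile memory
    instead of materializing the full (n x n) score matrix."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(v)
    m = np.full(n, -np.inf)                  # running row max
    l = np.zeros(n)                          # running sum of exp
    for start in range(0, n, block):
        kb = k[start:start + block]          # (b, d) key tile
        vb = v[start:start + block]          # (b, d) value tile
        s = q @ kb.T * scale                 # (n, b) partial scores
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])       # tile probabilities
        alpha = np.exp(m - m_new)            # rescale the old stats
        l = l * alpha + p.sum(axis=1)
        out = out * alpha[:, None] + p @ vb
        m = m_new
    return out / l[:, None]

# agrees with the naive softmax(QK^T / sqrt(d)) V
q, k, v = (np.random.randn(128, 32) for _ in range(3))
s = q @ k.T / np.sqrt(32)
p = np.exp(s - s.max(1, keepdims=True))
ref = (p / p.sum(1, keepdims=True)) @ v
assert np.allclose(flash_attention(q, k, v), ref, atol=1e-6)
```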
LongShortTokenDecoding
Long-short token decoding speeds up long-context LLM inference by 4x. About a hundred lines of core code, open-sourced for learning.
ring-attention-pytorch
A tiny ring attention implementation for learning purposes.
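
Ring attention shards Q, K, and V across devices and rotates the K/V shards around a ring, folding each incoming shard into the result with the same online-softmax rescaling flash attention uses. A single-process sketch of that schedule (the ring "send" is just a list rotation here; names are illustrative):

```python
import numpy as np

def ring_attention(q_shards, k_shards, v_shards):
    """Simulate ring attention: each 'device' keeps its Q shard fixed
    and sees every K/V shard once as they rotate around the ring."""
    world = len(q_shards)
    scale = 1.0 / np.sqrt(q_shards[0].shape[1])
    # per-device accumulators: output, running max, running normalizer
    outs = [np.zeros_like(q) for q in q_shards]
    ms = [np.full(q.shape[0], -np.inf) for q in q_shards]
    ls = [np.zeros(q.shape[0]) for q in q_shards]
    kv = list(zip(k_shards, v_shards))
    for _ in range(world):                    # one ring rotation per step
        for rank in range(world):
            kb, vb = kv[rank]
            s = q_shards[rank] @ kb.T * scale
            m_new = np.maximum(ms[rank], s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            alpha = np.exp(ms[rank] - m_new)
            ls[rank] = ls[rank] * alpha + p.sum(axis=1)
            outs[rank] = outs[rank] * alpha[:, None] + p @ vb
            ms[rank] = m_new
        kv = kv[1:] + kv[:1]                  # pass K/V to the next rank
    return [o / l[:, None] for o, l in zip(outs, ls)]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 32)) for _ in range(3))
out = np.concatenate(ring_attention(
    list(q.reshape(4, 64, 32)),               # 4 "devices", 64 queries each
    list(k.reshape(4, 64, 32)),
    list(v.reshape(4, 64, 32))))
```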
66RING.github.io
https://66ring.github.io/
Counting-Stars-Local
Counting-Stars scripts for evaluating local LLMs.
LLMTest_NeedleInAHaystack-Local
Run Needle In A Haystack with a local LLM; see the Makefile.
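
The test itself is simple: bury a "needle" sentence at a chosen depth inside long filler text and check whether the model retrieves it. A minimal, model-agnostic sketch (the llm callable, filler, and needle strings are placeholders; the actual repo drives this through its Makefile):

```python
def needle_test(llm, context_len=8000, depth=0.5,
                needle="The secret ingredient is fresh basil.",
                question="What is the secret ingredient?"):
    """Insert `needle` at a relative `depth` inside filler text and
    check whether the model's answer retrieves it. `llm` is any
    prompt -> completion callable wrapping a local model.
    `context_len` is measured in characters here for simplicity."""
    filler = "The quick brown fox jumps over the lazy dog. "
    haystack = (filler * (context_len // len(filler) + 1))[:context_len]
    pos = int(len(haystack) * depth)
    prompt = (haystack[:pos] + " " + needle + " " + haystack[pos:]
              + f"\n\nQuestion: {question}\nAnswer:")
    return "basil" in llm(prompt).lower()

# sweep insertion depths to find where retrieval starts failing:
# for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
#     print(depth, needle_test(my_local_llm, depth=depth))
```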
pytorch-cuda-binding-tutorial
A tutorial for building custom CUDA and C++ functions for PyTorch.
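
The JIT path for this is torch.utils.cpp_extension; a minimal sketch with load_inline, which compiles a C++ source string and auto-generates the Python binding (a CUDA kernel follows the same pattern with an extra cuda_sources= string):

```python
import torch
from torch.utils.cpp_extension import load_inline

# A toy C++ op; load_inline compiles it and binds it into Python.
cpp_source = """
#include <torch/extension.h>

torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double s) {
    return a + b * s;   // runs in C++, dispatched like any torch op
}
"""

ext = load_inline(
    name="scaled_add_ext",
    cpp_sources=cpp_source,
    functions=["scaled_add"],   # auto-generates the pybind11 binding
)

a, b = torch.randn(4), torch.randn(4)
print(ext.scaled_add(a, b, 2.0))   # == a + 2.0 * b
```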
15445-bootcamp
A basic introduction to coding in modern C++.
academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
bufferline.nvim
A snazzy bufferline for Neovim
clash-verge
A Clash GUI based on tauri. Supports Windows, macOS and Linux.
ContinuousBatching
A demo of continuous batching, which is simpler than you think.
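
The idea is to schedule at the granularity of a single decode step: the moment a sequence finishes, its batch slot goes to a waiting request instead of idling until the whole batch drains. A toy scheduler sketch (no real model; each "decode step" just decrements a per-request counter):

```python
import random
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy scheduler: rebuild the batch every decode step, so a finished
    sequence's slot is handed to a waiting request immediately instead of
    waiting for the whole batch to drain (static batching)."""
    waiting = deque(requests)          # (request_id, tokens_to_generate)
    running, step = [], 0
    while waiting or running:
        # admit waiting requests into free slots
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        # one decode step for every running sequence ("the forward pass")
        for seq in running:
            seq[1] -= 1
        step += 1
        # evict finished sequences; their slots refill at the loop top
        for seq in running:
            if seq[1] == 0:
                print(f"step {step}: request {seq[0]} finished")
        running = [seq for seq in running if seq[1] > 0]
    return step

reqs = [(i, random.randint(1, 10)) for i in range(8)]
print("total decode steps:", continuous_batching(reqs))
```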
flash-attention
Fast and memory-efficient exact attention
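
Typical usage goes through flash_attn_func, which expects half-precision (batch, seqlen, nheads, headdim) tensors on a CUDA device; a minimal call (shapes chosen arbitrarily):

```python
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, nheads, headdim), fp16/bf16, CUDA only
q, k, v = (torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
           for _ in range(3))
out = flash_attn_func(q, k, v, causal=True)   # same shape as q
```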
LightSeq
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
llama-playground
Play with LLaMA.
RULER
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
ThunderKittens
Tile primitives for speedy kernels
vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
zephyr-nvim
A customized fork of nvimdev/zephyr-nvim.