Kebe's starred repositories
MInference
Speeds up long-context LLM inference by computing attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
gpu-optimization-workshop
Slides, notes, and materials for the workshop
stable-diffusion-webui-distributed
Chains stable-diffusion-webui instances together to facilitate faster image generation.
llama3-from-scratch
Llama 3 implementation, one matrix multiplication at a time
LLMBook-zh.github.io
"Large Language Models" (《大语言模型》), by Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, and Wen Jirong
HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
inpaint-web
A free and open-source inpainting and image-upscaling tool powered by WebGPU and WebAssembly (wasm), running entirely in the browser.
hai-platform
A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute
pyelftools
Parsing ELF and DWARF in Python