foreverpiano's starred repositories
attention-gym
Helpful tools and examples for working with flex-attention
PKU-Auto-Reservation
Automatic campus-entry reservation for Huaqing University
zotero-style
Ethereal Style for Zotero
cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
LLMTest_NeedleInAHaystack
Doing simple retrieval from LLM models at various context lengths to measure accuracy
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
ShareGPT4Video
An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Awesome-Efficient-Diffusion-Models
Paper survey of efficient computation for large-scale models.
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.