GeneZC's starred repositories
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Supports many more LMs, such as miniGPT4, StableLM, and MOSS.
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Campus2025
A compilation of internet-industry campus recruitment information for the class of 2025
ring-flash-attention
Ring attention implementation with flash attention
arena-hard-auto
Arena-Hard-Auto: An automatic LLM benchmark
long-context-attention
Sequence-Parallel Attention for Long-Context LLM Training and Inference
Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly"
Blockwise-Parallel-Transformer
Enables a context window 32 times longer than vanilla Transformers and up to 4 times longer than memory-efficient Transformers.
SiMT-Hallucination
Source code for the paper "On the Hallucination in Simultaneous Machine Translation"