Gary Gege's starred repositories
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.
llm-action
This project aims to share the technical principles behind large models, along with hands-on practical experience.
dive-into-llms
"Dive into LLMs" (动手学大模型): a series of hands-on programming tutorials
PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
LLM-generated-Text-Detection
A survey of and reflection on the latest research breakthroughs in LLM-generated text detection, including data, detectors, metrics, current issues, and future directions.
LearnDeepSpeed
DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)
text-to-cad-ui
A lightweight UI for interfacing with the Zoo text-to-cad API, built with SvelteKit.
Easy_LLM_Tool
A simple toolkit for fine-tuning large models