Gary-code

SCUT

Guangzhou, China

https://gary-code.github.io/

Gary Gege's starred repositories

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | Apache-2.0 | 18423 158 1418

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | MIT | 11044 163 224

llm-action

This project aims to share the technical principles behind large models and hands-on experience with them.

Language: HTML | Apache-2.0 | 8175 78 21

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | Apache-2.0 | 4477 52 141

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 4455 49 401

dive-into-llms

Dive into LLMs (动手学大模型): a series of hands-on programming tutorials

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | AGPL-3.0 | 2592 460

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | Apache-2.0 | 1864 23 84

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 1290 25 62

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python | NOASSERTION | 1168 23 26

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | Apache-2.0 | 715 13 71

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python | 700 29 64

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | 494 6 41

KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

MIT | 256 60

OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Language: Python | MIT | 221 2 34

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python | 197 2 23

LLM-generated-Text-Detection

A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current issues and future directions.

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

Language: Python | 122 6 4

InCTRL

Official implementation of CVPR'24 paper 'Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts'.

Language: Python | Apache-2.0 | 87 2 26

LearnDeepSpeed

DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)

Language: Python | MIT | 74 10

Pink

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Language: Python | 66 4 6

text-to-cad-ui

A lightweight UI for interfacing with the Zoo text-to-cad API, built with SvelteKit.

Language: Svelte | MIT | 60 8 33

AnyDoor

AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models

Language: Python | 34 6 1

LM-Science-Tutor

Language: Python | 29 6 1

C-VQA

Counterfactual Reasoning VQA Dataset

Language: Python | 21 3 1

Sniffer

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

Apache-2.0 | 13 10

CUTI-Domain

Language: Python | 10 10

UNTL

EMNLP'2022: Unsupervised Non-transferable Text Classification

Language: Python | MIT | 9 30

Easy_LLM_Tool

A simple toolkit for fine-tuning large models

Language: Python | 4 2 1

D-VQG

[TCSVT 2024] The released code of paper "Video Question Generation for Dynamic Changes"

Language: Python | 1 1 1