Gary Gege (Gary-code)

Company: SCUT

Location: Guangzhou, China

Home Page: https://gary-code.github.io/

Gary Gege's starred repositories

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 18423 | Issues: 158 | Issues: 1418

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: MIT | Stargazers: 11044 | Issues: 163 | Issues: 224

llm-action

This project aims to share the technical principles behind large models, along with hands-on experience.

Language: HTML | License: Apache-2.0 | Stargazers: 8175 | Issues: 78 | Issues: 21

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | License: Apache-2.0 | Stargazers: 4477 | Issues: 52 | Issues: 141

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4455 | Issues: 49 | Issues: 401

dive-into-llms

Dive into LLMs (动手学大模型): a series of hands-on programming tutorials

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python | License: AGPL-3.0 | Stargazers: 2592 | Issues: 46 | Issues: 0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1864 | Issues: 23 | Issues: 84

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 1290 | Issues: 25 | Issues: 62

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python | License: NOASSERTION | Stargazers: 1168 | Issues: 23 | Issues: 26

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 715 | Issues: 13 | Issues: 71

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

License: MIT | Stargazers: 256 | Issues: 6 | Issues: 0

OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Language: Python | License: MIT | Stargazers: 221 | Issues: 2 | Issues: 34

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

LLM-generated-Text-Detection

A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current issues and future directions.

Stargazers: 151 | Issues: 0 | Issues: 0

QuRating

[ICML 2024] Selecting High-Quality Data for Training Language Models

InCTRL

Official implementation of CVPR'24 paper 'Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts'.

Language: Python | License: Apache-2.0 | Stargazers: 87 | Issues: 2 | Issues: 26

LearnDeepSpeed

DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)

Language: Python | License: MIT | Stargazers: 74 | Issues: 1 | Issues: 0

Pink

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

text-to-cad-ui

A lightweight UI for interfacing with the Zoo text-to-cad API, built with SvelteKit.

Language: Svelte | License: MIT | Stargazers: 60 | Issues: 8 | Issues: 33

AnyDoor

AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models

C-VQA

Counterfactual Reasoning VQA Dataset

Sniffer

SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

License: Apache-2.0 | Stargazers: 13 | Issues: 1 | Issues: 0
Language: Python | Stargazers: 10 | Issues: 1 | Issues: 0

UNTL

EMNLP'2022: Unsupervised Non-transferable Text Classification

Language: Python | License: MIT | Stargazers: 9 | Issues: 3 | Issues: 0

Easy_LLM_Tool

A simple toolkit for fine-tuning large models

D-VQG

[TCSVT 2024] Released code for the paper "Video Question Generation for Dynamic Changes"