VoyageWang's starred repositories

Paints-UNDO

Understand Human Behavior to Align True Needs

Language: Python | License: Apache-2.0 | Stargazers: 2672 | Issues: 0

F-LMM

Code Release of F-LMM: Grounding Frozen Large Multimodal Models

Language: Python | License: NOASSERTION | Stargazers: 22 | Issues: 0

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | License: NOASSERTION | Stargazers: 1520 | Issues: 0

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python | License: Apache-2.0 | Stargazers: 3741 | Issues: 0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python | License: Apache-2.0 | Stargazers: 1871 | Issues: 0

STIC

Enhancing Large Vision Language Models with Self-Training on Image Comprehension.

Language: Python | License: Apache-2.0 | Stargazers: 44 | Issues: 0

PhraseCutDataset

Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"

Language: Jupyter Notebook | Stargazers: 99 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 709 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 25 | Issues: 0

MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook | Stargazers: 307 | Issues: 0

fastcomposer

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

Language: Python | License: MIT | Stargazers: 626 | Issues: 0

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 3097 | Issues: 0

bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Language: Python | License: BSD-3-Clause | Stargazers: 491 | Issues: 0

imageinwords

Data release for the ImageInWords (IIW) paper.

Language: JavaScript | Stargazers: 184 | Issues: 0

pykan

Kolmogorov Arnold Networks

Language: Jupyter Notebook | License: MIT | Stargazers: 13788 | Issues: 0

UESTC-Glasgow-Final-Year-Report-Template

LaTeX template for final-year project reports at the UESTC Glasgow College

Language: TeX | License: GPL-3.0 | Stargazers: 13 | Issues: 0

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python | License: Apache-2.0 | Stargazers: 4218 | Issues: 0

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python | License: MIT | Stargazers: 19458 | Issues: 0

Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

Language: Python | License: BSD-3-Clause | Stargazers: 1633 | Issues: 0

NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Language: Python | License: Apache-2.0 | Stargazers: 181 | Issues: 0

pyreft

ReFT: Representation Finetuning for Language Models

Language: Python | License: Apache-2.0 | Stargazers: 957 | Issues: 0

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | Stargazers: 2231 | Issues: 0

Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Language: Python | License: MIT | Stargazers: 110 | Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3847 | Issues: 0

LOGO

Accepted by CVPR 2023

Language: Python | Stargazers: 32 | Issues: 0

NAE_CVPR2024

Accepted by CVPR 2024

Language: Python | License: MIT | Stargazers: 19 | Issues: 0

UniVS

Code release for "UniVS: Unified and Universal Video Segmentation with Prompts as Queries" (CVPR2024)

Language: Python | Stargazers: 149 | Issues: 0

DCI

Densely Captioned Images (DCI) dataset repository.

Language: Python | License: NOASSERTION | Stargazers: 148 | Issues: 0

RSMamba

This is the PyTorch implementation of the paper "RSMamba: Remote Sensing Image Classification with State Space Model"

Language: Python | License: Apache-2.0 | Stargazers: 206 | Issues: 0

attention-map

🚀 Cross attention map tools for huggingface/diffusers

Language: Python | License: MIT | Stargazers: 82 | Issues: 0