VoyageWang's starred repositories
Paints-UNDO
Understand Human Behavior to Align True Needs
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
PhraseCutDataset
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
fastcomposer
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
imageinwords
Data release for the ImageInWords (IIW) paper.
UESTC-Glasgow-Final-Year-Report-Template
LaTeX template for final-year project reports at the Glasgow College, UESTC
Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
NAE_CVPR2024
Accepted by CVPR 2024
attention-map
🚀 Cross attention map tools for huggingface/diffusers