Zhixing Sun's repositories
2024-AAAI-HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
Agent-Attention
Official repository of Agent Attention
AttriCLIP
CVPR 2023: AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning
BiDistFSCIL
Official implementation of CVPR 2023 paper Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation.
code-samples
Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision.
FGVP
Official code for Fine-Grained Visual Prompting, NeurIPS 2023
FLatten-Transformer
Official repository of FLatten Transformer (ICCV2023)
GraphRAG-Local-UI
GraphRAG using Local LLMs - Features robust API and multiple apps for Indexing/Prompt Tuning/Query/Chat/Visualizing/Etc. This is meant to be the ultimate GraphRAG/KG local LLM app.
IELT
Source code of the paper Fine-Grained Visual Classification via Internal Ensemble Learning Transformer
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2
Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
multimodal-prompt-learning
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
Oryx
MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
ovsam
[arXiv preprint] The official code of paper "Open-Vocabulary SAM".
Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
recognize-anything
Code for the Recognize Anything Model (RAM) and Tag2Text Model
RevisitingCIL
The code repository for "Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need" in PyTorch.
SHIP
Official code for ICCV 2023 paper, "Improving Zero-Shot Generalization for CLIP with Synthesized Prompts"
some_useful_python_program
Some useful Python programs
sunhongbo.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Vitron
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing