Yichi Zhang's repositories
FastV
Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LLaMA2-Accessory
An open-source toolkit for LLM development
LLaVA_decoding
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MathVerse
Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
PCA-EVAL
PCA-EVAL benchmark proposed in paper "Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond"
recommenders
Best Practices on Recommendation Systems