Yichi Zhang's repositories
FastV
Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LLaMA2-Accessory
An open-source toolkit for LLM development
LLaVA_decoding
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MathVerse
Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
PCA-EVAL
PCA-EVAL benchmark proposed in paper "Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond"
recommenders
Best Practices on Recommendation Systems