Xiao Feng Zhang (zhangbaijin)

Company: PhD @ SJTU

Location: Shanghai

Home Page: zhangbaijin.github.io

Xiao Feng Zhang's starred repositories

DroneVehicle

Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning

Stargazers: 404

MKT

Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

Language: Python · License: MIT · Stargazers: 112

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 3035

FastV

Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language: Python · Stargazers: 143

LLaVA-PruMerge

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Language: Python · License: Apache-2.0 · Stargazers: 50

HALC

[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"

Language: Python · License: MIT · Stargazers: 46

LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Language: Python · License: Apache-2.0 · Stargazers: 602

Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Language: Python · License: MIT · Stargazers: 103

sam-clip-segmentation

Zero-shot image instance segmentation using OpenAI's CLIP and Meta's SAM

Language: Jupyter Notebook · Stargazers: 46

Diff-Plugin

[CVPR 2024] Official code release of our paper "Diff-Plugin: Revitalizing Details for Diffusion-based Low-level tasks"

Language: Python · Stargazers: 68

ResUNetFormer

Keras code for the paper A. Jamali, S. K. Roy, J. Li and P. Ghamisi, "Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction," IEEE Geoscience and Remote Sensing Letters, doi: 10.1109/LGRS.2024.3354560 (https://ieeexplore.ieee.org/document/10400502).

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 10

RCFSNet

Road extraction from satellite imagery

Language: Python · Stargazers: 35

Awesome-MLLM-Hallucination

Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)

Stargazers: 27

Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Language: Jupyter Notebook · License: MIT · Stargazers: 1686

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Language: Python · License: Apache-2.0 · Stargazers: 3691

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python · License: NOASSERTION · Stargazers: 381

DDCOT

[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

Language: Python · License: Apache-2.0 · Stargazers: 28

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 1756

InstructIR

InstructIR: High-Quality Image Restoration Following Human Instructions (https://huggingface.co/spaces/marcosv/InstructIR)

Language: Jupyter Notebook · License: MIT · Stargazers: 409

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 5949

label-words-are-anchors

Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

Language: Python · License: MIT · Stargazers: 120

MechanisticProbe

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Language: Python · License: MIT · Stargazers: 10

Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

License: MIT · Stargazers: 1160

awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

License: MIT · Stargazers: 348

XrayGLM

🩺 The first Chinese multimodal medical large language model that can read chest X-rays and generate chest radiograph summaries.

Language: Python · License: NOASSERTION · Stargazers: 819

KwaiAgents

A generalized information-seeking agent system with Large Language Models (LLMs).

Language: Python · License: NOASSERTION · Stargazers: 977

Osprey

[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python · License: Apache-2.0 · Stargazers: 700