Xiao Feng Zhang (zhangbaijin)

Company: PhD @ SJTU

Location: Shanghai

Home Page: zhangbaijin.github.io

Xiao Feng Zhang's starred repositories

DroneVehicle

Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning

Stargazers: 404

MKT

Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".

Language: Python · License: MIT · Stargazers: 112

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python · License: Apache-2.0 · Stargazers: 3035

FastV

Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language: Python · Stargazers: 143

LLaVA-PruMerge

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Language: Python · License: Apache-2.0 · Stargazers: 50

HALC

[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"

Language: Python · License: MIT · Stargazers: 46

LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Language: Python · License: Apache-2.0 · Stargazers: 602

Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Language: Python · License: MIT · Stargazers: 103

sam-clip-segmentation

Zero-shot image instance segmentation using OpenAI's CLIP and Meta's SAM

Language: Jupyter Notebook · Stargazers: 46

Diff-Plugin

[CVPR 2024] Official code release of our paper "Diff-Plugin: Revitalizing Details for Diffusion-based Low-level tasks"

Language: Python · Stargazers: 68

ResUNetFormer

Keras code for the paper A. Jamali, S. K. Roy, J. Li and P. Ghamisi, "Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction," IEEE Geoscience and Remote Sensing Letters, doi: 10.1109/LGRS.2024.3354560 (https://ieeexplore.ieee.org/document/10400502).

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 10

RCFSNet

Road extraction from satellite imagery

Language: Python · Stargazers: 35

Awesome-MLLM-Hallucination

Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)

Stargazers: 27

Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Language: Jupyter Notebook · License: MIT · Stargazers: 1686

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Language: Python · License: Apache-2.0 · Stargazers: 3691

honeybee

Official implementation of project Honeybee (CVPR 2024)

Language: Python · License: NOASSERTION · Stargazers: 381

DDCOT

[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

Language: Python · License: Apache-2.0 · Stargazers: 28

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python · License: Apache-2.0 · Stargazers: 1756

InstructIR

InstructIR: High-Quality Image Restoration Following Human Instructions (https://huggingface.co/spaces/marcosv/InstructIR)

Language: Jupyter Notebook · License: MIT · Stargazers: 409

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 5949

label-words-are-anchors

Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

Language: Python · License: MIT · Stargazers: 120

MechanisticProbe

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

Language: Python · License: MIT · Stargazers: 10

Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

License: MIT · Stargazers: 1160

awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

License: MIT · Stargazers: 348

XrayGLM

🩺 The first Chinese multimodal medical large language model that can read chest X-rays and generate chest radiograph summaries.

Language: Python · License: NOASSERTION · Stargazers: 819

KwaiAgents

A generalized information-seeking agent system with Large Language Models (LLMs).

Language: Python · License: NOASSERTION · Stargazers: 977

Osprey

[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python · License: Apache-2.0 · Stargazers: 700