Fuxiao Liu's repositories
LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
VisualNews-Repository
[EMNLP'21] Visual News: Benchmark and Challenges in News Image Captioning
DocumentCLIP
[ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Twitter-Video-dataset
[EACL'23] COVID-VTS: Fact Extraction and Verification on Short Video Platforms
HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
awesome-Large-MultiModal-Hallucination
😎 An up-to-date, curated list of papers, methods, and resources on hallucination in large multimodal models (LMMs).
Awesome-Multimodal-Large-Language-Models
✨✨ Latest papers and datasets on multimodal large language models, and their evaluation.
calvinliu123.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
GoodNews
[CVPR'19] Good News Everyone! Context-Driven Entity-Aware Captioning for News Images
LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
M3Exam
Data and code for paper "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models"
MiniGPT-4
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
open_clip
An open source implementation of CLIP.
self-instruct
Aligning pretrained language models with instruction data generated by themselves.
TCP
[NeurIPS 2022] Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
tool4ipp
This repository contains a data conversion tool for the Image Position Prediction task proposed in our paper.
VILA
VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops).