Zirui Song's starred repositories

AMBER

An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation

Language: Python · License: Apache-2.0 · Stars: 92
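
Since AMBER is "LLM-free", hallucination is scored without a judge model, by matching the objects a model mentions against human annotations. A minimal Python sketch of that idea, with a hypothetical object vocabulary and function name (not AMBER's actual pipeline):

```python
# Sketch of LLM-free object-hallucination scoring: match the objects a
# model's caption mentions against human annotations of the image.
# Illustration only; this is not AMBER's actual scripts or metrics.

def hallucination_rate(caption: str, annotated_objects: set[str]) -> float:
    """Fraction of mentioned objects absent from the image annotations."""
    mentioned = {w.strip(".,").lower() for w in caption.split()}
    # Restrict to plausible object names (hypothetical toy vocabulary).
    vocab = annotated_objects | {"dog", "cat", "car", "person", "tree"}
    mentioned &= vocab
    if not mentioned:
        return 0.0
    hallucinated = mentioned - annotated_objects
    return len(hallucinated) / len(mentioned)

print(hallucination_rate("A dog and a cat on a car.", {"dog", "car"}))
# -> 0.333...: "cat" is mentioned but not annotated as present.
```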

OPARL

OPARL (Optimistic and Pessimistic Actor in RL)

Language: Python · License: MIT · Stars: 19

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · Stars: 2707

llm-hallucination-survey

Reading list on hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"

Stars: 932

POPE

The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Language: Python · License: MIT · Stars: 176
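
POPE polls a model with yes/no questions about object presence and reports accuracy, precision, recall, F1, and the yes-ratio. A minimal sketch of that scoring, assuming answers are already normalized to "yes"/"no" strings (not the repo's official evaluation script):

```python
# Sketch of POPE-style scoring over binary object-presence answers.
# Assumes non-empty, equal-length lists of "yes"/"no" strings.

def pope_scores(predictions: list[str], labels: list[str]) -> dict[str, float]:
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l == "no" for p, l in pairs)
    fn = sum(p == "no" and l == "yes" for p, l in pairs)
    tn = sum(p == "no" and l == "no" for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(pairs),  # bias toward answering "yes"
    }

print(pope_scores(["yes", "no", "yes"], ["yes", "no", "no"]))
```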

CogVLM

A state-of-the-art open visual language model (multimodal pre-trained model)

Language: Python · License: Apache-2.0 · Stars: 6030

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python · License: Apache-2.0 · Stars: 3981

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python · Stars: 240

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · Stars: 766

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python · License: BSD-3-Clause · Stars: 2774

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python · License: Apache-2.0 · Stars: 724

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python · License: NOASSERTION · Stars: 576

typst

A new markup-based typesetting system that is powerful and easy to learn.

Language: Rust · License: Apache-2.0 · Stars: 34577

mm-cot

Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)

Language: Python · License: Apache-2.0 · Stars: 3794

ChatDev

Create customized software from a natural-language idea through LLM-powered multi-agent collaboration

Language: Shell · License: Apache-2.0 · Stars: 25483

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python · License: MIT · Stars: 2514

WeChatMsg

Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent archiving; analyze the history to generate an annual chat report; and train a personal AI chat assistant on your own chat data

Language: Python · License: GPL-3.0 · Stars: 34120

BenchLMM

[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Language: Python · License: Apache-2.0 · Stars: 81

SoM

Set-of-Mark Prompting for GPT-4V and LMMs

Language: Python · License: MIT · Stars: 1149
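
Set-of-Mark prompting overlays numbered marks on image regions so an LMM can answer by mark ID. A minimal PIL sketch with hard-coded boxes standing in for a segmenter's output (the helper name is hypothetical, not SoM's API):

```python
# Sketch of Set-of-Mark style marking: draw numbered boxes on an image,
# then prompt the model to refer to regions by their mark numbers.
from PIL import Image, ImageDraw

def draw_marks(image: Image.Image,
               boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        # Label each region with its mark number near the top-left corner.
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

img = Image.new("RGB", (256, 256), "white")
marked = draw_marks(img, [(20, 20, 100, 100), (120, 120, 220, 220)])
marked.save("marked.png")
# The marked image is then sent to GPT-4V with a prompt such as
# "Which mark covers the dog?", so the answer grounds to a mark ID.
```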

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python · License: Apache-2.0 · Stars: 700

Video-LLaVA

[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python · License: Apache-2.0 · Stars: 2943

LanguageBind

[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python · License: MIT · Stars: 707
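
The binding idea: every modality encoder is aligned into a shared language-embedding space, so cross-modal retrieval reduces to cosine similarity against text embeddings. A minimal NumPy sketch with random stand-in embeddings (not LanguageBind's API; its real encoders are modality-specific transformers):

```python
# Sketch of retrieval in a language-aligned embedding space: score one
# video embedding against several caption embeddings by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
text_emb = rng.normal(size=(3, dim))   # stand-ins for 3 caption embeddings
video_emb = rng.normal(size=(1, dim))  # stand-in for 1 video-clip embedding

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

scores = cosine(video_emb, text_emb)   # (1, 3) similarity matrix
print("best caption:", int(scores.argmax()))
```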

MAPLM

[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Language: Python · License: MIT · Stars: 95

JudgeLM

An open-source LLM judge for evaluating LLM-generated answers.

Language: Python · License: Apache-2.0 · Stars: 307
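
An LLM judge typically receives the question plus candidate answers in a single prompt and returns graded scores. A minimal sketch of such a pairwise-judging prompt; the template wording is hypothetical, not JudgeLM's, and `judge` stands in for any chat-completion call:

```python
# Sketch of pairwise LLM-as-judge prompting. The template and the
# `judge` call are illustrative placeholders, not JudgeLM's interface.

JUDGE_TEMPLATE = """You are an impartial judge. Rate each answer from 1 to 10.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Reply with two scores separated by a space, e.g. "7 4"."""

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return JUDGE_TEMPLATE.format(question=question,
                                 answer_a=answer_a,
                                 answer_b=answer_b)

prompt = build_judge_prompt("What causes tides?",
                            "The Moon's gravity.",
                            "Wind patterns.")
# scores = judge(prompt)  # e.g. "9 2" -> answer A preferred
```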

MFTCoder

A high-accuracy, high-efficiency multi-task fine-tuning framework for code LLMs. Accepted at KDD 2024.

Language: Python · License: NOASSERTION · Stars: 627

Yi

A series of large language models trained from scratch by developers @01-ai

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 7652

GVIL

Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"

Language: Python · Stars: 10

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook · License: BSD-3-Clause · Stars: 9835

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python · License: MIT · Stars: 3561