Zirui Song's starred repositories

AMBER

An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation

Language: Python · License: Apache-2.0 · Stars: 92
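
Since AMBER is "LLM-free", hallucination is scored without a judge model, by matching the objects a model mentions against human annotations. A minimal Python sketch of that idea, with a hypothetical object vocabulary and function name (not AMBER's actual pipeline):

```python
# Sketch of LLM-free object-hallucination scoring: match the objects a
# model's caption mentions against human annotations of the image.
# Illustration only; this is not AMBER's actual scripts or metrics.

def hallucination_rate(caption: str, annotated_objects: set[str]) -> float:
    """Fraction of mentioned objects absent from the image annotations."""
    mentioned = {w.strip(".,").lower() for w in caption.split()}
    # Restrict to plausible object names (hypothetical toy vocabulary).
    vocab = annotated_objects | {"dog", "cat", "car", "person", "tree"}
    mentioned &= vocab
    if not mentioned:
        return 0.0
    hallucinated = mentioned - annotated_objects
    return len(hallucinated) / len(mentioned)

print(hallucination_rate("A dog and a cat on a car.", {"dog", "car"}))
# -> 0.333...: "cat" is mentioned but not annotated as present.
```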

OPARL

OPARL (Optimistic and Pessimistic Actor in RL)

Language: Python · License: MIT · Stars: 19

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION · Stars: 2707

llm-hallucination-survey

Reading list on hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"

Stars: 932

POPE

The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Language: Python · License: MIT · Stars: 176
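
POPE polls a model with yes/no questions about object presence and reports accuracy, precision, recall, F1, and the yes-ratio. A minimal sketch of that scoring, assuming answers are already normalized to "yes"/"no" strings (not the repo's official evaluation script):

```python
# Sketch of POPE-style scoring over binary object-presence answers.
# Assumes non-empty, equal-length lists of "yes"/"no" strings.

def pope_scores(predictions: list[str], labels: list[str]) -> dict[str, float]:
    pairs = list(zip(predictions, labels))
    tp = sum(p == "yes" and l == "yes" for p, l in pairs)
    fp = sum(p == "yes" and l == "no" for p, l in pairs)
    fn = sum(p == "no" and l == "yes" for p, l in pairs)
    tn = sum(p == "no" and l == "no" for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(pairs),  # bias toward answering "yes"
    }

print(pope_scores(["yes", "no", "yes"], ["yes", "no", "no"]))
```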

CogVLM

A state-of-the-art open visual language model (multimodal pre-trained model)

Language: Python · License: Apache-2.0 · Stars: 6030

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Language: Python · License: Apache-2.0 · Stars: 3981

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python · Stars: 240

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · Stars: 766

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python · License: BSD-3-Clause · Stars: 2774

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python · License: Apache-2.0 · Stars: 724

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python · License: NOASSERTION · Stars: 576

typst

A new markup-based typesetting system that is powerful and easy to learn.

Language: Rust · License: Apache-2.0 · Stars: 34577

mm-cot

Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)

Language: Python · License: Apache-2.0 · Stars: 3794

ChatDev

Create customized software from a natural-language idea through LLM-powered multi-agent collaboration

Language: Shell · License: Apache-2.0 · Stars: 25483

InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Language: Python · License: MIT · Stars: 2514

WeChatMsg

Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent archiving; analyze the history to generate an annual chat report; and train a personal AI chat assistant on your own chat data

Language: Python · License: GPL-3.0 · Stars: 34120

BenchLMM

[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Language: Python · License: Apache-2.0 · Stars: 81

SoM

Set-of-Mark Prompting for GPT-4V and LMMs

Language: Python · License: MIT · Stars: 1149
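
Set-of-Mark prompting overlays numbered marks on image regions so an LMM can answer by mark ID. A minimal PIL sketch with hard-coded boxes standing in for a segmenter's output (the helper name is hypothetical, not SoM's API):

```python
# Sketch of Set-of-Mark style marking: draw numbered boxes on an image,
# then prompt the model to refer to regions by their mark numbers.
from PIL import Image, ImageDraw

def draw_marks(image: Image.Image,
               boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        draw.rectangle((x0, y0, x1, y1), outline="red", width=3)
        # Label each region with its mark number near the top-left corner.
        draw.text((x0 + 4, y0 + 4), str(i), fill="red")
    return marked

img = Image.new("RGB", (256, 256), "white")
marked = draw_marks(img, [(20, 20, 100, 100), (120, 120, 220, 220)])
marked.save("marked.png")
# The marked image is then sent to GPT-4V with a prompt such as
# "Which mark covers the dog?", so the answer grounds to a mark ID.
```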

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python · License: Apache-2.0 · Stars: 700

Video-LLaVA

[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python · License: Apache-2.0 · Stars: 2943

LanguageBind

[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python · License: MIT · Stars: 707
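
The binding idea: every modality encoder is aligned into a shared language-embedding space, so cross-modal retrieval reduces to cosine similarity against text embeddings. A minimal NumPy sketch with random stand-in embeddings (not LanguageBind's API; its real encoders are modality-specific transformers):

```python
# Sketch of retrieval in a language-aligned embedding space: score one
# video embedding against several caption embeddings by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
text_emb = rng.normal(size=(3, dim))   # stand-ins for 3 caption embeddings
video_emb = rng.normal(size=(1, dim))  # stand-in for 1 video-clip embedding

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

scores = cosine(video_emb, text_emb)   # (1, 3) similarity matrix
print("best caption:", int(scores.argmax()))
```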

MAPLM

[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding

Language: Python · License: MIT · Stars: 95

JudgeLM

An open-source LLM judge for evaluating LLM-generated answers.

Language: Python · License: Apache-2.0 · Stars: 307
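
An LLM judge typically receives the question plus candidate answers in a single prompt and returns graded scores. A minimal sketch of such a pairwise-judging prompt; the template wording is hypothetical, not JudgeLM's, and `judge` stands in for any chat-completion call:

```python
# Sketch of pairwise LLM-as-judge prompting. The template and the
# `judge` call are illustrative placeholders, not JudgeLM's interface.

JUDGE_TEMPLATE = """You are an impartial judge. Rate each answer from 1 to 10.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Reply with two scores separated by a space, e.g. "7 4"."""

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return JUDGE_TEMPLATE.format(question=question,
                                 answer_a=answer_a,
                                 answer_b=answer_b)

prompt = build_judge_prompt("What causes tides?",
                            "The Moon's gravity.",
                            "Wind patterns.")
# scores = judge(prompt)  # e.g. "9 2" -> answer A preferred
```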

MFTCoder

A high-accuracy, high-efficiency multi-task fine-tuning framework for code LLMs. Accepted at KDD 2024.

Language: Python · License: NOASSERTION · Stars: 627

Yi

A series of large language models trained from scratch by developers @01-ai

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 7652

GVIL

Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"

Language: Python · Stars: 10

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook · License: BSD-3-Clause · Stars: 9835

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python · License: MIT · Stars: 3561