clownrat6

[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."

Language: Python | 2700

datacomp

DataComp: In search of the next generation of multimodal datasets

Language: Python | License: NOASSERTION | 62700

ego4d-goalstep

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)

Language: Python | License: MIT | 3300

common_metrics_on_video_quality

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

Language: Python | 17700

textgrad

TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.

Language: Python | License: MIT | 138800

LongVA

Long Context Transfer from Language to Vision

Language: Python | License: Apache-2.0 | 26600

VideoHallucer

VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

Language: Python | License: MIT | 1700

tiny-diffusion

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

Language: Jupyter Notebook | 60900

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python | License: GPL-3.0 | 29100

videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Language: Python | License: Apache-2.0 | 15000

Math-LLaVA

Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Language: Python | License: Apache-2.0 | 4900

VoCo-LLaMA

VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Language: Python | License: Apache-2.0 | 6700

Mantis

Official code for the paper "Mantis: Multi-Image Instruction Tuning"

Language: Python | License: Apache-2.0 | 13600

WebDesignAgent

WebDesignAgent: Towards Effortless Website Creation

Language: Python | License: Apache-2.0 | 22500

Recap-DataComp-1B

This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"

11000

medmcqa

A large-scale (194k) Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.

Language: Jupyter Notebook | License: MIT | 16000

OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

Language: Python | License: MIT | 21100