BonitoW's starred repositories

Image-Emotion-Datasets

The datasets for image emotion computing

Stars: 17 · Issues: 0

MDDL

Dataset for "Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution" in IJCAI 17

Stars: 22 · Issues: 0

VisualSketchpad

Code for "Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models"

License: Apache-2.0 · Stars: 81 · Issues: 0

im2latex

PyTorch implementation of a deep CNN encoder + LSTM decoder with attention for image-to-LaTeX

Language: Python · License: MIT · Stars: 173 · Issues: 0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Language: Python · License: Apache-2.0 · Stars: 41367 · Issues: 0
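As a rough illustration of consuming PaddleOCR output: the real call (commented out below) needs the `paddleocr` package and downloads model weights on first run, so this sketch only assumes the documented result layout — one list per image of `[box_points, (text, confidence)]` entries — and shows a small helper for flattening it. The helper name and the stand-in data are hypothetical.

```python
# Minimal sketch, assuming PaddleOCR's result format of
# [box_points, (text, confidence)] entries per image.

def extract_text(ocr_result, min_conf=0.5):
    """Flatten a PaddleOCR-style result into recognized strings."""
    lines = []
    for box, (text, conf) in ocr_result:
        if conf >= min_conf:  # drop low-confidence detections
            lines.append(text)
    return lines

# Real usage would look roughly like:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(lang="en")          # downloads models on first run
# result = ocr.ocr("receipt.jpg")[0]  # result for the first image
# print(extract_text(result))

# Stand-in result in the same shape, for illustration only:
fake_result = [
    [[[0, 0], [10, 0], [10, 5], [0, 5]], ("Hello", 0.98)],
    [[[0, 6], [10, 6], [10, 11], [0, 11]], ("w0rld", 0.31)],
]
print(extract_text(fake_result))  # low-confidence "w0rld" is dropped
```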

DDCOT

[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

Language: Python · License: Apache-2.0 · Stars: 30 · Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stars: 8125 · Issues: 0

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

Stars: 286 · Issues: 0

MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python · License: BSD-3-Clause · Stars: 25200 · Issues: 0

Grounding-DINO-1.5-API

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Language: Python · License: Apache-2.0 · Stars: 652 · Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stars: 10939 · Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION · Stars: 4459 · Issues: 0

CODIS

Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".

Language: JavaScript · License: Apache-2.0 · Stars: 7 · Issues: 0

mPLUG-Owl

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Language: Python · License: MIT · Stars: 2039 · Issues: 0

Semantic-SAM

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Language: Python · Stars: 2163 · Issues: 0

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook · License: BSD-3-Clause · Stars: 9318 · Issues: 0

Semantic-Segment-Anything

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

Language: Python · License: Apache-2.0 · Stars: 2062 · Issues: 0

CCoT

[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"

Language: Python · License: MIT · Stars: 47 · Issues: 0

SEED-Bench

[CVPR 2024] A benchmark for evaluating multimodal LLMs using multiple-choice questions.

Language: Python · License: NOASSERTION · Stars: 285 · Issues: 0

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language: Python · License: AGPL-3.0 · Stars: 26753 · Issues: 0
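As a rough sketch of working with YOLOv8 detections via the `ultralytics` package: the real model call (commented out) downloads weights on first use, so the runnable part below only assumes a list of `(class_name, confidence)` pairs, which is straightforward to build from a results object in the real API. The helper name and the stand-in data are hypothetical.

```python
# Minimal sketch: tally YOLO-style detections per class above a
# confidence threshold.

from collections import Counter

def count_detections(pairs, min_conf=0.25):
    """Count detections per class name above a confidence threshold."""
    return Counter(name for name, conf in pairs if conf >= min_conf)

# Real usage, roughly (needs the ultralytics package and weights):
# from ultralytics import YOLO
# model = YOLO("yolov8n.pt")
# results = model("street.jpg")
# pairs = [(model.names[int(b.cls)], float(b.conf))
#          for b in results[0].boxes]
# print(count_detections(pairs))

# Stand-in detections, for illustration only:
pairs = [("person", 0.91), ("person", 0.80), ("car", 0.12)]
print(count_detections(pairs))  # the low-confidence car is filtered out
```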

LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

Language: Python · License: Apache-2.0 · Stars: 340 · Issues: 0

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python · License: Apache-2.0 · Stars: 4237 · Issues: 0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 14350 · Issues: 0

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 45854 · Issues: 0
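As a rough sketch around SAM's predictor output: the real inference (commented out) needs the `segment_anything` package and a model checkpoint, so the runnable part below only assumes a boolean mask array like the ones SAM's predictor returns, and shows a small helper that turns a mask into a bounding box. The helper name and the stand-in mask are hypothetical.

```python
# Minimal sketch: convert a boolean segmentation mask into an
# (x_min, y_min, x_max, y_max) bounding box.

import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of a boolean mask, or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # empty mask has no box
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Real usage, roughly (needs segment_anything and a checkpoint file):
# from segment_anything import SamPredictor, sam_model_registry
# sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
# predictor = SamPredictor(sam)
# predictor.set_image(image)                    # HWC uint8 RGB array
# masks, scores, _ = predictor.predict(
#     point_coords=np.array([[500, 375]]),      # one foreground click
#     point_labels=np.array([1]))
# print(mask_to_bbox(masks[0]))

# Stand-in mask, for illustration only:
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 1:5] = True
print(mask_to_bbox(mask))  # (1, 2, 4, 3)
```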

open_flamingo

An open-source framework for training large multimodal models.

Language: Python · License: MIT · Stars: 3598 · Issues: 0

vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Language: Python · License: MIT · Stars: 481 · Issues: 0

ABigSurvey

A collection of 1000+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML).

License: GPL-3.0 · Stars: 1970 · Issues: 0