joez17

Joez's starred repositories

Event-Bench

Official code of *Towards Event-oriented Long Video Understanding*

Language: Python | Stars: 3

VideoNIAH

VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs

Language: Python | Stars: 18

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: NOASSERTION | Stars: 27860

SC-Tune

Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"

Language: Python | License: MIT | Stars: 15

IVG

This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions", which is accepted by ACL 2024 (Findings).

License: Apache-2.0 | Stars: 15

fromage

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".

Language: Jupyter Notebook | License: Apache-2.0 | Stars: 467

This repo provides the official code for: 1) TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (https://arxiv.org/abs/2103.04430), accepted by MICCAI 2021; 2) TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images (https://arxiv.org/abs/2201.12785).

Language: Python | License: Apache-2.0 | Stars: 376

MRES

This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2024.

License: Apache-2.0 | Stars: 59

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stars: 129396

MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Language: Python | License: BSD-3-Clause | Stars: 25157

Heterformer

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks (KDD 2023)

Language: Jupyter Notebook | License: Apache-2.0 | Stars: 18

Edgeformers

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks (ICLR 2023)

Language: Python | License: Apache-2.0 | Stars: 53

Awesome-Language-Model-on-Graphs

A curated list of papers and resources based on "Large Language Models on Graphs: A Comprehensive Survey".

License: MIT | Stars: 632

OPT_Questioner

Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner"

Language: Python | License: MIT | Stars: 14

COSA

Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python | License: MIT | Stars: 37

VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook | License: MIT | Stars: 219

ChatBridge

ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

Language: Python | License: BSD-3-Clause | Stars: 42

VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python | License: MIT | Stars: 248

open_flamingo

An open-source framework for training large multimodal models.

Language: Python | License: MIT | Stars: 3571

open-images-downloader

Language: Python | Stars: 12

SVF-few-shot-segmentation

Language: Python | Stars: 23

arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Language: Python | License: Apache-2.0 | Stars: 5033