Vibashan

Johns Hopkins University

Maryland, USA

https://vibashan.github.io/

Vibashan VS's starred repositories

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | GPL-3.0 | 4191 38 414

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language: Python | NOASSERTION | 2594 37 52

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | Apache-2.0 | 2411 41 376

Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Language: Python | MIT | 1436 20 285

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | Apache-2.0 | 1263 29 149

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | Apache-2.0 | 882 19 68

Neural-Network-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.

Language: Python | 770 18 18

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | Apache-2.0 | 561 12 64

DriveAGI

[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System

Language: Python | Apache-2.0 | 517 27 7

Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Language: Python | 480 11 44

prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Language: Python | MIT | 392 12 35

GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Language: Python | Apache-2.0 | 267 6 8

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language: Python | CC-BY-4.0 | 202 13 14

InternVideo2

facexformer

Official implementation of FaceXFormer: A Unified Transformer for Facial Analysis

Language: Python | MIT | 178 10 16

intrinsic-lora

Official repo of Intrinsic LoRA: A Generalist Approach for Discovering Knowledge in Generative Models, previously titled "Generative Models: What do they know? Do they know things? Let's find out!"

Language: Python | NOASSERTION | 174 4 7

Visual-Instruction-Tuning

SVIT: Scaling up Visual Instruction Tuning

Language: Python | MIT | 159 5 15

UniVS

Code release for "UniVS: Unified and Universal Video Segmentation with Prompts as Queries" (CVPR2024)

Language: Python | 154 4 10

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | Apache-2.0 | 142 8 27

bpeasy

Fast bare-bones BPE for modern tokenizer training

Language: Python | MIT | 135 20

TPT

Test-time Prompt Tuning (TPT) for zero-shot generalization in vision-language models (NeurIPS 2022)

Language: Python | MIT | 130 3 15

ELM

[ECCV 2024] Embodied Understanding of Driving Scenarios

Language: Python | 127 8 15

DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

Language: Python | NOASSERTION | 78 8 8

StructLM

Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

Language: Python | MIT | 66 4 15

PosSAM

Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything

OmniVid

Language: Python | 28 4 2

PIN

Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

Language: Python | NOASSERTION | 22 9 3

ConTextual

Language: Python | 22 10

MAD

Language: Python | NOASSERTION | 10 3 1

ZeroGen

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation

Language: Python | 9 1 3