Vibashan VS (Vibashan)

Vibashan

Company: Johns Hopkins University

Location: Maryland, USA

Home Page: https://vibashan.github.io/

Vibashan VS's starred repositories

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | License: GPL-3.0 | Stargazers: 4191 | Issues: 38 | Issues: 414

jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Language: Python | License: NOASSERTION | Stargazers: 2594 | Issues: 37 | Issues: 52

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | License: Apache-2.0 | Stargazers: 2411 | Issues: 41 | Issues: 376

Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Language: Python | License: MIT | Stargazers: 1436 | Issues: 20 | Issues: 285

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1263 | Issues: 29 | Issues: 149

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | License: Apache-2.0 | Stargazers: 882 | Issues: 19 | Issues: 68

Neural-Network-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"

Language: Python | License: Apache-2.0 | Stargazers: 561 | Issues: 12 | Issues: 64

DriveAGI

[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System

Language: Python | License: Apache-2.0 | Stargazers: 517 | Issues: 27 | Issues: 7

Panda-70M

[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Language: Python | License: MIT | Stargazers: 392 | Issues: 12 | Issues: 35

GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

Language: Python | License: Apache-2.0 | Stargazers: 267 | Issues: 6 | Issues: 8

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language: Python | License: CC-BY-4.0 | Stargazers: 202 | Issues: 13 | Issues: 14

facexformer

Official implementation of FaceXFormer: A Unified Transformer for Facial Analysis

Language: Python | License: MIT | Stargazers: 178 | Issues: 10 | Issues: 16

intrinsic-lora

Official repo of Intrinsic LoRA: A Generalist Approach for Discovering Knowledge in Generative Models, previously titled "Generative Models: What do they know? Do they know things? Let's find out!"

Language: Python | License: NOASSERTION | Stargazers: 174 | Issues: 4 | Issues: 7

Visual-Instruction-Tuning

SVIT: Scaling up Visual Instruction Tuning

Language: Python | License: MIT | Stargazers: 159 | Issues: 5 | Issues: 15

UniVS

Code release for "UniVS: Unified and Universal Video Segmentation with Prompts as Queries" (CVPR2024)

Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Language: Python | License: Apache-2.0 | Stargazers: 142 | Issues: 8 | Issues: 27

bpeasy

Fast bare-bones BPE for modern tokenizer training

Language: Python | License: MIT | Stargazers: 135 | Issues: 2 | Issues: 0

TPT

Test-time Prompt Tuning (TPT) for zero-shot generalization in vision-language models (NeurIPS 2022)

Language: Python | License: MIT | Stargazers: 130 | Issues: 3 | Issues: 15

ELM

[ECCV 2024] Embodied Understanding of Driving Scenarios

DreamLIP

[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions

Language: Python | License: NOASSERTION | Stargazers: 78 | Issues: 8 | Issues: 8

StructLM

Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

Language: Python | License: MIT | Stargazers: 66 | Issues: 4 | Issues: 15

PosSAM

Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything

PIN

Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

Language: Python | License: NOASSERTION | Stargazers: 22 | Issues: 9 | Issues: 3
Language: Python | Stargazers: 22 | Issues: 1 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 10 | Issues: 3 | Issues: 1

ZeroGen

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation