Zechen Bai (JosephPai)

Company: NUS

Location: Singapore

Home Page: www.baizechen.site

Twitter: @ZechenBai


Zechen Bai's starred repositories

VidProM

[NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Stargazers: 103 · Issues: 0

VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Stargazers: 13 · Issues: 0

Awesome-Unified-Multimodal-Models

📖 A repository for organizing papers, code, and other resources related to unified multimodal models.

Stargazers: 152 · Issues: 0

AdaSlot

Official implementation of the CVPR'24 paper "Adaptive Slot Attention: Object Discovery with Dynamic Slot Number"

Language: Python · License: Apache-2.0 · Stargazers: 20 · Issues: 0

Show-o

Repository for Show-o, one single transformer to unify multimodal understanding and generation.

Language: Python · License: Apache-2.0 · Stargazers: 882 · Issues: 0

AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 7690 · Issues: 0

MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.

Stargazers: 740 · Issues: 0

sam2

This repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks showing how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 11296 · Issues: 0

SOLO

Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 105 · Issues: 0

fucking-algorithm

Cracking algorithm problems is all about patterns; labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.

Language: Markdown · Stargazers: 125261 · Issues: 0

SliME

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Language: Python · License: Apache-2.0 · Stargazers: 132 · Issues: 0

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch

Language: Python · License: MIT · Stargazers: 280 · Issues: 0

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1704 · Issues: 0

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python · License: NOASSERTION · Stargazers: 1778 · Issues: 0

Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Language: Python · License: Apache-2.0 · Stargazers: 631 · Issues: 0

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python · License: NOASSERTION · Stargazers: 1405 · Issues: 0

Awesome-World-Model

A collection of world-model papers for autonomous driving.

Stargazers: 426 · Issues: 0

gpt-computer-assistant

An intelligence development framework in Python for building Apple Intelligence-like features into your product

Language: Python · License: MIT · Stargazers: 5208 · Issues: 0

schedule_free

Schedule-Free Optimization in PyTorch

Language: Python · License: Apache-2.0 · Stargazers: 1835 · Issues: 0

World-Models-Autonomous-Driving-Latest-Survey

A curated list of world models for autonomous driving, kept up to date.

Stargazers: 153 · Issues: 0

supervision

We write your reusable computer vision tools. 💜

Language: Python · License: MIT · Stargazers: 22825 · Issues: 0

Lumina-T2X

Lumina-T2X is a unified framework for text-to-any-modality generation.

Language: Python · License: MIT · Stargazers: 2037 · Issues: 0

HALC

[ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"

Language: Python · License: MIT · Stargazers: 66 · Issues: 0

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).

Stargazers: 389 · Issues: 0

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Language: Python · Stargazers: 798 · Issues: 0

XMem2

A tool for efficient semi-supervised video object segmentation (strong results with minimal manual labor) and a dataset for benchmarking

Language: Python · License: GPL-3.0 · Stargazers: 174 · Issues: 0

Tracking-Anything-with-DEVA

[ICCV 2023] Tracking Anything with Decoupled Video Segmentation

Language: Python · License: NOASSERTION · Stargazers: 1234 · Issues: 0

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · Stargazers: 4583 · Issues: 0