Arking1995's starred repositories

LLaVA-1.6-ft

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 27 | Issues: 0

kubric

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 2247 | Issues: 0

conceptual-captions

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

Language: Shell | License: NOASSERTION | Stargazers: 509 | Issues: 0

T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Language: Python | License: MIT | Stargazers: 176 | Issues: 0

LayerDiffuse

Transparent Image Layer Diffusion using Latent Transparency

License: Apache-2.0 | Stargazers: 1940 | Issues: 0

National_interest_waiver_waittime

USCIS Employment-based-2 national interest waiver wait time

License: MIT | Stargazers: 77 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5807 | Issues: 0

Language: Python | Stargazers: 132 | Issues: 0

MagicBrush

[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".

Language: Python | License: NOASSERTION | Stargazers: 289 | Issues: 0

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python | License: NOASSERTION | Stargazers: 1174 | Issues: 0

sd-akashic

A compendium of information regarding Stable Diffusion (SD)

License: Unlicense | Stargazers: 1619 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance.

Language: Python | License: MIT | Stargazers: 4590 | Issues: 0

SyntheticData

Is synthetic data from generative models ready for image recognition?

Language: Python | License: Apache-2.0 | Stargazers: 171 | Issues: 0

Super-CLEVR

Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"

Language: Python | License: NOASSERTION | Stargazers: 20 | Issues: 0

imagenet3d

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

Language: Python | Stargazers: 13 | Issues: 0

DST3D

Official implementation of "Generating images with 3D annotations using diffusion models".

Language: Python | License: MIT | Stargazers: 56 | Issues: 0

objaverse-xl

🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!

Language: Python | License: Apache-2.0 | Stargazers: 669 | Issues: 0

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 834 | Issues: 0

mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.

Language: Python | License: Apache-2.0 | Stargazers: 5425 | Issues: 0

ADE20K

ADE20K Dataset

Language: Jupyter Notebook | Stargazers: 304 | Issues: 0

MEBOW

Code for "MEBOW: Monocular Estimation of Body Orientation In the Wild", CVPR 2020

Language: Python | Stargazers: 57 | Issues: 0

omni3d

Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"

Language: Python | License: NOASSERTION | Stargazers: 697 | Issues: 0

vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

Language: Python | License: MIT | Stargazers: 222 | Issues: 0

libcom

Image composition toolbox: everything you want to know about image composition or object insertion

Language: Python | License: Apache-2.0 | Stargazers: 262 | Issues: 0

CityDreamer

The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)

Language: Python | License: NOASSERTION | Stargazers: 582 | Issues: 0

fiftyone

The open-source tool for building high-quality datasets and computer vision models

Language: Python | License: Apache-2.0 | Stargazers: 7959 | Issues: 0

Language: Python | License: MIT | Stargazers: 4129 | Issues: 0

Language: JavaScript | Stargazers: 2186 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 18 | Issues: 0

CogVideo

Text-to-video generation. The repo for ICLR2023 paper "CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"

Language: Python | License: Apache-2.0 | Stargazers: 3576 | Issues: 0