Arking1995

Arking1995's starred repositories

LLaVA-1.6-ft

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | Apache-2.0 | 2700

kubric

A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.

Language: Jupyter Notebook | Apache-2.0 | 224700

conceptual-captions

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

Language: Shell | NOASSERTION | 50900

T2I-CompBench

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Language: Python | MIT | 17600

LayerDiffuse

Transparent Image Layer Diffusion using Latent Transparency

Apache-2.0 | 194000

National_interest_waiver_waittime

USCIS Employment-based-2 national interest waiver wait time

MIT | 7700

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | NOASSERTION | 580700

UltraEdit

Language: Python | 13200

MagicBrush

[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".

Language: Python | NOASSERTION | 28900

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python | NOASSERTION | 117400

sd-akashic

A compendium of information regarding Stable Diffusion (SD)

Unlicense | 161900

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.

Language: Python | MIT | 459000

SyntheticData

Is synthetic data from generative models ready for image recognition?

Language: Python | Apache-2.0 | 17100

Super-CLEVR

Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"

Language: Python | NOASSERTION | 2000

imagenet3d

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

Language: Python | 1300

DST3D

Official implementation of "Generating images with 3D annotations using diffusion models".

Language: Python | MIT | 5600

objaverse-xl

🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!

Language: Python | Apache-2.0 | 66900

Bunny

A family of lightweight multimodal models.

Language: Python | Apache-2.0 | 83400

mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.

Language: Python | Apache-2.0 | 542500

ADE20K

ADE20K Dataset

Language: Jupyter Notebook | 30400

MEBOW

Code for "MEBOW: Monocular Estimation of Body Orientation In the Wild", CVPR 2020

Language: Python | 5700

omni3d

Code release for "Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild"

Language: Python | NOASSERTION | 69700

vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

Language: Python | MIT | 22200

libcom

Image composition toolbox: everything you want to know about image composition or object insertion

Language: Python | Apache-2.0 | 26200

CityDreamer

The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)

Language: Python | NOASSERTION | 58200

fiftyone

The open-source tool for building high-quality datasets and computer vision models

Language: Python | Apache-2.0 | 795900

TripoSR

Language: Python | MIT | 412900

nerfies.github.io

Language: JavaScript | 218600

object-edit

Language: Python | NOASSERTION | 1800

CogVideo

Text-to-video generation. The repo for ICLR2023 paper "CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"

Language: Python | Apache-2.0 | 357600