Arking1995's starred repositories
LLaVA-1.6-ft
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
conceptual-captions
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
T2I-CompBench
[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
LayerDiffuse
Transparent Image Layer Diffusion using Latent Transparency
National_interest_waiver_waittime
USCIS Employment-based-2 national interest waiver wait time
MagicBrush
[NeurIPS'23] "MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing".
sd-akashic
A compendium of information regarding Stable Diffusion (SD)
SyntheticData
Is synthetic data from generative models ready for image recognition?
Super-CLEVR
Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning"
imagenet3d
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
objaverse-xl
🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!
vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
CityDreamer
The official implementation of "CityDreamer: Compositional Generative Model of Unbounded 3D Cities". (Xie et al., CVPR 2024)