YTEP-ZHI

Jiazhi Yang's starred repositories

InstantID

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Language: Python | License: Apache-2.0 | 9703 122 168

StableCascade

Official Code for Stable Cascade

Language: Jupyter Notebook | License: MIT | 6279 57 111

alphageometry

Language: Python | License: Apache-2.0 | 3634 49 94

Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python | 2009 27 61

RPG-DiffusionMaster

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook | License: AGPL-3.0 | 1420 23 35

RAG-Survey

1255 30 15

onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Language: Python | 1171 36 267

hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

License: Apache-2.0 | 1016 34 13

OpenLRM

An open-source impl. of Large Reconstruction Models

Language: Python | License: Apache-2.0 | 722 26 34

Awesome-Robotics-Foundation-Models

License: MIT | 655 20 2

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models

Language: Python | License: NOASSERTION | 609 20 4

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | 459 11 27

gaussian_splatting_notes

A detailed formulae explanation on gaussian splatting

397 220

LaVIT

LaVIT: Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Language: Jupyter Notebook | License: NOASSERTION | 330 15 17

local-attention

An implementation of local windowed attention for language modeling

Language: Python | License: MIT | 325 5 16

FreeNoise

[ICLR 2024] Code for FreeNoise based on VideoCrafter

Language: Python | License: Apache-2.0 | 314 6 13

DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python | License: MIT | 311 3 41

InfoBatch

Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Language: Python | 265 6 8

VIRL

Code for V-IRL: Grounding Virtual Intelligence in Real Life

Language: Python | 238 12 2

particle-sfm

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild. ECCV 2022.

Language: C++ | License: GPL-3.0 | 226 15 15

MMVP

Language: Python | 188 8 14

Forge_VFM4AD

A comprehensive survey of forging vision foundation models for autonomous driving, including challenges, methodologies, and opportunities.

181 7 1

JourneyDB

127 6 10

iclr2024-openreview-submissions

ICLR 2024 OpenReview Submission Data

Language: Python | 122 1 1

ZeroShape

Code repository for "ZeroShape: Regression-based Zero-shot Shape Reconstruction".

Language: Python | 100 12 1

clip_prs

Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"

Language: Jupyter Notebook | License: NOASSERTION | 71 3 1

EvalCrafter

[CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Language: Jupyter Notebook | 65 1 9

360DVD

[CVPR 2024] 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

62 5 3

POP3D

Source code for NeurIPS paper "POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images"

Language: Python | 58 4 7

Optix

Memory Efficient Training Framework for Large Video Generation Model

Language: Python | License: Apache-2.0 | 17 4 1