kevin-ssy

Shuyang Sun's starred repositories

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · Apache-2.0 · 46208 305 658

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook · Apache-2.0 · 14509 115 380

mae

PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377

Language: Python · NOASSERTION · 7077 58 187

InternLM

Official release of InternLM2.5 base and chat models. 1M context support

Language: Python · Apache-2.0 · 6065 54 321

ConvNeXt

Code release for ConvNeXt model

Language: Python · MIT · 5678 33 128

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · MIT · 3924 114 74

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python · Apache-2.0 · 2811 20 276

mmdeploy

OpenMMLab Model Deployment Framework

Language: Python · Apache-2.0 · 2661 37 1576

MotionCtrl

Official Code for MotionCtrl [SIGGRAPH 2024]

Language: Python · Apache-2.0 · 1232 50 31

OMG-Seg

OMG-LLaVA and OMG-Seg codebase

Language: Python · NOASSERTION · 1181 23 28

DragDiffusion

[CVPR2024, Highlight] Official code for DragDiffusion

Language: Python · Apache-2.0 · 1114 26 63

FateZero

[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"

Language: Jupyter Notebook · MIT · 1084 14 33

Bunny

A family of lightweight multimodal models.

Language: Python · Apache-2.0 · 851 19 106

ScaleCrafter

[ICLR 2024 Spotlight] Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.

Language: Python · 475 9 29

PointLLM

[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds

Language: Python · 469 12 32

TokenCut

(CVPR 2022) PyTorch implementation of "Self-supervised transformers for unsupervised object discovery using normalized cut"

Language: Jupyter Notebook · MIT · 295 7 15

fc-clip

[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Language: Python · Apache-2.0 · 269 16 33

DDQ

Dense Distinct Query for End-to-End Object Detection (CVPR2023)

Language: Python · Apache-2.0 · 243 9 21

SyntheticData

Is synthetic data from generative models ready for image recognition?

Language: Python · Apache-2.0 · 171 13 9

TransMix

[CVPR 2022] This repository includes the official project for the paper: TransMix: Attend to Mix for Vision Transformers.

Language: Python · Apache-2.0 · 158 11 19

UniHSI

[ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Language: Python · 149 10 13

PartImageNet

Introduction and scripts for the paper "PartImageNet: A Large, High-Quality Dataset of Parts" (Ju He, Shuo Yang, Shaokang Yang, Adam Kortylewski, Xiaoding Yuan, Jie-Neng Chen, Shuai Liu, Cheng Yang, Alan Yuille).

114 5 16

AVION

Code release for "Training a Large Video Model on a Single Machine in a Day"

Language: Python · MIT · 105 1 10

leopart

Language: Python · MIT · 95 2 6

Training-Data-Synthesis

[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching

Language: Python · MIT · 69 3 4

kmax-deeplab

A PyTorch re-implementation, based on Detectron2, of the ECCV 2022 paper "k-means Mask Transformer".

Language: Python · Apache-2.0 · 65 7 3

KDEP

(CVPR2022) Official PyTorch Implementation of KDEP. Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability

Language: Python · Apache-2.0 · 62 2 2

RAG-Driver

A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving

Language: Python · Apache-2.0 · 58 8 5

IMProv

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

Language: Python · 57 5 5

Oxford_HIC

A large-scale humour-oriented image text dataset

Language: Python · MIT · 8 20