TotalVariation

Xin Cai's starred repositories

ml-engineering

Machine Learning Engineering Open Book

Language: Python · License: CC-BY-SA-4.0 · 12051 116 30

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python · License: MIT · 7947 76 226

multimodal

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.

Language: Python · License: BSD-3-Clause · 1500 21 40

multimodal-maestro

Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL.

Language: Python · License: Apache-2.0 · 1308 18 13

CVinW_Readings

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)".

1222 38 6

Awesome-CLIP

Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

1152 19 15

Awesome-TimeSeries-SpatioTemporal-LM-LLM

A professional list on Large (Language) Models and Foundation Models (LLM, LM, FM) for Time Series, Spatiotemporal, and Event Data.

944 29 3

Awesome-Segment-Anything

This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).

License: MIT · 865 21 7

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · 797 31 78

RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Language: Python · License: Apache-2.0 · 727 10 102

Segment-Any-Point-Cloud

[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Language: Python · 575 26 13

GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

Language: Python · 473 11 54

Awesome-SSL4TS

A professionally curated list of awesome resources (paper, code, data, etc.) on Self-Supervised Learning for Time Series (SSL4TS).

293 11 4

MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)

Language: Python · License: Apache-2.0 · 276 2 61

Uni-Perceiver

Language: Python · License: Apache-2.0 · 269 9 18

OV-DETR

[Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022)

Language: Python · 213 5 26

CAE

This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning".

Language: Python · 193 5 17

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

License: MIT · 189 20 5

Segment-Anything-CLIP

Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works

Language: Jupyter Notebook · License: Apache-2.0 · 182 4 3

CORA

A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023

Language: Python · License: Apache-2.0 · 180 7 30

SAMFeat

The official implementation of "Segment Anything Model is a Good Teacher for Local Feature Learning".

Language: Python · License: MIT · 106 5 4

SegCLIP

PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Language: Python · 85 10 5

Awesome-Unsupervised-Object-Localization

Curated list of awesome works on unsupervised object localization in 2D images.

License: Apache-2.0 · 67 4 1

MaskCLIP

Code Release for MaskCLIP (ICML 2023)

Language: Python · License: NOASSERTION · 58 3 7

betrayed-by-captions

(ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

Language: Jupyter Notebook · 45 7 8

CounTX

Includes FSC-147-D and the code for training and testing the CounTX model from the paper Open-world Text-specified Object Counting.

Language: Jupyter Notebook · License: MIT · 35 2 9

minimal-sqvae

A minimal PyTorch implementation of Stochastically Quantized Variational AutoEncoder (SQ-VAE) by Sony

Language: Python · License: MIT · 29 20

MILA

Memory-Based Instance-Level Adaptation for Cross-Domain Object Detection

Language: Python · 14 2 3

MatchAndDeform

Language: Jupyter Notebook · 10 20

TotalVariation.github.io

Language: HTML · License: MIT · 1 10