Xin Cai's starred repositories
ml-engineering
Machine Learning Engineering Open Book
multimodal
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
multimodal-maestro
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Awesome-TimeSeries-SpatioTemporal-LM-LLM
A professional list on Large (Language) Models and Foundation Models (LLM, LM, FM) for Time Series, Spatiotemporal, and Event Data.
Awesome-Segment-Anything
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
Segment-Any-Point-Cloud
[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Awesome-SSL4TS
A professionally curated list of awesome resources (paper, code, data, etc.) on Self-Supervised Learning for Time Series (SSL4TS).
Segment-Anything-CLIP
Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works
Awesome-Unsupervised-Object-Localization
Curated list of awesome works on unsupervised object localization in 2D images.
betrayed-by-captions
[ICCV 2023] Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
minimal-sqvae
A minimal PyTorch implementation of the Stochastically Quantized Variational AutoEncoder (SQ-VAE) by Sony