There are 34 repositories under the zero-shot-classification topic.
An open source implementation of CLIP.
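For context, zero-shot classification with a CLIP-style model boils down to comparing an image embedding against embeddings of candidate prompts. A minimal sketch using the open_clip package (the checkpoint name and image path below are illustrative, not prescribed by the repo):

```python
# Minimal zero-shot classification with open_clip; the checkpoint name
# ("laion2b_s34b_b79k") and image path ("cat.jpg") are illustrative.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    img_feats = model.encode_image(image)
    txt_feats = model.encode_text(text)
    img_feats /= img_feats.norm(dim=-1, keepdim=True)   # unit-normalize
    txt_feats /= txt_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feats @ txt_feats.T).softmax(dim=-1)

print(labels[probs.argmax().item()])
```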
A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5-VL.
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training
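The underlying idea, roughly: condition the diffusion model on each candidate label, measure how well it predicts the noise added to the image, and pick the label with the lowest error. A simplified sketch (not the official code; `eps_model` and its inputs stand in for a pretrained text-conditioned diffusion model such as Stable Diffusion):

```python
# Sketch of the Diffusion Classifier idea (not the official code): score each
# label by the text-conditioned noise-prediction error and take the argmin.
import torch

def diffusion_classify(eps_model, x0, class_embeds, timesteps, alphas_cumprod):
    """eps_model(x_t, t, cond) -> predicted noise, as in a pretrained
    text-conditioned diffusion model; all names here are placeholders."""
    errors = []
    for cond in class_embeds:                  # one text embedding per label
        total = 0.0
        for t in timesteps:                    # Monte Carlo over noise levels
            noise = torch.randn_like(x0)
            a_bar = alphas_cumprod[t]
            x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward process
            total += torch.mean((eps_model(x_t, t, cond) - noise) ** 2).item()
        errors.append(total / len(timesteps))
    return min(range(len(errors)), key=errors.__getitem__)  # lowest error wins
```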
[NeurIPS 2023] Official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Cybertron: the home planet of the Transformers in Go
Official code for "OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding"
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
PyTorch code for MUST (Masked Unsupervised Self-Training for label-free image classification)
Multi-Aspect Vision Language Pretraining (CVPR 2024)
Official PyTorch implementation of MSDN (Mutually Semantic Distillation Network for zero-shot learning, CVPR 2022)
[TPAMI 2023] Generative Multi-Label Zero-Shot Learning
[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
Evaluate custom and HuggingFace text-to-image/zero-shot-image-classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probe, image retrieval, and KNN accuracy.
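As one illustration, the linear-probe metric is typically computed by fitting a logistic regression on frozen embeddings; a minimal sketch (the hyperparameters are placeholder assumptions, not this toolkit's defaults):

```python
# Illustrative linear-probe evaluation: fit a logistic regression on frozen
# embeddings and report test accuracy. C and max_iter are placeholder choices.
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels):
    clf = LogisticRegression(C=1.0, max_iter=1000)
    clf.fit(train_feats, train_labels)         # only the probe is trained
    return clf.score(test_feats, test_labels)  # top-1 accuracy
```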
Implementation of Z-BERT-A: a zero-shot pipeline for unknown intent detection.
Alternate implementation of zero-shot text classification: instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models to do ZSC, so it can be lightweight and support more languages without trading off accuracy. - Prithivi Da
Low-latency ONNX- and TensorRT-based zero-shot classification and detection, driven by CLIP (contrastive language-image pre-training) prompts
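A rough sketch of what CLIP-style inference through ONNX Runtime looks like (the .onnx file names and the tensor names "pixel_values"/"input_ids" are assumptions for illustration, not this repo's actual exports):

```python
# Rough sketch of CLIP-style zero-shot inference via ONNX Runtime; file names
# and input tensor names are assumptions, not this repo's API.
import numpy as np
import onnxruntime as ort

providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
image_sess = ort.InferenceSession("clip_image_encoder.onnx", providers=providers)
text_sess = ort.InferenceSession("clip_text_encoder.onnx", providers=providers)

def classify(pixels: np.ndarray, token_ids: np.ndarray) -> int:
    img = image_sess.run(None, {"pixel_values": pixels})[0]
    txt = text_sess.run(None, {"input_ids": token_ids})[0]
    img /= np.linalg.norm(img, axis=-1, keepdims=True)   # unit-normalize
    txt /= np.linalg.norm(txt, axis=-1, keepdims=True)
    return int((img @ txt.T).argmax())                   # best-matching prompt index
```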
Code for the experiments in our EMNLP 2021 paper "Open Aspect Target Sentiment Classification with Natural Language Prompts"
[CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"
Airflow Pipeline for Machine Learning
Official PyTorch Code for "ATPrompt: Textual Prompt Learning with Embedded Attributes"
NeurIPS 2024 Track on Datasets and Benchmarks (Spotlight)
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
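At the heart of any such implementation is CLIP's symmetric contrastive (InfoNCE) loss; a self-contained sketch, assuming L2-normalized image and text embeddings from N matched pairs:

```python
# CLIP's symmetric contrastive (InfoNCE) loss, assuming L2-normalized
# image/text embeddings of shape (N, D) from N matched pairs.
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    logits = image_emb @ text_emb.T / temperature   # (N, N) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)    # match each image to its caption
    loss_t = F.cross_entropy(logits.T, targets)  # and each caption to its image
    return (loss_i + loss_t) / 2
```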
Actor-agnostic Multi-label Action Recognition with Multi-modal Query [ICCVW '23]
Deep Learning for Computer Vision, a course by Frank Wang (王鈺強)
[ACL'23 Findings] Code for our paper "ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval"
Perform topic classification on news articles under several limited labeled-data regimes.
Code for the EMNLP 2019 paper "Benchmarking zero-shot text classification: datasets, evaluation and entailment approach"
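The entailment reframing from this line of work is essentially what Hugging Face's zero-shot-classification pipeline implements: each candidate label becomes a hypothesis scored by an NLI model. A minimal usage example (the NLI checkpoint and example text are illustrative):

```python
# Entailment-based zero-shot text classification via Hugging Face's pipeline;
# the checkpoint and example sentence are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The team clinched the championship with a last-minute goal.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0])  # highest-scoring label
```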
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
A toolkit for research on multimodal representation learning
GPT-4o (with Vision) module for use with Autodistill.
From-scratch PyTorch implementation of CLIP (Radford et al., 2021), trained on Flickr8k + Flickr30k
Source code for the paper "PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training", published at EMNLP 2023.