LightChen233 / CLIPText


CLIPText: A New Paradigm for Zero-shot Text Classification

License: MIT

This repository contains the PyTorch implementation and the data for the paper CLIPText: A New Paradigm for Zero-shot Text Classification, by Libo Qin, Weiyun Wang, Qiguang Chen, and Wanxiang Che (Findings of ACL 2023). [PDF]

This code has been written using PyTorch >= 2.0. If you find this code useful for your research, please consider citing the following paper:

@inproceedings{qin-etal-2023-cliptext,
    title = "{CLIPT}ext: A New Paradigm for Zero-shot Text Classification",
    author = "Qin, Libo  and
      Wang, Weiyun  and
      Chen, Qiguang  and
      Che, Wanxiang",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.69",
    doi = "10.18653/v1/2023.findings-acl.69",
    pages = "1077--1088",
}

Network Architecture

Prerequisites

This codebase was developed and tested with the following settings:

-- scikit-learn==1.2.1
-- numpy==1.24.2
-- pytorch==2.0.1
-- torchvision==0.15.2
-- tqdm==4.64.1
-- clip==1.0
-- transformers==4.27.1
-- regex==2022.10.31
-- ftfy==6.1.1
-- pillow==9.4.0
  • Please note: different versions of clip may lead to different results, so we recommend installing clip with this command:
pip install git+https://github.com/openai/CLIP.git@a9b1bf5
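Because results can depend on the exact package versions, it may help to compare the installed environment against the settings listed above before running anything. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the distribution names are assumptions (in particular, "pytorch" is looked up as `torch`, and the CLIP package is assumed to be installed under the name `clip`).

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the Prerequisites list. The distribution names
# are assumptions: "pytorch" is published on PyPI as "torch", and the OpenAI
# CLIP repository is assumed to install under the name "clip".
PINNED = {
    "scikit-learn": "1.2.1",
    "numpy": "1.24.2",
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "tqdm": "4.64.1",
    "clip": "1.0",
    "transformers": "4.27.1",
    "regex": "2022.10.31",
    "ftfy": "6.1.1",
    "pillow": "9.4.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Local build tags such as "2.0.1+cu117" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pinned version. The check only compares version strings, so it cannot confirm that clip was installed from the specific commit in the command above.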

How to run it

The script main.py is the entry point of the project. You can run the experiments with the following commands:

  • To replicate our CLIPTextC results on the test set:
python main.py --test --dataset [dataset_name]
  • To replicate our Prompt-CLIPTextC results on the test set:
python main.py --test --text_prompt --dataset [dataset_name]
  • To replicate our ensemble results on the test set:
python main.py --test [--text_prompt] --ensemble_size 2 --dataset [dataset_name]

where [dataset_name] is one of ['emotion', 'situation', 'topic', 'agnews', 'snips', 'trec', 'subj'].
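To run the same configuration over every dataset, the commands above can be generated programmatically. This is a small sketch under stated assumptions: `build_command` is a hypothetical helper that is not part of the repository, and it only assembles the flags documented above (`--test`, `--text_prompt`, `--ensemble_size`, `--dataset`) without executing anything.

```python
import shlex

# Dataset names as listed in this README.
DATASETS = ["emotion", "situation", "topic", "agnews", "snips", "trec", "subj"]


def build_command(dataset, text_prompt=False, ensemble_size=None):
    """Return the argument list for one evaluation run of main.py."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    cmd = ["python", "main.py", "--test"]
    if text_prompt:
        cmd.append("--text_prompt")
    if ensemble_size is not None:
        cmd += ["--ensemble_size", str(ensemble_size)]
    cmd += ["--dataset", dataset]
    return cmd


if __name__ == "__main__":
    # Print one shell-ready command per dataset for the Prompt-CLIPTextC setting.
    for name in DATASETS:
        print(shlex.join(build_command(name, text_prompt=True)))
```

The printed lines can be pasted into a shell, or each argument list can be passed to `subprocess.run` from the repository root.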

Model Performance
