HanleiZhang / TEXTOIR

TEXTOIR is a flexible toolkit for open intent detection and discovery. (ACL 2021)

Home Page: https://github.com/thuiar/TEXTOIR

TEXT Open Intent Recognition (TEXTOIR)

TEXTOIR is the first high-quality Text Open Intent Recognition platform. This repo contains a convenient toolkit with extensible interfaces that integrates a series of algorithms for two tasks: open intent detection and open intent discovery. We also release the pipeline framework and the visualized platform in the TEXTOIR-DEMO repo.

If you are interested in this work and want to use the code in this repo, please star or fork it and cite our ACL 2021 demo paper:

@inproceedings{zhang-etal-2021-textoir,
    title = "{TEXTOIR}: An Integrated and Visualized Platform for Text Open Intent Recognition",
    author = "Zhang, Hanlei  and
      Li, Xiaoteng  and
      Xu, Hua  and
      Zhang, Panpan  and
      Zhao, Kang  and
      Gao, Kai",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations",
    year = "2021",
    pages = "167--174",
}

Introduction

TEXTOIR aims to provide a convenient toolkit for researchers to reproduce related text open classification and clustering methods. It covers two tasks: open intent detection and open intent discovery. Open intent detection aims to classify samples into n known intent classes while detecting samples that belong to one additional open (unknown) intent class. Open intent discovery aims to leverage limited prior knowledge of known intents to find fine-grained clusters of both known and open intents. Related papers and code are collected in our previously released reading list.
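ADB (Deep Open Intent Classification with Adaptive Decision Boundary), the method run in the Quick Start example, keeps a centroid and a learned decision boundary for each known intent. The following is a simplified sketch of that style of decision rule, not the toolkit's implementation: it assumes utterance features, per-class centroids, and boundary radii have already been computed, and the function names and intent labels are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_open_intent(
    feature: Vector,
    centroids: Dict[str, Vector],
    radii: Dict[str, float],
    open_label: str = "open",
) -> str:
    """Return the nearest known intent, or open_label if outside its boundary."""
    # Nearest known-intent centroid by Euclidean distance.
    nearest = min(centroids, key=lambda label: euclidean(feature, centroids[label]))
    # A sample outside that class's boundary is rejected as the single open intent.
    if euclidean(feature, centroids[nearest]) > radii[nearest]:
        return open_label
    return nearest


if __name__ == "__main__":
    # Hypothetical 2-D features and boundaries, for illustration only.
    centroids = {"book_flight": (0.0, 0.0), "play_music": (5.0, 5.0)}
    radii = {"book_flight": 1.0, "play_music": 1.5}
    print(predict_open_intent((0.3, 0.4), centroids, radii))  # book_flight
    print(predict_open_intent((3.0, 0.0), centroids, radii))  # open
```

The open intent needs no training samples of its own in this rule: anything that falls outside the boundary of its nearest known intent is rejected.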

(Figure: an example of open intent recognition.)

We strongly recommend using our TEXTOIR toolkit, which has standard and unified interfaces (especially for data settings), to obtain fair and persuasive results on benchmark intent datasets!

Benchmark Datasets

Integrated Models

Open Intent Detection

Open Intent Discovery

(* denotes a model originally proposed for computer vision (CV), with its backbone replaced by BERT)
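Open intent discovery methods output cluster IDs rather than intent names, so a common way to score them is clustering accuracy: find the one-to-one mapping from clusters to gold labels that maximizes agreement. The sketch below is illustrative only and is not the toolkit's evaluation code. It brute-forces every mapping, so it assumes as many clusters as classes and only suits tiny examples; practical evaluations usually solve the same assignment problem with the Hungarian algorithm (for example, scipy.optimize.linear_sum_assignment).

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def clustering_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Best one-to-one cluster-to-label accuracy (brute force, small k only)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    labels = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(labels) != len(clusters):
        raise ValueError("this sketch assumes #clusters == #classes")
    # Count how often each (cluster, gold label) pair co-occurs.
    pairs = Counter(zip(y_pred, y_true))
    best = 0
    # Try every assignment of gold labels to clusters and keep the best one.
    for perm in permutations(labels):
        mapping = dict(zip(clusters, perm))
        correct = sum(n for (c, t), n in pairs.items() if mapping[c] == t)
        best = max(best, correct)
    return best / len(y_true)


if __name__ == "__main__":
    # Cluster IDs are arbitrary, so [1, 1, 0, 0] perfectly matches ["a", "a", "b", "b"].
    print(clustering_accuracy(["a", "a", "b", "b"], [1, 1, 0, 0]))
```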

Quick Start

  1. Use Anaconda to create a Python (version >= 3.6) environment
conda create --name textoir python=3.6
conda activate textoir
  2. Install PyTorch (CUDA version 11.2)
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch -c conda-forge
  3. Clone the TEXTOIR repository and choose a task (open intent detection is used as an example here)
git clone git@github.com:HanleiZhang/TEXTOIR.git
cd TEXTOIR
cd open_intent_detection
  4. Install the required dependencies
pip install -r requirements.txt
  5. Run an example (ADB is used as an example here)
sh examples/run_ADB.sh
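After the installation steps above, a quick sanity check can confirm that the interpreter and PyTorch are usable before launching a full experiment. This is a minimal sketch that is not part of the toolkit; the function name check_environment is hypothetical, and it only reports whether PyTorch can see a CUDA device.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6)):
    """Report Python version, PyTorch availability, and CUDA visibility."""
    report = {
        "python_ok": tuple(sys.version_info[:2]) >= tuple(min_python),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```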

Extensibility and Reliability

Extensibility

This toolkit is extensible and makes it convenient to add new methods, datasets, configurations, backbones, dataloaders, and losses. More detailed information can be found in the open_intent_detection and open_intent_discovery directories, respectively.

Reliability

The code in this repo has been verified, and the experimental results are close to those reported in our AAAI 2021 papers Discovering New Intents with DeepAligned Clustering and Deep Open Intent Classification with Adaptive Decision Boundary. Note that the results of some methods may fluctuate slightly depending on the selected random seeds, hyper-parameters, optimizers, etc. The final results are averaged over 10 random seeds to reduce the influence of which known classes are selected.
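Averaging results over random seeds, as described in the paragraph above, can be sketched as follows. This is an illustration rather than the toolkit's script: run_fn is a hypothetical stand-in for one training-and-evaluation run that takes a seed and returns a scalar score. The helper reports the sample standard deviation alongside the mean so that the run-to-run fluctuation stays visible.

```python
import statistics
from typing import Callable, Iterable, Tuple


def average_over_seeds(
    run_fn: Callable[[int], float], seeds: Iterable[int] = range(10)
) -> Tuple[float, float]:
    """Run run_fn once per seed and return (mean, sample std) of the scores."""
    scores = [float(run_fn(seed)) for seed in seeds]
    if not scores:
        raise ValueError("at least one seed is required")
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Dummy run function for illustration; replace with a real experiment.
    mean, std = average_over_seeds(lambda seed: 80.0 + seed % 3)
    print(f"{mean:.2f} +/- {std:.2f}")
```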

Acknowledgements

Toolkit Contributors: Hanlei Zhang, Ting-En Lin, Qianrui Zhou, Shaojie Zhao, Xin Wang, Huisheng Mao.

Supervisor: Hua Xu.

Bugs or questions?

If you have any questions, feel free to open an issue or a pull request. Please describe your problem in as much detail as possible. If you want to integrate your method into our repo, please contact us (zhang-hl20@mails.tsinghua.edu.cn).

About

License: GNU General Public License v3.0


Languages

Python 69.3%, C 18.0%, C++ 7.1%, Shell 2.8%, Cython 2.7%, Makefile 0.0%