vtodream / CAISandSMP

CAISandSMP datasets

CAIS and SMP

Directory

├── CAIS                                                    #CAIS dataset
│   ├── dataLoder                                           #Loaders for the different forms of the CAIS dataset
│   │   ├── GlobalPointerDataloder.py
│   │   ├── mergeCAISDataloder.py
│   │   └── sourceDataloder.py
│   ├── GlobalPointerCAIS                                   #Original dataset with entities represented by their positions
│   │   ├── dev.json
│   │   ├── test.json
│   │   └── train.json
│   ├── mergeCAIS                                           #Original dataset with intent and slot annotations merged
│   │   ├── dev.txt
│   │   ├── test.txt
│   │   └── train.txt
│   ├── source                                              #Original dataset
│   │   ├── test
│   │   │   ├── ch.test
│   │   │   └── ch.test.intent
│   │   ├── train
│   │   │   ├── ch.train
│   │   │   └── ch.train.intent
│   │   └── valid
│   │       ├── ch.valid
│   │       └── ch.valid.intent
│   └── SourceToGlobalPointer.py                             #Dataset conversion script
├── README.md
└── SMP
    ├── GlobalPointer
    │   ├── GlobalPointerSMP2019                             #Original dataset with entities represented by their positions
    │   │   ├── dev.json
    │   │   ├── test.json
    │   │   └── train.json
    │   └── GlobalPointerSMP2020
    │       ├── dev.json
    │       ├── test.json
    │       └── train.json
    ├── GlobalPointerToMerge.py                              #Dataset conversion script
    ├── merge                                                #Original dataset with intent and slot annotations merged
    │   ├── 2019mergeSMP
    │   │   ├── dev.txt
    │   │   ├── test.txt
    │   │   └── train.txt
    │   └── 2020mergeSMP
    │       ├── dev.txt
    │       ├── test.txt
    │       └── train.txt
    ├── source                                               #Original dataset
    │   ├── 2019train.json
    │   └── 2020train.json
    └── sourceToGlobalPointer.py                             #Dataset conversion script

Introduction:

  • CAIS:

CAIS originates from the paper "CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding". The CAIS dataset includes 7,995 training, 994 validation, and 1,024 test utterances. The original data can be downloaded from GitHub.

CAIS baseline

| Model | Slot (F1) | Intent (Acc) | Overall (Acc) |
|---|---|---|---|
| Slot-Gated | 82.21 | 93.87 | 80.43 |
| SF-ID Network | 86.34 | 94.66 | 84.09 |
| CM-Net | 86.16 | 94.56 | - |
| Stack-Propagation | 87.65 | 94.57 | 84.68 |
| Multi-Level Word Adapter | 88.57 | 94.66 | 85.47 |

  • SMP:

SMP comes from the Evaluation of Chinese Human-Computer Dialogue Technology (ECDT) task of the 8th National Social Media Processing Conference (SMP 2019) and the 9th National Social Media Processing Conference (SMP 2020). Because the competitions have ended, only the training sets are available. We therefore split each original training set into training, validation, and test sets at a ratio of 8:1:1, according to the number of utterances per intent.

The SMP2019 dataset includes 2,053 training, 256 validation, and 270 test utterances. The SMP2020 dataset includes 4,011 training, 493 validation, and 520 test utterances.
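An 8:1:1 split that respects the intent distribution can be sketched as below. This is a minimal illustration, not the script used to build the released files: the `"intent"` key, the `get_intent` callback, the seed, and the rounding rule are all assumptions, so the resulting counts may differ slightly from the ones reported above.

```python
import random
from collections import defaultdict


def stratified_split(samples, get_intent, ratios=(8, 1, 1), seed=0):
    """Split samples into train/dev/test, keeping the ratio within each intent.

    `get_intent` is a hypothetical accessor that returns the intent label
    of one sample; adapt it to the real structure of the source JSON.
    """
    # Group the samples by intent label.
    by_intent = defaultdict(list)
    for sample in samples:
        by_intent[get_intent(sample)].append(sample)

    rng = random.Random(seed)
    total = sum(ratios)
    train, dev, test = [], [], []
    for intent in sorted(by_intent):
        group = by_intent[intent]
        rng.shuffle(group)
        n = len(group)
        # Round dev/test down; the remainder goes to the training set.
        n_dev = n * ratios[1] // total
        n_test = n * ratios[2] // total
        n_train = n - n_dev - n_test
        train.extend(group[:n_train])
        dev.extend(group[n_train:n_train + n_dev])
        test.extend(group[n_train + n_dev:])
    return train, dev, test


if __name__ == "__main__":
    # Toy data with a made-up "intent" field, for illustration only.
    toy = [{"text": f"u{i}", "intent": "A" if i < 100 else "B"} for i in range(150)]
    tr, dv, te = stratified_split(toy, lambda s: s["intent"])
    print(len(tr), len(dv), len(te))
```

Splitting per intent rather than over the whole set keeps rare intents represented in the validation and test sets whenever they have enough utterances.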

SMP2019 baseline

| Model | Slot (F1) | Intent (Acc) | Overall (Acc) |
|---|---|---|---|
| Slot-Gated | 62.94 | 91.11 | 57.03 |
| SF-ID Network | 71.59 | 94.07 | 63.33 |
| CM-Net | - | - | - |
| Stack-Propagation | 78.91 | 94.44 | 72.59 |
| Multi-Level Word Adapter | 73.60 | 93.70 | 70.00 |

SMP2020 baseline

| Model | Slot (F1) | Intent (Acc) | Overall (Acc) |
|---|---|---|---|
| Slot-Gated | 70.45 | 91.15 | 65.65 |
| SF-ID Network | 78.47 | 92.69 | 71.34 |
| CM-Net | - | - | - |
| Stack-Propagation | 82.50 | 94.03 | 76.15 |
| Multi-Level Word Adapter | 84.32 | 96.34 | 80.76 |
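For readers unfamiliar with the table columns: in spoken language understanding work, Overall (Acc) usually means sentence-level accuracy, where an utterance counts as correct only if its intent and every slot tag are predicted correctly. Assuming that conventional definition applies to the baselines above, the two accuracy columns can be sketched as follows. The function names and the tag strings in the example are hypothetical, and Slot (F1) is omitted because it requires span-level matching.

```python
def intent_accuracy(pred_intents, gold_intents):
    """Fraction of utterances whose predicted intent matches the gold intent."""
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)


def overall_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of utterances with a correct intent AND a fully correct slot sequence."""
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)


if __name__ == "__main__":
    # Two toy utterances with made-up labels: the second has one wrong slot tag.
    gold_i = ["play_music", "weather"]
    pred_i = ["play_music", "weather"]
    gold_s = [["O", "B-song"], ["B-city", "O"]]
    pred_s = [["O", "B-song"], ["O", "O"]]
    print(intent_accuracy(pred_i, gold_i))
    print(overall_accuracy(pred_i, gold_i, pred_s, gold_s))
```

Under this definition Overall (Acc) can never exceed Intent (Acc), which matches every row of the tables.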

The detailed code of our paper will be uploaded soon.
