amazon-science / multiatis

Data and code for the paper "End-to-End Slot Alignment and Recognition for Cross-Lingual NLU" (Accepted to EMNLP 2020)

MultiATIS++ Corpus

Description

The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems [1]. The original English data includes intent and slot annotations, and was later extended to Hindi and Turkish [2]. MultiATIS++ further extends ATIS to 6 more languages and hence covers a total of 9 languages: English, Spanish, German, French, Portuguese, Chinese, Japanese, Hindi and Turkish. These languages belong to a diverse set of language families: Indo-European, Sino-Tibetan, Japonic and Altaic.

The MultiATIS++ corpus has been released to foster further research in multilingual and cross-lingual natural language understanding.

For more details, please check the paper: Xu, W., Haider, B. and Mansour, S., 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353 (https://arxiv.org/abs/2004.14353)

Accessing MultiATIS++

To obtain a copy of the MultiATIS++ data, please visit: https://catalog.ldc.upenn.edu/LDC2021T04

Please send your queries/comments to multiatis@amazon.com.

Citation

Please cite [3] when referring to the MultiATIS++ dataset.

Soft-Align Implementation

An implementation of the soft-align method introduced in [3] will be available here soon.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

References

[1] LDC93S5 ATIS2, LDC94S19 ATIS3 Training Data, LDC95S26 ATIS3 Test Data

[2] Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck. 2018. (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding. IEEE ICASSP.

[3] Weijia Xu, Batool Haider, Saab Mansour. 2020. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU. arXiv preprint arXiv:2004.14353.
