Chinese Sentence Compression (SC) Dataset and Neural Chinese SC Model (EMNLP 2021 long paper, oral presentation)
PDF: https://aclanthology.org/2021.emnlp-main.33/
The folder ./Chinese SC dataset contains a Chinese parallel SC dataset from the telecommunication domain.
Personal private information and domain-related sensitive information have been masked with special tokens. (More details can be found in our paper.)
We will continue to improve and expand the Chinese SC dataset.
The model is a neural Chinese SC model enhanced with a Self-Organizing Map (SOM).
We will provide a basic, preliminary version of the code soon. (The model is not difficult to build. If you run into any problems, just email us or open an issue.)
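
For readers unfamiliar with the SOM component, here is a minimal, generic sketch of Self-Organizing Map training in pure Python. It is NOT the model from the paper: the grid size, the linearly decaying learning rate and neighborhood width, the Gaussian neighborhood, the toy data, and the function names train_som and best_matching_unit are all illustrative assumptions. How the SOM is combined with the neural SC model is not shown here; see the paper for that.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM on `data` (a list of equal-length float vectors).

    All hyperparameters are assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid cell, initialized uniformly in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the grid, not in input space.
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy usage: two small clusters of 2-D points (made-up data).
toy_data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
som = train_som(toy_data)
print(best_matching_unit(som, [0.1, 0.1]))
print(best_matching_unit(som, [0.9, 0.9]))
```

In a sentence-level setting the input vectors would typically be learned representations rather than raw 2-D points; that choice, too, is an assumption of this sketch rather than something stated in this README.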