SOM-NCSCM

A Chinese sentence compression (SC) dataset and a neural Chinese SC model. (EMNLP 2021 long paper, oral presentation)

PDF: https://aclanthology.org/2021.emnlp-main.33/

Chinese Sentence Compression Dataset

The folder ./Chinese SC dataset contains a parallel Chinese SC dataset from the telecommunications domain.

Personal private information and domain-related sensitive information were masked with special tokens. (More details can be found in our paper.)

We will continue to improve and expand the Chinese SC dataset.

The SOM-NCSCM

This is a neural Chinese SC model enhanced with a Self-Organizing Map (SOM).
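
For readers unfamiliar with SOMs, here is a minimal sketch of a generic SOM in plain Python. It is not the SOM-NCSCM code and does not show how the SOM is combined with the compression model (see the paper for that). The function names, grid size, and the linear decay schedules for the learning rate and neighborhood width are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM on a list of equal-length float vectors.

    Hypothetical helper for illustration only, not the authors' implementation.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = data[:]  # shuffle a copy so the caller's list is untouched
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # assumed linear decay with a floor
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Squared distance on the grid (not in input space) to the winner.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the node toward the sample, scaled by its closeness to the winner.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Calling train_som(vectors) on a list of feature vectors returns a dict from grid coordinates to weight vectors, and best_matching_unit(weights, x) then maps a new vector to its nearest grid node.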

We will provide a basic preliminary version of the code soon. (The model is not difficult to build. If you run into any problems, just email us or open an issue.)

Apache License 2.0