This repository contains a dataset of the Indonesian translation of the Quran annotated with nested people entities (up to two levels of nesting), together with supervised learning implementations (BiLSTM-CRF, IndoBERT, CRF) for nested people entity extraction on that translation.
- Dataset file: TA_dataset_raw_labeled_nested_4th
- Desc: The dataset is taken from the Tanzil Quran corpus and covers Juz 1 through Juz 6. The only entity tag used in this research is PER (person), which marks people entities; O marks tokens outside any people entity. People entities are labeled in the IOB format. All entity tags were labeled manually.
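
To illustrate what two-level nested IOB labeling can look like, here is a minimal sketch in Python. It assumes one tag sequence per nesting level and the conventional `B-PER` / `I-PER` / `O` tag names; the example sentence, the variable names, and the helper `iob_to_spans` are hypothetical and are not taken from the dataset file or from the code in this repository.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str], label: str = "PER") -> List[Tuple[int, int]]:
    """Convert one layer of IOB tags into (start, end) token spans, end exclusive."""
    spans = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == f"B-{label}":
            # A new entity begins; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == f"I-{label}" and start is not None:
            # Continuation of the open entity.
            continue
        else:
            # "O" (or a stray I- tag with no open entity) closes any open entity.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical example (not from the dataset): "kaum Musa" is an outer people
# entity, and "Musa" is a people entity nested inside it.
tokens = ["Dan", "kaum", "Musa", "berkata"]
outer_tags = ["O", "B-PER", "I-PER", "O"]  # level 1
inner_tags = ["O", "O", "B-PER", "O"]      # level 2

outer_spans = iob_to_spans(outer_tags)
inner_spans = iob_to_spans(inner_tags)

for start, end in outer_spans + inner_spans:
    print(" ".join(tokens[start:end]))
```

Keeping one tag sequence per level means each level can be decoded as ordinary flat IOB, which is one common way to represent nested entities; the actual column layout of the dataset file may differ.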