NLP / FBK

NLP / FBK's repositories

Excitement-Open-Platform

Excitement Open Platform for Recognizing Textual Entailments

Language: Java

E3C is a freely available multilingual corpus (Italian, English, French, Spanish, and Basque) of semantically annotated clinical narratives, intended to support linguistic analysis, benchmarking, and the training of information extraction systems. It contains two types of annotations: (i) clinical entities: pathologies, symptoms, procedures, body parts, etc., according to standard clinical taxonomies (i.e. SNOMED-CT, ICD-10); and (ii) temporal information and factuality: events, time expressions, and temporal relations according to the THYME standard.

The corpus is organised into three layers, each with a different purpose.
Layer 1: about 25K tokens per language with full manual annotation of clinical entities, temporal information and factuality, for benchmarking and linguistic analysis.
Layer 2: 50-100K tokens per language with semi-automatic annotations of clinical entities, to be used to train baseline systems.
Layer 3: about 1M tokens per language of non-annotated medical documents, to be exploited by semi-supervised approaches.

Researchers can use the benchmark training and test splits of our corpus to develop and test their own models. We trained several deep-learning-based models and provide baselines using the benchmark. Both the corpus and the built models will be available through the ELG platform.

hltfbk

NLP / FBK's repositories

Excitement-Open-Platform

E3C-Corpus

EOP-1.2.1

CROMER

MT-EQuAl

EOP-1.2.3

EOP-1.1.2

EOP-1.1.3

EOP-1.1.4

Excitement-TDMLEDA

Excitement-Transduction-Layer

cas_access_example

EOP-1.0.2

EOP-1.1.1

EOP-1.2.0

lm-evaluation-harness