This repository contains the datasets from the paper "Stress Test Evaluation of Biomedical Word Embeddings".
- All datasets are in IOB2 tag format.
- Folders whose names include the acronym ST contain stress test sets.
- We include two datasets for chemical NER (BC4CHEMD, BC5CDR-chem) and two for disease NER (NCBI-disease, BC5CDR-disease).
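
For readers unfamiliar with IOB2, here is a minimal sketch of how such files could be parsed in Python. It assumes a CoNLL-style layout (one token per line, the tag in the last whitespace-separated column, a blank line between sentences); check the files in this repository before relying on that. The function names `read_iob2` and `extract_entities`, the sample sentence, and the labels `Chemical` and `Disease` are illustrative and not taken from the datasets.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_iob2(lines: Iterable[str]) -> List[Sentence]:
    """Group token/tag lines into sentences (assumed CoNLL-style layout)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) spans from IOB2 tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        if tag.startswith("B-"):
            # In IOB2 every entity starts with a B- tag.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            tokens.append(token)
        else:
            # An O tag (or an inconsistent I- tag) ends any open entity.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Illustrative sample, not copied from the datasets.
sample = [
    "Selegiline\tB-Chemical",
    "induced\tO",
    "postural\tB-Disease",
    "hypotension\tI-Disease",
    "",
]
sentences = read_iob2(sample)
entities = extract_entities(sentences[0])
```

To read a file from disk, pass an open file handle to `read_iob2`, since it accepts any iterable of lines.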
If you find this repository useful for your research, please consider citing our paper:
@inproceedings{araujo-etal-2021-stress,
    title = "Stress Test Evaluation of Biomedical Word Embeddings",
    author = "Araujo, Vladimir and
      Carvallo, Andr{\'e}s and
      Aspillaga, Carlos and
      Thorne, Camilo and
      Parra, Denis",
    booktitle = "Proceedings of the 20th Workshop on Biomedical Language Processing",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.bionlp-1.13",
    pages = "119--125",
}