Source code for our paper: Aggregating and Predicting Sequence Labels from Crowd Annotations An Thanh Nguyen, Byron C. Wallace, Junyi Jessy Li, Ani Nenkova and Matthew Lease Association for Computational Linguistics (ACL) The crowd NER dataset is from Rodrigues et al. (2014): http://www.fprodrigues.com/software/crf-ma-sequence-labeling-with-multiple-annotators/ For LSTM-Crowd, our implementation extends Lample et al. (2016): https://github.com/glample/tagger Data for the Biomedical IE task: https://github.com/yinfeiy/PICO-data ------------------------------------------------------------------------- Python version = 2.7 Installing required packages: pip install -r requirements.txt To run the experiment on NER Task 1, extract the Rodrigues et al. data in task1.zip then execute: python hmm.py -i task1/val/ -o output.txt to run on the validation set or python hmm.py -i task1/test/ -o output.txt to run on the test set. Execute python hmm.py --help for more options. ------------------------------------------------------------------------- References: Rodrigues, Filipe, Francisco Pereira, and Bernardete Ribeiro. "Sequence labeling with multiple annotators." Machine Learning 95.2 (2014): 165-181. Lample, Guillaume, et al. "Neural architectures for named entity recognition." arXiv preprint arXiv:1603.01360 (2016).