Some links are broken in the documentation
lfoppiano opened this issue · comments
Luca Foppiano commented
I was reading the README to test the nerTagger, and I found that some links in the paragraphs below are broken (or at least I cannot access them).
I fixed what I could, but I'm reporting the rest here.
I did not find a better way to point to them than quoting the source directly:
For re-training a model, the CoNLL-2003 NER dataset (eng.train, eng.testa, eng.testb) must be present under data/sequenceLabelling/CoNLL-2003/ in IOB2 tagging scheme (look here for instance ;) and here. The CONLL 2003 dataset (English) is the default dataset and English is the default language, but you can also indicate it explicitly as parameter with --dataset-type conll2003 and specifying explicitly the language --lang en.
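For readers unfamiliar with the IOB2 requirement in the paragraph above: in IOB2 every entity starts with a `B-` tag, whereas the CoNLL-2003 files as commonly distributed use IOB1, where `B-` appears only when an entity directly follows another entity of the same type. The following is a minimal sketch of that conversion on a single sentence's tag list. The function name `iob1_to_iob2` is hypothetical and this is not the converter shipped with the project; it only illustrates what the scheme change means.

```python
from typing import List, Optional


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Convert one sentence's tags from IOB1 to IOB2 (illustrative sketch)."""
    converted: List[str] = []
    prev_type: Optional[str] = None  # entity type of the previous token, None if it was "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, _, entity_type = tag.partition("-")
        # An I- tag that does not continue an entity of the same type
        # opens a new entity, so IOB2 requires it to be B-.
        if prefix == "I" and prev_type != entity_type:
            converted.append("B-" + entity_type)
        else:
            converted.append(tag)
        prev_type = entity_type
    return converted


if __name__ == "__main__":
    print(iob1_to_iob2(["I-PER", "I-PER", "O", "I-LOC"]))
```

Running the function on each sentence of a file (sentences being separated by blank lines in the CoNLL format) would give the tag column in IOB2; reading and writing the files is left out here.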
For re-training, the assembled Ontonotes datasets following CoNLL-2012 must be available and converted into IOB2 tagging scheme, see [here](https://github.com/kermitt2/delft/tree/master/utilities) for more details. To train and evaluate following the traditional approach (training with the train set without validation set, and evaluating on test set), use:
Patrice Lopez commented
Thank you! This is fixed with 80c4595.