kermitt2 / delft

a Deep Learning Framework for Text https://delft.readthedocs.io/

Some links are broken in the documentation

lfoppiano opened this issue · comments

I was reading the readme to try to test the nerTagger, and I found that some links in these paragraphs are broken (or at least I cannot access them).

I fixed what I could, but I'm reporting the rest here.

I could not find a better way to point to them than quoting the source:

For re-training a model, the CoNLL-2003 NER dataset (eng.train, eng.testa, eng.testb) must be present under data/sequenceLabelling/CoNLL-2003/ in IOB2 tagging scheme (look here for instance ;) and here. The CONLL 2003 dataset (English) is the default dataset and English is the default language, but you can also indicate it explicitly as parameter with --dataset-type conll2003 and specifying explicitly the language --lang en.
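Since the links for the IOB2 conversion are the broken ones, here is a minimal sketch of what that conversion could look like, assuming the input tags are in the IOB1 scheme of the original CoNLL-2003 distribution. The function name `iob1_to_iob2` is hypothetical and not part of DeLFT; it only illustrates the scheme change for one sentence's tag sequence.

```python
def iob1_to_iob2(tags):
    """Convert one sentence's tags from IOB1 to IOB2.

    Hypothetical helper for illustration, not part of DeLFT.
    In IOB2 every entity starts with a B- tag, whereas IOB1 only
    uses B- to separate two adjacent entities of the same type.
    """
    converted = []
    previous = "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
        else:
            prefix, _, label = tag.partition("-")
            previous_label = previous.partition("-")[2] if previous != "O" else None
            # An I- tag opens a new entity when it follows "O"
            # or a tag carrying a different entity type.
            if prefix == "I" and previous_label != label:
                converted.append("B-" + label)
            else:
                converted.append(tag)
        previous = tag
    return converted


if __name__ == "__main__":
    print(iob1_to_iob2(["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-ORG"]))
```

Applying this to the tag column of each sentence in eng.train, eng.testa and eng.testb would be one way to produce the IOB2 files, but the file parsing is left out here because the exact layout expected by DeLFT is not quoted in this issue.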
For re-training, the assembled Ontonotes datasets following CoNLL-2012 must be available and converted into IOB2 tagging scheme, see [here](https://github.com/kermitt2/delft/tree/master/utilities) for more details. To train and evaluate following the traditional approach (training with the train set without validation set, and evaluating on test set), use:

Thank you! This is fixed with 80c4595.