The extracted URLs contain markdown
PiotrCzapla opened this issue
If you have a look at nlpprogress.json, you will see that the URLs sometimes contain part of the surrounding markdown.
Have a look here: https://github.com/paperswithcode/sota-extractor/blob/master/data/tasks/nlpprogress.json#L326
or here: https://github.com/paperswithcode/sota-extractor/blob/master/data/tasks/nlpprogress.json#L718
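A quick way to find other affected entries might be to scan the JSON for URL strings that contain characters typical of markdown syntax. This is a minimal sketch, not part of sota-extractor. It assumes that URLs are stored as string values starting with `http://` or `https://`, and that leaked markdown shows up as characters such as `[`, `]`, `(`, `)`, `*`, backticks, angle brackets or whitespace. The helper names `looks_like_markdown`, `iter_urls` and `find_suspicious_urls` are hypothetical.

```python
import json
import re
from typing import Any, Iterator

# Assumed set of characters that usually come from markdown syntax, not from a URL.
# Note: parentheses can legitimately appear in some URLs (e.g. Wikipedia), so
# matches are candidates to inspect, not guaranteed bugs.
SUSPICIOUS = re.compile(r"[\[\]()*`<>\s]")


def looks_like_markdown(url: str) -> bool:
    """Return True if the URL string contains any of the suspicious characters."""
    return SUSPICIOUS.search(url) is not None


def iter_urls(node: Any) -> Iterator[str]:
    """Recursively yield every string value that starts with http:// or https://."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


def find_suspicious_urls(path: str) -> list[str]:
    """Load a JSON file and return the URLs that look like they carry markdown."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [url for url in iter_urls(data) if looks_like_markdown(url)]
```

Running `find_suspicious_urls("data/tasks/nlpprogress.json")` from the repository root should list candidates similar to the two linked above.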