attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps


'{{snd}}' should resolve to '–' or '-'

dnk8n opened this issue

While leaving a comment on #130, I noticed a bug in how the following line is extracted.

Compare the same text at each stage:

  • Wikipedia rendered: (11 August 1848 – 27 June 1934) was an
  • Wikipedia source: (11 August 1848 – 27 June 1934) was an
  • dump XML: (11 August 1848{{snd}}27 June 1934) was an
  • wikiextractor output (erroneous): (11 August 184827 June 1934) was an
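The comparison above shows that the `{{snd}}` template is dropped outright instead of being expanded, which fuses "1848" and "27". One possible workaround is to substitute dash templates with their rendered text before any other template handling. The following is a minimal Python sketch under that assumption. It is not wikiextractor's own code: the function `expand_dash_templates`, the `DASH_TEMPLATES` table, and the `ndash`/`mdash` entries are hypothetical illustrations, and plain spaces are used around the en dash to match the rendered line above.

```python
import re

# Hypothetical mapping from parameterless dash templates to the text they render as.
# Only {{snd}} comes from the issue; the other entries are illustrative assumptions.
DASH_TEMPLATES = {
    "snd": " \u2013 ",   # spaced en dash, as in the rendered article
    "ndash": "\u2013",   # unspaced en dash (assumption)
    "mdash": "\u2014",   # unspaced em dash (assumption)
}

# Matches a parameterless template call such as {{snd}} or {{ snd }}.
_TEMPLATE_RE = re.compile(r"\{\{\s*([A-Za-z]+)\s*\}\}")


def expand_dash_templates(wikitext: str) -> str:
    """Replace known dash templates with their rendered text; leave all others as-is."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        # MediaWiki treats only the first letter of a template name as case-insensitive.
        key = name[:1].lower() + name[1:]
        # Unknown templates are returned unchanged so later processing still sees them.
        return DASH_TEMPLATES.get(key, match.group(0))

    return _TEMPLATE_RE.sub(repl, wikitext)


if __name__ == "__main__":
    print(expand_dash_templates("(11 August 1848{{snd}}27 June 1934) was an"))
```

Running it on the dump XML fragment prints `(11 August 1848 – 27 June 1934) was an`, matching the rendered Wikipedia text. Doing the substitution as a separate pass keeps the change local: templates that are not in the table pass through untouched.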