lk-geimfari / mimesis

Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

Home Page: https://mimesis.name


Swedish names contain an illegal letter

ullenius opened this issue · comments

Bug report

The list of Swedish names contains the letter "ō", which is not present in any Scandinavian alphabet and has no latin-1 equivalent.

According to Wikipedia, the letter is used in Livonian, Samogitian, and some non-European languages. It was also used in Latvian prior to 1946 (source).

What's wrong

These names ought to be removed:

mimesis/mimesis/data/sv$ grep ō *.json
person.json:      "Hrōdhvald",
person.json:      "Rōdhger",
person.json:      "Rōdhulf",
person.json:      "Rōdhvald"
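The same check can be made without relying on the shell's locale handling. The following is a minimal Python sketch, not part of Mimesis: `iter_strings` and `non_latin1_strings` are hypothetical helper names, and the `sample` document only imitates a locale file (the real layout of `person.json` is not assumed here). It reports every string containing a code point above U+00FF, which is the upper limit of latin-1.

```python
import json


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def non_latin1_strings(node):
    """Return the strings that contain a character outside latin-1 (above U+00FF)."""
    return [s for s in iter_strings(node) if any(ord(ch) > 0xFF for ch in s)]


# Illustrative stand-in for a locale file; the key name is an assumption.
sample = json.loads('{"names": ["Björn", "Rōdhger", "Åsa", "Hrōdhvald"]}')

print(non_latin1_strings(sample))
```

To run it against the real data, the parsed content of `person.json` (loaded with `json.load` and `encoding="utf-8"`) would take the place of `sample`.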

How it should be

  1. These are not Scandinavian names. The spelling is possibly a typo: the author may have meant 'ö', a similar-looking letter that does exist in the Swedish alphabet.

  2. This letter causes problems when converting UTF-8 data generated by Mimesis to latin-1. Legacy systems and formats in Scandinavia tend to use latin-1.
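The conversion problem in the second point can be reproduced with the Python standard library alone. This is a small sketch: `can_encode_latin1` is a hypothetical helper, not a Mimesis function. "ō" is U+014D, which lies outside the latin-1 range, whereas "ö" is U+00F6 and encodes to a single byte.

```python
def can_encode_latin1(text):
    """Return True if text can be encoded as latin-1 without losing data."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


reported_names = ["Hrōdhvald", "Rōdhger", "Rōdhulf", "Rōdhvald"]

# Every name from the report fails a strict conversion.
for name in reported_names:
    print(name, can_encode_latin1(name))

# A name spelled with the Swedish letter "ö" converts cleanly.
print("Björn", can_encode_latin1("Björn"))

# A lossy conversion silently replaces the letter with "?".
lossy = "Rōdhger".encode("latin-1", errors="replace")
print(lossy)
```

A strict conversion raises `UnicodeEncodeError`, while `errors="replace"` silently corrupts the name, so neither suits a legacy system that expects latin-1.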

System information

All systems affected