CUNY-CL / wikipron

Massively multilingual pronunciation mining

Some language codes not recognized by iso639.Language.match()

sonofthomp opened this issue

Running codes.py yielded the following error:

(base) gabrielthompson@Gabriels-MBP-2 lib % python codes.py 
codes.py WARNING: WikiPron resolves the key 'ain' to 'Ainu (Japan)' listed as 'Ainu' on Wiktionary
codes.py WARNING: WikiPron resolves the key 'rup' to 'Macedo-Romanian' listed as 'Aromanian' on Wiktionary
codes.py WARNING: WikiPron resolves the key 'bjb' to 'Banggarla' listed as 'Barngarla' on Wiktionary
Traceback (most recent call last):
  File "/Users/gabrielthompson/Desktop/Coding/research/wikipron3/data/scrape/lib/codes.py", line 215, in <module>
    main()
  File "/Users/gabrielthompson/Desktop/Coding/research/wikipron3/data/scrape/lib/codes.py", line 177, in main
    iso639_lang = iso639.Language.match(wiktionary_code)
  File "/Users/gabrielthompson/anaconda3/lib/python3.10/site-packages/iso639/language.py", line 120, in match
    return _get_language(user_input, query_order)
  File "/Users/gabrielthompson/anaconda3/lib/python3.10/site-packages/iso639/language.py", line 189, in _get_language
    raise LanguageNotFoundError(
iso639.language.LanguageNotFoundError: 'gmw-cfr' isn't an ISO language code or name

For whatever reason, the iso639 module isn't recognizing some of the language codes returned by the Wiktionary API. Someone should look into why this is, or maybe we should omit languages that don't have valid language codes.

I hadn't seen that fatal exception before. I think we should probably catch it and convert it to a warning. What do you think?

Giving a warning sounds like a good idea. In place of iso639_lang = iso639.Language.match(wiktionary_code), I'm thinking something like:

try:
    iso639_lang = iso639.Language.match(wiktionary_code)
except iso639.language.LanguageNotFoundError:
    logging.warning(
        "Could not find language with code %s", wiktionary_code
    )

... so that in the case of gmw-cfr, the following is logged:

codes.py WARNING: Could not find language with code gmw-cfr
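
For reference, here is a minimal, self-contained sketch of the catch-and-warn pattern proposed above, written so it runs without the iso639 package installed. Everything in it is an assumption for illustration: `toy_match`, the `_KNOWN` table, the local `LanguageNotFoundError` class, and `resolve_codes` are hypothetical stand-ins and are not taken from codes.py. One thing it makes explicit is that an unmatched code is skipped for the rest of the loop iteration, so nothing downstream reads an unassigned result.

```python
import logging
from typing import Callable, Dict, Iterable, Type


class LanguageNotFoundError(Exception):
    """Hypothetical stand-in for iso639.language.LanguageNotFoundError."""


# Hypothetical toy table; the real lookup would be iso639.Language.match.
_KNOWN: Dict[str, str] = {
    "ain": "Ainu (Japan)",
    "rup": "Macedo-Romanian",
    "bjb": "Banggarla",
}


def toy_match(code: str) -> str:
    """Return a language name for `code`, or raise if it is unknown."""
    try:
        return _KNOWN[code]
    except KeyError:
        raise LanguageNotFoundError(
            f"{code!r} isn't an ISO language code or name"
        ) from None


def resolve_codes(
    codes: Iterable[str],
    match: Callable[[str], object] = toy_match,
    not_found: Type[Exception] = LanguageNotFoundError,
) -> Dict[str, object]:
    """Match each code, logging a warning and skipping it on failure."""
    resolved: Dict[str, object] = {}
    for code in codes:
        try:
            resolved[code] = match(code)
        except not_found:
            logging.warning("Could not find language with code %s", code)
            # Skip this code so later steps never see a missing match.
            continue
    return resolved


if __name__ == "__main__":
    logging.basicConfig(format="%(filename)s %(levelname)s: %(message)s")
    print(resolve_codes(["ain", "gmw-cfr", "rup"]))
```

With the real package, the same helper could presumably be called with match=iso639.Language.match and not_found=iso639.language.LanguageNotFoundError (both names appear in the traceback above); the values would then be Language objects rather than plain strings.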

This proposal LGTM.

Closed in #499, I believe.