facebookresearch / LASER

Language-Agnostic SEntence Representations

Add 2-letter codes to the `laser_encoders` language list

avidale opened this issue · comments

Currently, in `laser_encoders`, we use either 3-letter codes (like `eng`) or 8-character codes (like `eng_Latn`).
This is fine, but users often prefer 2-letter codes (like `en`).

Two-letter codes are not a very sustainable scheme, because there are many more languages in the world than possible 2-letter codes. However, people still use them frequently, so I think it would be nice to support them (and, of course, to check that we raise meaningful errors when one two-letter code corresponds to several languoids).
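
To make the proposal concrete, here is a minimal sketch of what such a lookup could look like. It is not the actual `laser_encoders` code: the table `TWO_LETTER_TO_THREE_LETTER` and the function `resolve_lang_code` are hypothetical names, and the three entries are only illustrative (the real list would need to be built from the library's supported languages).

```python
# Illustrative table only: NOT the actual laser_encoders language list.
# Each 2-letter code maps to the 3-letter codes it could stand for.
TWO_LETTER_TO_THREE_LETTER = {
    "en": ["eng"],
    "fr": ["fra"],
    "no": ["nob", "nno"],  # Norwegian covers both Bokmål and Nynorsk
}


def resolve_lang_code(code: str) -> str:
    """Map a 2-letter code to a 3-letter one; pass other codes through."""
    if len(code) != 2:
        # 3-letter (eng) and 8-character (eng_Latn) codes are left untouched.
        return code
    candidates = TWO_LETTER_TO_THREE_LETTER.get(code.lower())
    if candidates is None:
        raise ValueError(f"Unknown 2-letter language code: {code!r}")
    if len(candidates) > 1:
        # One 2-letter code, several languoids: ask the user to disambiguate.
        options = ", ".join(candidates)
        raise ValueError(
            f"Ambiguous 2-letter language code {code!r}; "
            f"please use one of: {options}"
        )
    return candidates[0]
```

With this sketch, `resolve_lang_code("en")` returns `"eng"`, while `resolve_lang_code("no")` raises a `ValueError` that lists `nob` and `nno` as the options to choose from.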