laurieburchell / open-lid-dataset

Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)

Add Abkhaz

ZJaume opened this issue

I found this corpus, which comes from the Abkhazian National Corpus and Common Voice, so it probably won't have any language pollution and can be used for training. I asked just in case, but have had no answer yet.

Thank you so much! I will add this to my list of languages to add to OpenLID 2.0 :)