About
This repo contains a curated meta-index of NLP datasets and models for Eastern European languages. It started as a project at EEML 2021 (the Eastern European Machine Learning Summer School), which explains its scope, and was self-organized by a group of participants. You can read more about the initial summer school project here.
We hope this broad index of NLP resources for Eastern European languages will help:
- facilitate the synergy of Eastern European NLP research communities;
- highlight the underrepresented languages of Eastern Europe;
- understand cross-cultural and cross-linguistic differences;
- decrease the digital language divide.
Initially, EENLP was biased towards datasets for semantic NLP tasks such as sentiment analysis, NLI, word sense disambiguation, etc. However, we are expanding and improving this index further, so feel free to contribute new relevant resources. We are also happy to hear your feedback and suggestions via issues or at altsoph@gmail.com.
Resources
The datasets
Browse the datasets index or select your language of interest:
🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦
The models
Browse the models index or select your language of interest:
🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦
Contribution
Feel free to contribute. The details are in our contributing guidelines.
Citation
@misc{tikhonov2021eenlp,
title={EENLP: Cross-lingual Eastern European NLP Index},
author={Alexey Tikhonov and Alex Malkhasov and Andrey Manoshin and George Dima and Réka Cserháti and Md. Sadek Hossain Asif and Matt Sárdi},
year={2021},
eprint={2108.02605},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Licensing
This index is licensed under the Apache-2.0 License. Note, however, that each listed resource has its own licensing terms.
Development
This is mostly internal documentation for us.