altsoph / EENLP

The broad index of NLP resources for Eastern European languages. The best EEML 2021 project.

About

This repo contains a curated meta-index of NLP datasets and models for Eastern European languages. It started as a summer school project at EEML 2021 (Eastern European Machine Learning Summer School), self-organized by a group of participants, which explains the regional scope. You can read more details about this initial summer school project here.

We hope this broad index of NLP resources for Eastern European languages can help:

  • facilitate the synergy of Eastern European NLP research communities;
  • highlight the underrepresented languages of Eastern Europe;
  • understand cross-cultural and cross-linguistic differences;
  • decrease the digital language divide.

Initially, EENLP was biased towards datasets for semantic NLP tasks such as sentiment analysis, NLI, word sense disambiguation, etc. However, we are expanding and improving this index further, so feel free to contribute new relevant resources. We are also happy to hear your feedback and suggestions via issues or at altsoph@gmail.com.

Resources

The datasets

Browse the datasets index or select your language of interest:

🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦

The models

Browse the models index or select your language of interest:

🇦🇱 🇦🇲 🇧🇾 🇧🇦 🇧🇬 🇭🇷 🇨🇿 🇪🇪 🇬🇪 🇭🇺 🇰🇿 🇱🇻 🇱🇹 🇲🇰 🇲🇩 🇲🇪 🇵🇱 🇷🇴 🇷🇺 🇷🇸 🇸🇰 🇸🇮 🇺🇦

Contribution

Feel free to contribute. The details are in our contributing guidelines.

Citation

@misc{tikhonov2021eenlp,
      title={EENLP: Cross-lingual Eastern European NLP Index}, 
      author={Alexey Tikhonov and Alex Malkhasov and Andrey Manoshin and George Dima and Réka Cserháti and Md. Sadek Hossain Asif and Matt Sárdi},
      year={2021},
      eprint={2108.02605},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Licensing

This index is licensed under the Apache-2.0 License. Note, however, that each listed resource has its own licensing terms.

Development

This is mostly internal documentation for us.

See developing this repository.
