pchr8 / eval-UA-tion

Code and sources for the eval-UA-tion benchmark and paper

Eval-UA-tion logo

Eval-UA-tion 1.0

Intro

This repository contains the scripts used to generate and evaluate the datasets from the Eval-UA-tion 1.0 benchmark for evaluating LLMs in Ukrainian.

It began as my Master's Thesis (see /other/MA) and was accepted to UNLP 2024 | The Third Ukrainian Natural Language Processing Workshop (preprint here: Eval-UA-tion 1.0: Benchmark for Evaluating Ukrainian (Large) Language Models - Archive ouverte HAL).

TL;DR

  • A benchmark for evaluating LLMs in the Ukrainian language
  • 3 novel tasks (9 datasets in total)
  • All tasks have human baselines, and most are contamination-safe (for now...)
  • See the presentation (https://serhii.net/F/MA/presentation/#/3/1) for more
  • TODO: the UNLP video and presentation will be added here when available

This repository

  • /code contains the (messy) code used to generate and evaluate most of the tasks
  • /other/MA contains my Master's Thesis and defense presentation (which contain more detail than the paper)

License

Thanks and acknowledgements

These awesome people helped proofread the stories, annotate the datasets, and establish the human baselines (in alphabetical order):

  • Oleksii K.
  • Viacheslav Kravchenko
  • Daria Kravets
  • Anna-Izabella Levbarg
  • Lina Mykhailenko
  • Mariia Tkachenko
  • @arturius453

Anna-Izabella Levbarg wrote the anilev6/HumanResponseBot Telegram bot used for all human baselines.


Languages

TeX 38.9% · HTML 25.9% · JavaScript 13.0% · Jupyter Notebook 12.3% · Python 8.1% · CSS 1.4% · SCSS 0.4% · Lua 0.0% · Shell 0.0%