Aleksandr Chuklin's starred repositories
stable-diffusion
A latent text-to-image diffusion model
Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
powerlevel10k
A Zsh theme
aclpubcheck
Tools for checking ACL paper submissions
C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)
e2e-metrics
E2E NLG Challenge Evaluation metrics
yandex-tank
Technical fork. All issues, requests, etc. should be filed in yandex/yandex-tank
FairRecSys
[Official Codes] Experiments on Generalizability of User-Oriented Fairness in Recommender Systems (SIGIR 2022)
user-satisfaction-simulation
"Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems" in SIGIR'21
clse
The Corpus of Linguistically Significant Entities (CLSE) is a dataset of named entities annotated by linguist experts. It includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games. The aim of the corpus is to facilitate the creation of more linguistically diverse NLG datasets.
telegram-bot-help-ua-ch
Telegram bot that helps refugees from the war in Ukraine who are seeking information about refuge in Switzerland.