Leon Derczynski's repositories
hatespeechdata
Catalog of abusive language data (PLoS 2020)
emerging_entities_17
Dataset for the Emerging & Novel Entity NER task (WNUT '17)
entity_recognition
Framework for NER and other types of entity recognition, in Python
lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
generalised-brown
C++ implementation of Generalised Brown clustering and Python scripts for feature generation (AAAI 2016)
awesome-danish
A curated list of awesome resources for Danish language technology
acl-anthology
Data and software for building the ACL Anthology.
acl-style-files
Official style files for papers submitted to venues of the Association for Computational Linguistics
aclrollingreview
ACL Rolling Review website
CyberAgressionAdo-v1
Dataset of Teen Cyberbullying scenarios in French
danlp
DaNLP is a repository for Natural Language Processing resources for the Danish Language.
huggingface_hub
All the open source things related to the Hugging Face Hub.
lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
mole-stance
MoLE: Cross-Domain Label-Adaptive Stance Detection
nanoChatGPT
A crude RLHF layer on top of nanoGPT with Gumbel-Softmax trick
Prompt-Engineering-Guide
🐙 Guide and resources for prompt engineering
RWKV-LM
RWKV v2 is an RNN with transformer-level performance. It can be trained directly like a GPT transformer (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
RWKV-v2-RNN-Pile
RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.