Orion Weller's repositories
RedditHumorDetection
Code and datasets for the paper "Humor Detection: A Transformer Gets the Last Laugh"
rJokesData
A large-scale humor dataset containing more than 550k rated English jokes (LREC'20)
Multilingual-Federated-Learning
Code for the paper "Pretrained Models for Multilingual Federated Learning" at NAACL 2022
humorTranslate
Using Machine Translation to "translate" non-humor into humor. Code for the paper "Humorous Headline Generation via Style Transfer" at FigLang 2020
according-to
Getting language models to quote from their pre-training data (EACL'24)
configtune
An easy way to tune machine learning hyperparameters (especially for those that use a config file)
DocumentReadingTime
Code and data from the ACL paper "You Don’t Have Time to Read This: an Exploration of Document-Level Reading Time Prediction"
GANExperiments
My experiments with GANs
LLaMA-Factory
Unify Efficient Fine-Tuning of 100+ LLMs
LM-expansions
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
contextual-repr-analysis
A toolkit for evaluating the linguistic knowledge and transferability of contextual representations. Code for "Linguistic Knowledge and Transferability of Contextual Representations", to appear at NAACL 2019.
disinformation-defense
Defending Against Misinformation Attacks in Open-Domain Question Answering
fisher-callhome-corpus
The Fisher and CALLHOME Spanish–English Speech Translation Corpus
InstructIR
InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. It focuses on user-aligned instructions tailored to each query instance.
mteb
MTEB: Massive Text Embedding Benchmark
ocaml-bert
Transformer-based models for Natural Language Processing in OCaml
pyresparser
A simple resume parser used for extracting information from resumes
rebiber
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
streaming
A Data Streaming Library for Efficient Neural Network Training
tevatron
Tevatron - A flexible toolkit for neural retrieval research and development.