Amanda Bertsch's repositories
unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
long-context-icl
Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration"
perspective-shifting
Code for the Findings of EMNLP 2022 paper "He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues"
conlang_generator
Playing around with conlang generation using syntax trees and IPA
minimum-bayes-risk
For the preprint "It's MBR All the Way Down"
topic-modeling
Different dimensionality reduction techniques applied to the 20 Newsgroups toy dataset for topic modeling
30-seconds-of-python
Short Python code snippets for all your development needs
dialogue-collection
Scraping social media for natural language dialogues
EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
genre-labeled-bookcorpus
This repository contains code to replicate the no-longer publicly available Toronto BookCorpus dataset
lattice-generation
Code for the paper "Massive-scale Decoding for Text Generation using Lattices"
lm-evaluation-harness
A framework for few-shot evaluation of language models.
lost-in-the-middle
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
pointer-generator
Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
wikipedia-peacock-finder
Deployment for the peacock phrase detection project
wikiwatson
A partial imitation of IBM Watson using information retrieval on Wikipedia articles.