Alex Jones's repositories
alt-bitexts
A set of notebooks showcasing methods for mining bitexts from parallel or comparable corpora
XLAnalysis5K
Code and data for the EMNLP 2021 paper "A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space."
KALComp
A comparable corpus of Kalaallisut and Danish web-crawled sentences, along with some noisy aligned texts and code for MT finetuning experiments between Kalaallisut and English. Currently looking to improve the quality of pseudoparallel data. Final project for LING28/Computational Linguistics, Dartmouth College, Winter 2022.
acl-anthology
Data and software for building the ACL Anthology.
DS-ML-Python-cheat-sheets
Interactive notebooks that walk through the basics of the most widely used Python data science and ML libraries.
examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
homebrew-cask
🍻 A CLI workflow for the administration of macOS applications distributed as binaries
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
langchain
⚡ Building applications with LLMs through composability ⚡
NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
poincare-embeddings
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"
SentimentMT
Repo associated with "Sentiment-based Candidate Selection for NMT": decoder-side sentiment-based translation selection.
sk-dist
Distributed scikit-learn meta-estimators in PySpark
text-autoaugment
Code for the EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.