Jaume Zaragoza's repositories
paraphrasing
A collection of paraphrasing-related tools: Sent2vec and paraphrase generation.
terminology
Tools to annotate parallel data with terminology for NMT forced translation
arch-install
Simple bash script to install Arch Linux.
bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
Computer-Vision
Computer vision repository
cyrillic-transliteration
Transliterate Cyrillic script to Latin script and vice versa.
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
diceware-cat
Catalan dictionaries for generating Diceware passwords
Domain_Adaptation
InDomain detection is a tool designed to extract in-domain data from large collections of data.
dotfiles
My dotfiles
escape-unk
Escape unknown symbols in SentencePiece vocabularies
fastspell
Targeted language identifier, based on FastText and Hunspell.
gaoya
Locality Sensitive Hashing
Infinity-For-Reddit
A Reddit client for Android
LanguagePack
A language pack project for AnySoftKeyboard
lttoolbox
Finite-state compiler, processor, and helper tools used by Apertium
sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
serde-fancy-regex
A serde-regex fork to (de)serialize fancy-regex regular expressions
srx
A mostly compliant Rust implementation of the Segmentation Rules eXchange (SRX) 2.0 standard for text segmentation.
students
Efficient teacher-student models and scripts to make them