Dan Bareket's starred repositories
LIME-for-Ranking
A codebase for "Local Model-Agnostic Explanations for Ranking Model Interpretability"
HebSafeHarbor
Hebrew PHI identification and redaction toolkit
NEMO-Corpus
Named entity recognition (NER) annotations of the Hebrew Treebank (Haaretz newspaper) corpus, including morpheme- and token-level NER labels, nested mentions, and more.
awesome-fairness-papers
Papers on fairness in NLP
CrowdLayer
A neural network layer that enables training of deep neural networks directly from crowdsourced labels (e.g. from Amazon Mechanical Turk) or, more generally, labels from multiple annotators with different biases and levels of expertise.
urbanaccess
A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
NLI-variation-data
Human annotations for the paper "Inherent Disagreements in Human Textual Inferences"
hebrew_tokenizer
A field-tested Hebrew tokenizer for dirty texts (ben-yehuda project, bible, cc100, mc4, opensubs, oscar, twitter) focused on multi-word expression extraction.
wikipedia-to-elastic
Analyze and extract Wikipedia article text and attributes, and store them in an Elasticsearch index or in JSON files (multilingual support)
ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.