Vladimir Gurevich's repositories
jupyter-notebook-viewer
Chrome extension for viewing Jupyter notebooks in the browser without a Jupyter server
yandex-practicum
Tasks and projects from the Yandex.Practicum data science course
AnkiTools4j
Creating Anki decks in Java
deep_learning_school
Tasks and projects from the MIPT deep learning school
hebrew_summarizer
Fine-tuning experiments on summarization tasks for Hebrew
wav2vec2-hebrew
Speech recognition for Hebrew (using wav2vec2 models)
news_scrapers
Scripts for scraping news from various sources
abydos
Abydos NLP/IR library for Python (with some changes by imvladikon)
annotations_deduplications
Scripts for deduplicating annotations, refining NER spans, and analyzing the differences between annotations
bm25_vectorizer
scikit-learn-compatible BM25 vectorizers
cdatasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble (with Cython implementations added by imvladikon)
character-bert-pretraining
Code for pre-training CharacterBERT models (as well as BERT models).
deduplicator
Simple entity deduplication package
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
indonesian_nlp_experiments
Experiments in Indonesian NLP (information extraction from court reports)
pysubs3
A Python library for editing subtitle files (fork of pysubs2 with changes)
spacy-trankit
💥 Trankit models directly in spaCy💥
string-embed
😆 String embedding for fast edit distance computation; code for [Convolutional Embedding for Edit Distance (SIGIR 20)].
telegram-bot-hebrew
Telegram bot (Spring Boot, Java) with some language services for Hebrew (translation, inflection)
wikitalk_parser
Fetching and parsing Wikipedia talk pages