Jirka Balhar's repositories
music-transcription
A small framework for conducting Melody Extraction experiments. We used this framework to improve on the state of the art using a custom convolutional architecture.
awesome-align
A neural word aligner based on multilingual BERT
text_segmentation
Unsupervised Segmentation of Text
word-alignment-visualization
Word Alignment Visualization is a Python package for visualizing word alignments between two sentences in a Jupyter notebook. The package provides an interactive widget that displays original and translated sentences with word alignment lines.
deepcompyle
Pretraining transformers to decompile Python bytecode
fb_messages_search
Facebook Messages Search is a Vue.js application for viewing and searching your Facebook chat threads.
bachelor-thesis
Notes and LaTeX source for my bachelor thesis
better-mff-thesis
A slightly improved variant of the official thesis sample
cs_restaurant_dataset
Czech restaurant information dataset for NLG
czech-webnlg
Czech translation of the WebNLG dataset
czech_nlg
Czech Natural Language Generation from structured data using various pretrained deep learning models
document-translation
Automatic Machine Translation / localization of formatted documents such as XML, DOCX or PDF
framewise_2016
Code to reproduce results from a paper about framewise polyphonic piano transcription
lda_topic_modeling
Latent Dirichlet Allocation topic modelling implemented in Python, accelerated through Numba
lindat-translation
Frontend of LINDAT translation service
masters-thesis
Improving Subword Tokenization Methods for Multilingual Models
multilingual-tokenizers
Code repository for my Master's thesis "Improving Subword Tokenization Methods for Multilingual Models"
npfl114
Materials for the Deep Learning -- ÚFAL course NPFL114
pico-glitcher
Voltage glitching exploit tool against the CCxxxx family of chips to bypass readout protection
PicoRX
Build an SDR SW/MW/LW Receiver with a Raspberry Pi Pico
shift_schedule
A tool for scheduling work shifts (for example in a cafe). Implemented using constraint programming in SICStus Prolog
Simple-kNN-Gzip
A simplistic linear and multiprocessed approach to sentiment analysis using Gzip Normalized Compression Distances with k nearest neighbors
tokenize-uk
Simple Python lib to tokenize texts into sentences and sentences into words. Small, fast and robust. Comes with Ukrainian flavour
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
treq-alignment-bookmarklet
Bookmarklet for word alignment visualization in the Treq tool.
WER-in-python
This program calculates the word error rate of an ASR hypothesis and prints the aligned result.
words-and-the-company-they-keep
Homework for the class NPFL067: Statistical Methods in Natural Language Processing I