Shayne Mei's repositories
awesome-cheatsheets
📚 Awesome cheatsheets for popular programming languages, frameworks and development tools. They include everything you should know in one single file.
blogs
Repository for content, code and other related materials for the blogs
CKY-parser
Vanilla implementation of the Cocke-Kasami-Younger (CKY) parser
compound-split-de
Morphological splitting of German compound words, reproducing results from Koehn and Knight (2003)
convert-cnf-grammar
Convert a context-free grammar in NLTK format to Chomsky normal form
Deadline
Join a faction to fight deadlines and the dead!
decision-tree-classifier
Build a decision tree classifier with maximum depth and minimum information gain passed in as early-stopping criteria
free-spoken-digit-dataset
A free audio dataset of spoken digits. Think MNIST for audio.
hackore
Hack it till you make it. A static site with notes on data science, Python, and NLTK.
hmm-pos-tagger
Implementing an English POS tagger using HMM and Viterbi
knn-classifier
Implementation of a kNN classifier with two distance metric options: Euclidean distance and cosine distance
language-model-en
Build a language model for English from n-grams
lhotse
Tools for handling speech data in machine learning projects.
Mining-the-Social-Web-2nd-Edition
The official online compendium for Mining the Social Web, 2nd Edition (O'Reilly, 2013)
morphological-segmentation
Reproduce the morphological segmentation inside-out results from Cotterell et al. (2016), and a frequency-based metric using the monolingual German corpus described in Koehn and Knight (2003).
movie-title-translations
Scrape the different Mandarin translations of movie titles in Taiwan, China and Hong Kong
naive-bayes-classifier
Implementation of a multivariate Bernoulli Naive Bayes model and a multinomial Naive Bayes model for classification
ngram-hmm
Build a hidden Markov model from different n-grams with interpolation smoothing
open-pixel-art
A collaborative pixel art project to teach people how to contribute to open-source
PCKY-parser
Implementation of an improved probabilistic Cocke-Kasami-Younger (PCKY) parser and a grammar induction script with parent annotation optimisation
scrape-fb-twnews
Scrape data from Facebook news pages without using the Graph API
sherpa
Speech-to-text server framework with next-gen Kaldi
software-product-sprint
Google SPS 2020 Portfolio
tokekniser-en
Vanilla English tokeniser using regex and an exception list
web-crawler-tutorial
Python web crawling: a hands-on introduction