ToastyNews's repositories
electra-hongkongese
ELECTRA model pre-trained on Hong Kong data.
hong-kong-fastText
fastText vectors created from Hong Kong data.
openrice-senti
Scraped reviews from OpenRice for sentiment analysis. Formatted for use with BERT.
cantonese-nlp-benchmark
Benchmark for Cantonese word segmentation and POS tagging.
hongkongese-identifier
Simple statistical detector that distinguishes Hong Kongese, Standard Chinese, and English text.
lihkg-cat-v2
Scraped forum threads from LIHKG for a categorization task. Formatted for use with BERT.
wordshk-sem
Scraped word-definition pairs from words.hk for a semantic similarity task. Formatted for use with BERT.
ckip-transformers-hk
Hongkongese/Cantonese models compatible with CKIP Transformers
finetune-ckip-transformers
Create training files to fine-tune CKIP Transformers
pytorch-sentiment-analysis
Hong Kongese deep learning dataset and notebooks, forked from the bentrevett/pytorch-sentiment-analysis tutorial.
fastText4j
Facebook's fastText for Java.
hong-kong-bleu
Data for evaluating translation APIs using Hong Kong text.
lihkg-cat
Scraped forum threads from LIHKG for a categorization task. Formatted for use with BERT.