yongtso's repositories
C0A2DD042
A parallel corpus to train machine translation models
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
WeiboSpider_SentimentAnalysis
Scrape Weibo data with Python and run sentiment analysis on the scraped data
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
vietnamese_spelling_error_correction
Detect misspelled words with an LSTM and replace them using the XLM-R masked language model
SentiWordNet
The SentiWordNet sentiment lexicon
bert
TensorFlow code and pre-trained models for BERT
nlp-beginner
Getting-started tutorial for NLP
SentiLARE
Code for our paper "SentiLARE: Sentiment-Aware Language Representation Learning with Linguistic Knowledge" (EMNLP 2020)
covost
CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed)
bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that provides data annotation and synthesis tools and supports training and deployment on server, mobile, embedded, and IoT devices)
examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Bo-Eng-Machine-Transation
Tibetan to English Machine Translation
PhoBERT
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
bonlp-dataset
Tibetan NLP training dataset for various NLP tasks
LOTClass
[EMNLP 2020] Text Classification Using Label Names Only: A Language Model Self-Training Approach
bonltk
BoNLTK aims to provide out-of-the-box support for the various NLP tasks that an application developer might need for Bokey, the Tibetan language.
CharBERT
CharBERT: Character-aware Pre-trained Language Model (COLING2020)
fastText
Library for fast text representation and classification.
albert_zh
A Lite BERT for Self-Supervised Learning of Language Representations; a large collection of Chinese pre-trained ALBERT models
botok
🏷 བོད་རྟོགས། [pʰøtɔk̚] Tibetan word tokenizer in Python
namsel
An OCR application focused on machine-print Tibetan text
lstm_next_sequence_prediction
Implements a recurrent neural network and a long short-term memory network from scratch, without frameworks
MUSE
A library for Multilingual Unsupervised or Supervised word Embeddings
BabelNet-Sememe-Prediction
Code and data of the AAAI-20 paper "Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets"