mhshih's repositories
ArticutAPI_Taigi
A Taigi word segmentation (CWS), part-of-speech tagging (POS), and named entity recognition (NER) tool built with Articut as its kernel.
nlp_chinese_corpus
Large-scale Chinese corpus for NLP
Susing-Piauki
Takes input in full Han characters and in full romanization, aligns the two, and tags every word with its part of speech.
AI_Tutorial
AI tutorial files for ROCLING 2019
Alpaca-CoT
We extend CoT data to Alpaca to boost its reasoning ability. We are continually expanding our collection of instruction-tuning datasets and integrating more LLMs into the framework for easy use.
ArticutAPI
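API of Articut Chinese word segmentation (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model: relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 91% and a recall above 96% on SIGHAN 2005.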
bert
TensorFlow code and pre-trained models for BERT
Chinese-Vicuna
Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese LLaMA + LoRA solution whose structure is modeled on Alpaca.
fChartExamples2
Classification examples for fChart 6.0 and later
hue7jip8
A list and compilation of corpora for Taiwanese, the Taiwanese indigenous languages, and Hakka
interactive-tutorials
Interactive Tutorials
ladsbook
Linguistic Analysis and Data Science
MALINDO_Morph
Morphological dictionary for Malay/Indonesian
moedict-data-twblg
Data files of the Dictionary of Frequently-Used Taiwan Minnan (臺灣閩南語常用詞辭典)
overleaf
A web-based collaborative LaTeX editor
readr-data
We will release the data behind our news stories.
Susing-Kuhuat-Piautiau
Taiwanese tone sandhi based on part of speech and syntax