Mingfai Ma's starred repositories
traffic-shm
traffic-shm (Anna) is a Java-based, lock-free IPC library.
fastText_java
Java port of the C++ version of Facebook's fastText
fastText4j
Facebook's FastText for Java
Chinese-StopWords
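Commonly used Chinese stop words (including the Baidu, Harbin Institute of Technology, and Sichuan University lists, among others)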
ChineseStopWords
A list of commonly used Chinese stop words
Chinese-Text-Classification-Based-on-Naive-Bayes
The development of computer and communications technology has produced huge amounts of data, making automatic text classification increasingly important. The Naive Bayes algorithm is based on a probabilistic model and is an effective approach to automatic text classification. The main task of this paper is to discuss the theoretical basis of the Naive Bayes text classifier and to describe how to implement it in Java. The classifier has two parts: feature extraction and classification based on the extracted features. In the feature extraction part, I use Chinese word segmentation and stop-word filtering. In the classification part, I calculate the prior probability, the likelihood, and the maximum a posteriori (MAP) estimate. In a simple test, I used the Sogou Lab text classification corpus as both the training set and the test set; accuracy was between 39% and 56%. These results show that there is still room for improvement. The paper also discusses possible improvements and wider applications.
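The classification steps described above (stop-word filtering, prior probability, likelihood, and the MAP decision) can be sketched in Java as follows. This is a hypothetical illustration, not code from the repository: the class name NaiveBayesSketch, the assumption that input text is already segmented into space-separated tokens (for example by a segmenter such as jieba), and the use of Laplace (add-one) smoothing and log probabilities are all assumptions made for this sketch. It requires Java 9+ for Set.of.

```java
import java.util.*;

/**
 * Hypothetical multinomial Naive Bayes sketch (not the repository's code).
 * Assumes each document is already segmented into space-separated tokens.
 */
public class NaiveBayesSketch {
    private final Set<String> stopWords;
    private final Map<String, Integer> docCounts = new HashMap<>();               // documents per class
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // token counts per class
    private final Map<String, Integer> totalWords = new HashMap<>();              // total tokens per class
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    public NaiveBayesSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Feature extraction: split pre-segmented text and drop stop words.
    List<String> features(String segmentedText) {
        List<String> out = new ArrayList<>();
        for (String tok : segmentedText.trim().split("\\s+")) {
            if (!tok.isEmpty() && !stopWords.contains(tok)) {
                out.add(tok);
            }
        }
        return out;
    }

    // Training: count documents per class and token occurrences per class.
    public void train(String label, String segmentedText) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : features(segmentedText)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // MAP decision: argmax over classes of log P(c) + sum of log P(w|c),
    // where P(w|c) uses add-one (Laplace) smoothing over the vocabulary.
    public String classify(String segmentedText) {
        List<String> words = features(segmentedText);
        int v = vocabulary.size();
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // log prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : words) {
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (total + v)); // smoothed log likelihood
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy, pre-segmented examples; "的" and "了" are treated as stop words.
        NaiveBayesSketch nb = new NaiveBayesSketch(Set.of("的", "了"));
        nb.train("sports", "球队 赢得 了 比赛");
        nb.train("finance", "股市 的 指数 上涨");
        System.out.println(nb.classify("球队 比赛")); // prints "sports"
    }
}
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, and add-one smoothing keeps an unseen word from forcing a class score to zero. Both are common choices, but the paper may use different ones.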
better-jieba
A better Java version of jieba
jieba-analysis
Jieba Chinese word segmentation (Java version)
tokenize_chinese_nlp
This project tests whether the jieba package is a good choice for tokenizing Chinese phrases.
nitrite-java
NoSQL embedded document store for Java
java-concurrent-hash-trie-map
Java port of a concurrent trie hash map implementation from the Scala collections library
scalecube-cluster
ScaleCube Cluster is a lightweight Java VM implementation of SWIM (Scalable Weakly-consistent Infection-style Process Group Membership Protocol). It provides cluster membership, failure detection, and a gossip protocol library.