Mingfai Ma's starred repositories

traffic-shm

traffic-shm (Anna) is a Java-based lock-free IPC library.

Language: Java | License: Apache-2.0 | Stargazers: 89 | Issues: 0

slice

Java library for efficiently working with heap and off-heap memory

Language: Java | License: Apache-2.0 | Stargazers: 505 | Issues: 0

arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

Language: C++ | License: NOASSERTION | Stargazers: 13565 | Issues: 0

tablesaw

Java dataframe and visualization library

Language: Java | License: Apache-2.0 | Stargazers: 3550 | Issues: 0

fast-bert

Super easy library for BERT-based NLP models

Language: Python | License: Apache-2.0 | Stargazers: 1864 | Issues: 0

JFastText

Java interface for fastText

Language: Java | License: NOASSERTION | Stargazers: 2 | Issues: 0

fastText_java

Java port of the C++ version of Facebook's fastText

Language: Java | License: NOASSERTION | Stargazers: 122 | Issues: 0

fastText4j

Facebook's FastText for Java

Language: Java | License: BSD-3-Clause | Stargazers: 78 | Issues: 0

mynlp

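A production-grade, high-performance, modular, extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation error correction, BM25, person name recognition, named entity recognition, custom dictionaries)

Language: Java | License: Apache-2.0 | Stargazers: 676 | Issues: 0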

Chinese-StopWords

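Commonly used Chinese stop words (including the Baidu, Harbin Institute of Technology, Sichuan University, and other word lists)

Language: Python | Stargazers: 25 | Issues: 0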

ChineseStopWords

A list of commonly used Chinese stop words

Language: Python | Stargazers: 72 | Issues: 0

Chinese-Text-Classification-Based-on-Naive-Bayes

The development of computer and communications technology has produced a huge amount of data, which makes automatic text classification increasingly important. The Naive Bayes algorithm, which is based on a probabilistic model, is an effective approach to automatic text classification. The main task of this paper is to discuss the theoretical basis of the Naive Bayes text classifier and describe how the classifier is implemented in Java. The classifier has two parts: feature extraction and the calculation based on those features. In the feature extraction part, I use Chinese word segmentation and stop-word filtering. In the classification part, I calculate the prior probability, the likelihood function value, and the maximum a posteriori estimate. In a simple test, I use the Sogou laboratory's text classification corpus as the training set and the test set. In the test, accuracy is between 39% and 56%. The results show that there is still room for improvement. The paper also discusses methods for improvement and wider applications.

Language: Java | License: MIT | Stargazers: 6 | Issues: 0

CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

Language: Java | License: GPL-3.0 | Stargazers: 9695 | Issues: 0

better-jieba

A better Java version of jieba

Language: Java | License: Apache-2.0 | Stargazers: 19 | Issues: 0

jieba-analysis

Jieba Chinese word segmentation (Java version)

Language: Java | License: Apache-2.0 | Stargazers: 2585 | Issues: 0

opencc4j

🇨🇳 Open Chinese Convert is an open-source project for conversion between Traditional Chinese and Simplified Chinese. (Java conversion between Traditional and Simplified Chinese)

Language: Java | License: Apache-2.0 | Stargazers: 480 | Issues: 0

tokenize_chinese_nlp

A project to test whether the jieba package is a good choice for tokenizing Chinese phrases.

Language: Python | Stargazers: 8 | Issues: 0

tntsearch

A fully featured full-text search engine written in PHP

Language: PHP | License: MIT | Stargazers: 3086 | Issues: 0

JIRLbot

Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and Dmitri Loguinov in their paper "IRLbot: Scaling to 6 Billion Pages and Beyond"

Language: Java | Stargazers: 17 | Issues: 0

cupq

A CUDA implementation of a priority queue

Language: C++ | License: Apache-2.0 | Stargazers: 81 | Issues: 0

psd_sdk

A C++ library that directly reads Photoshop PSD files.

Language: C++ | License: BSD-2-Clause | Stargazers: 618 | Issues: 0

filament

Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2

Language: C++ | License: Apache-2.0 | Stargazers: 17799 | Issues: 0

nitrite-java

NoSQL embedded document store for Java

Language: Java | License: Apache-2.0 | Stargazers: 841 | Issues: 0

triemap

Java port of a concurrent trie hash map implementation from the Scala collections library

Language: Java | License: Apache-2.0 | Stargazers: 27 | Issues: 0

java-concurrent-hash-trie-map

Java port of a concurrent trie hash map implementation from the Scala collections library

Language: Java | Stargazers: 151 | Issues: 0

capsule

The Capsule Hash Trie Collections Library

Language: Java | License: BSD-2-Clause | Stargazers: 404 | Issues: 0

zipkin

Zipkin is a distributed tracing system

Language: Java | License: Apache-2.0 | Stargazers: 17007 | Issues: 0

swarm

A Java implementation of SWIM

Language: Java | License: MIT | Stargazers: 1 | Issues: 0

swim-java

SWIM Protocol in Java

Language: Java | Stargazers: 7 | Issues: 0

scalecube-cluster

ScaleCube Cluster is a lightweight Java VM implementation of SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol. It features cluster membership, failure detection, and a gossip protocol library.

Language: Java | License: Apache-2.0 | Stargazers: 263 | Issues: 0