xmig / Python-Language-Detector

Python Language Detector

Home Page: http://www.decultured.com

Python 3.5 version

Trigram-based language detection, built in C++ and wrapped as a Python module.

Speed and reliability are the primary design goals, at the cost of a small loss in accuracy compared with similar algorithms. For its intended use, accuracy remains very high even on short sample texts.

Identification is performed through basic trigram lookups paired with Unicode character set recognition. Performance is aided by custom hash tables that use MurmurHash for fast lookups.
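The general idea of pairing trigram lookups with Unicode script recognition can be sketched in pure Python. This is an illustrative sketch only, not this module's API or its C++ implementation: the names `trigrams`, `dominant_script`, `detect`, and the toy `SAMPLES` data are hypothetical, the scoring rule is an assumption, and Python's built-in `dict` stands in for the custom MurmurHash tables.

```python
import unicodedata
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def dominant_script(text):
    """Guess the dominant script from the first word of each letter's Unicode name."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split()[0]] += 1  # e.g. "LATIN", "CYRILLIC"
    return scripts.most_common(1)[0][0] if scripts else None


# Hypothetical toy training text; a real detector would ship precomputed profiles.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
    "ru": "быстрая коричневая лиса прыгает через ленивую собаку",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}
SCRIPTS = {lang: dominant_script(sample) for lang, sample in SAMPLES.items()}


def detect(text):
    """Return the best-matching language code, or None if no profile fits."""
    # Step 1: script recognition narrows the candidate languages.
    script = dominant_script(text)
    candidates = [lang for lang in PROFILES if SCRIPTS[lang] == script]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # Step 2: trigram lookups (hash-table membership tests) rank the rest.
    grams = trigrams(text)

    def score(lang):
        profile = PROFILES[lang]
        return sum(count for gram, count in grams.items() if gram in profile)

    return max(candidates, key=score)


if __name__ == "__main__":
    for sample in ("the dog sleeps", "der hund schläft", "собака"):
        print(sample, "->", detect(sample))
```

Checking the script first means that text in a script used by only one known language never needs a trigram lookup, and for the remaining candidates each lookup is a single hash-table probe.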

It is currently used in production in a large commercial database application, where it has proven to be stable, reliable, and fast.

MIT licensed. Copyright (c) 2012 Jeffrey Graves

Languages

Python 40.7%, C++ 30.2%, C 28.4%, Objective-C 0.7%