xtao / PyCharlockHolmes

Character encoding detection library for Python using ICU and libmagic.

Charlock Holmes

Character encoding detection library for Python using ICU and libmagic. Inspired by Charlock Holmes.

Dependencies

  1. ICU
  2. file (libmagic)

Gentoo

emerge -av dev-libs/icu
emerge -av sys-apps/file

Ubuntu

apt-get install libicu-dev
apt-get install libmagic-dev

Brew

brew install icu4c
brew install libmagic
export ICUI18N="/usr/local/Cellar/icu4c/xx" # Replace "xx" with your installed icu4c version
export MAGIC="/usr/local/Cellar/libmagic/xx" # Replace "xx" with your installed libmagic version
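
Before building, it can help to confirm that both native libraries are visible on your system. The snippet below is a minimal sketch using only the Python standard library. The names "icui18n" and "magic" are assumptions about which shared libraries the extension links against (inferred from the ICUI18N and MAGIC variables above), and check_dependencies is a hypothetical helper, not part of this package. Keg-only Homebrew installs may report "not found" even when the exports above are set correctly.

```python
from ctypes.util import find_library


def check_dependencies(names=("icui18n", "magic")):
    """Map each library name to what the system resolves it to, or None.

    ASSUMPTION: "icui18n" and "magic" are the shared libraries needed;
    the README names only the packages, not the library files.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print("%s: %s" % (name, found or "not found"))
```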

Install

python setup.py build
python setup.py install

Usage

from charlockholmes import detect

# Read raw bytes so the detector sees the file exactly as stored on disk.
with open('test.txt', 'rb') as f:
    content = f.read()
print(detect(content))
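
A common next step is to decode the bytes with the detected encoding. This README does not document what detect() returns, so the sketch below assumes a dict with an 'encoding' key (as in the Ruby Charlock Holmes library). Check the actual return value before relying on it. decode_with is a hypothetical helper, not part of this package. It takes the detector as an argument so it can be tried without the native libraries installed.

```python
def decode_with(detector, raw, default="utf-8"):
    """Decode raw bytes using the encoding reported by detector.

    ASSUMPTION: detector(raw) returns a dict with an 'encoding' key.
    Any other result falls back to the default encoding.
    """
    result = detector(raw)
    encoding = default
    if isinstance(result, dict) and result.get("encoding"):
        encoding = result["encoding"]
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Python does not know the codec name, or the guess was wrong:
        # fall back to the default and replace undecodable bytes.
        return raw.decode(default, errors="replace")


# Intended use (requires the package to be installed):
#   from charlockholmes import detect
#   text = decode_with(detect, content)
```

The fallback matters because ICU may report encoding names that Python's codec registry does not recognize.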

License

Modified BSD License

License: BSD 3-Clause "New" or "Revised" License


Languages

  1. Common Lisp 44.6%
  2. Python 35.2%
  3. C 18.9%
  4. C++ 1.1%
  5. Shell 0.2%