OOPS-ORG-PHP / mod_chardet

A PHP extension that determines the charset of input data using Mozilla Universal Charset Detection


mod_chardet PHP extension


License

© 2022. JoungKyun.Kim All rights reserved.

This program is distributed under the MPL 1.1 or the GPL v2.

Abstract

Determines the charset of input data using the Mozilla Universal Charset Detection C/C++ library.

This PHP extension is a PHP frontend for libchardet.

libchardet is based on the Mozilla Universal Charset Detection C/C++ library and detects the character set used to encode data.

Because this module is a C binding, it is much faster than chardet packages written in pure PHP.

The mod_chardet extension supports three methods for detecting a charset. The supported methods and their required libraries are as follows:

  • libchardet - Mozilla Universal Charset Detection C/C++ library
  • ICU - IBM International Components for Unicode
  • python-chardet - Mozilla Universal Charset Detection implemented in pure Python

For CJKV (Chinese, Japanese, Korean, Vietnamese) languages, MUCD (Mozilla Universal Charset Detection) is recommended, as it gives the best results. For single-byte languages, MUCD and ICU perform equally well.

The python-chardet mode also uses MUCD, but its call performance is very poor. This mode is provided for testing only, so it is not built unless you enable it with a configure option (--enable-py-chardet).

For more information, see the reference documentation.

Downloads

Installation

1. Requirements

  • mod_chardet versions
    • PHP 7 and later: mod_chardet >= 1.0.0
    • PHP 5 and earlier: mod_chardet < 1.0.0
  • PHP >= 4.1
  • libchardet >= 1.0.5
  • libicu (optional)
  • python-chardet (optional)

2. Build

First, check which of libchardet, libicu and python-chardet are installed.

At least one of libchardet or libicu must be installed.

The python-chardet backend exists only to cross-check results against python-chardet. Its performance is poor, and we do not recommend using it.

[root@host mod_chardet]$ phpize
[root@host mod_chardet]$ ./configure --help
  ...
  --enable-moz-chardet    Support Mozilla chardet [default=yes]
  --enable-icu-chardet    Support ICU chardet [default=yes]
  --enable-py-chardet     Support python chardet [default=no]
  ...
[root@host mod_chardet]$ ./configure
[root@host mod_chardet]$ make && make install
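The help output above shows that each backend has its own switch. As a sketch, assuming the usual autoconf behaviour where every `--enable-X` option also accepts `--disable-X`, plus the `--with-php-config` option that phpize-generated configure scripts provide, a build with only the libchardet (Mozilla) backend could look like this. `build_chardet_moz_only`, `PHPIZE` and `PHP_CONFIG` are hypothetical names used for illustration; they are not part of mod_chardet.

```shell
# Hypothetical helper, not shipped with mod_chardet: build the extension
# with only the Mozilla (libchardet) backend, leaving out ICU and
# python-chardet. Run it from the top of the mod_chardet source tree.
# PHPIZE and PHP_CONFIG may point at the phpize/php-config of a specific
# PHP installation; both default to the ones found in PATH.
build_chardet_moz_only() {
    phpize_bin=${PHPIZE:-phpize}
    php_config=${PHP_CONFIG:-php-config}

    "$phpize_bin" || return 1

    # --disable-* forms are assumed from autoconf's handling of --enable-*
    ./configure --with-php-config="$php_config" \
        --enable-moz-chardet \
        --disable-icu-chardet \
        --disable-py-chardet || return 1

    make && make install
}
```

For example, `PHPIZE=/usr/local/php/bin/phpize PHP_CONFIG=/usr/local/php/bin/php-config build_chardet_moz_only` (paths are examples only). As in the session above, `make install` usually needs root privileges. Remember that at least one of the Mozilla or ICU backends must stay enabled.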

3. Configuration

Add the DSO extension setting to your php.ini:

extension = chardet.so
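After editing php.ini, it is worth confirming that PHP actually loads the module. A minimal sketch, assuming the extension registers under the name `chardet` (as the `chardet.so` file name suggests); `chardet_loaded` is a hypothetical helper name. It relies only on the PHP CLI and PHP's standard `extension_loaded()` function.

```shell
# Hypothetical helper: report whether the PHP CLI has loaded the chardet module.
# Exit status: 0 = loaded, 1 = not loaded, 2 = no php binary in PATH.
# Assumes the module name is "chardet", matching chardet.so.
chardet_loaded() {
    command -v php >/dev/null 2>&1 || return 2
    php -r 'exit(extension_loaded("chardet") ? 0 : 1);'
}

status=0
chardet_loaded || status=$?
case $status in
    0) echo "chardet extension is loaded" ;;
    1) echo "chardet extension is not loaded; check php.ini" ;;
    *) echo "php CLI not found in PATH" ;;
esac
```

The CLI and the web server SAPI may read different php.ini files; `php --ini` lists the files the CLI actually loads.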

Usage

See the sample script in the repository.
