yichen0831 / opencc-python

OpenCC made with Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

開放中文轉換(Pure Python)

Open Chinese convert (OpenCC) in pure Python.

Introduction 簡介

opencc-python 是用純 Python 所寫,使用由 BYVoid(byvoid.kcp@gmail.com) 所開發的 OpenCC 中的字典檔案。 opencc-python 可以支援 Python2.7 及 Python3.x。

opencc-python is made by pure Python with the dictionary files of OpenCC which is developed by BYVoid(byvoid.kcp@gmail.com).

opencc-python can run with Python2.7 and Python3.x.

Installation 安裝

opencc 這個目錄複製到你正在開發的專案中即可,或是執行(需要管理者權限):

python setup.py install

套件也可從 PyPI 安裝,使用指令:

pip install opencc-python-reimplemented

Copy the opencc folder to your project, or run (admin required)

python setup.py install

The package can also be installed from PyPI by issuing:

pip install opencc-python-reimplemented

Usage 使用方式

Code

from opencc import OpenCC
cc = OpenCC('s2t')  # convert from Simplified Chinese to Traditional Chinese
# can also set conversion by calling set_conversion
# cc.set_conversion('s2tw')
to_convert = '开放中文转换'
converted = cc.convert(to_convert)

Command Line

usage: python -m opencc [-h] [-i <file>] [-o <file>] [-c <conversion>]
                        [--in-enc <encoding>] [--out-enc <encoding>]

optional arguments:
  -h, --help            show this help message and exit
  -i <file>, --input <file>
                        Read original text from <file>. (default: None = STDIN)
  -o <file>, --output <file>
                        Write converted text to <file>. (default: None = STDOUT)
  -c <conversion>, --config <conversion>
                        Conversion (default: None)
  --in-enc <encoding>   Encoding for input (default: UTF-8)
  --out-enc <encoding>  Encoding for output (default: UTF-8)

example with UTF-8 encoded file:

  python -m opencc -c s2t -i my_simplified_input_file.txt -o my_traditional_output_file.txt

See https://docs.python.org/3/library/codecs.html#standard-encodings for list of encodings.

Conversions 轉換

  • hk2s: Traditional Chinese (Hong Kong standard) to Simplified Chinese

  • s2hk: Simplified Chinese to Traditional Chinese (Hong Kong standard)

  • s2t: Simplified Chinese to Traditional Chinese

  • s2tw: Simplified Chinese to Traditional Chinese (Taiwan standard)

  • s2twp: Simplified Chinese to Traditional Chinese (Taiwan standard, with phrases)

  • t2hk: Traditional Chinese to Traditional Chinese (Hong Kong standard)

  • t2s: Traditional Chinese to Simplified Chinese

  • t2tw: Traditional Chinese to Traditional Chinese (Taiwan standard)

  • tw2s: Traditional Chinese (Taiwan standard) to Simplified Chinese

  • tw2sp: Traditional Chinese (Taiwan standard) to Simplified Chinese (with phrases)

Issues 問題

當轉換有兩個以上的字詞可能時,程式只會使用第一個。

When there is more than one conversion available, only the first one is taken.

About

OpenCC made with Python

License:Apache License 2.0


Languages

Language:Python 100.0%