silnrsi / teckit

A Text Encoding Conversion toolkit

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

TECkit - A Text Encoding Conversion Toolkit

Overview

TECkit is a low-level toolkit intended to be used by applications for conversions between text encodings. For example, it can be used when importing legacy text into a Unicode-based application.

The primary component of TECkit is a library: the TECkit engine. The engine relies on mapping tables in a specific, documented binary format. The TECkit compiler creates these tables from plain-text, human-readable descriptions.

Project status

TECKit TECKit non-x86

The TECkit libraries supports the Unicode 15.0.0 character repertoire.

TECkit also includes some command line applications:

  • teckit_compile: TECkit compiler
  • txtconv: a simple text encoding conversion tool
  • sfconv: convert legacy encoded SFM files to Unicode

Note:

More powerful tools to handle text encodings using the TECkit library are

References:

Getting TECkit

Our binary releases can be obtained from https://software.sil.org/teckit/#downloads.

TECkit is also included in TeX Live and packaged by many Linux distributions.

The source code repository and source releases are available at https://github.com/silnrsi/teckit.

Building TECkit

The simplest way to build the source from the repository is as follows:

$ ./autogen.sh             # Generate the configure script
$ mkdir build              # Create a new, empty directory
$ cd build                 # Change into the new directory
$ ../configure             # Run the generated configure script
$ make                     # Build
$ make check               # Run the tests (optional)
$ cat test/dotests.pl.log  # View the test output (optional)
$ make install             # Install (optional)

Notes:

  • The default installation directory prefix is /usr/local. You most likely do not want this. The directory can be changed with ../configure --prefix=<install-dir>.
  • Other configuration options can be found with ../configure --help.
  • This uses an out-of-tree build in the build directory. This is recommended to avoid polluting the source directories with build products. The name of the directory doesn't matter (as long as it is empty), but build is ignored by git (see the .gitignore), which makes it convenient when looking at git status for example.
  • For testing on Windows, set the BINDIR variable "set BINDIR="C:\Users\<username>\Downloads\teckit\TECkit-<version>\64-bit"

Dependencies

Configuration and building requires autoconf, automake, and libtool.

The TECkit engine and compiler require zlib. The build can be configured to use the system zlib with --with-system-zlib or to use the bundled zlib with --without-system-zlib.

The sfconv binary requires libexpat. The configure script checks for a system libexpat. If it is found, it is used. If it is not found, the bundled libexpat is used.

Platforms

TECkit can be built on Windows, Linux, and macOS (formerly Mac OS X).

The build-*.sh show how releases are made for the different platforms.

Microsoft Windows

We do not typically test building on Windows itself. Instead, we build releases on Ubuntu Bionic by cross-compiling for MinGW with the following packages:

  • gcc-mingw-w64-i686
  • g++-mingw-w64-i686
  • gcc-mingw-w64-x86-64
  • g++-mingw-w64-x86-64

Starting with TECkit version 2.5.7 there are several changes with the Windows builds. First, a 64-bit build has been added to the already existing 32-bit build. Second, the runtime files have changed. The file libwinpthread-1.dll is no longer needed. The file libstdc++-6.dll needs to match the build (32- or 64-bit) of the rest of the code. The 32-bit build needs libgcc_s_sjlj-1.dll, while the 64-bit build needs libgcc_s_seh-1.dll.

About

A Text Encoding Conversion toolkit

License:Other


Languages

Language:C 50.9%Language:C++ 36.3%Language:Shell 4.5%Language:Ada 1.9%Language:Pascal 1.6%Language:C# 1.2%Language:HTML 0.7%Language:Perl 0.6%Language:Makefile 0.6%Language:DIGITAL Command Language 0.6%Language:Assembly 0.4%Language:Roff 0.2%Language:CMake 0.2%Language:XS 0.1%Language:M4 0.1%Language:Java 0.1%Language:Rich Text Format 0.1%Language:SAS 0.0%Language:Module Management System 0.0%Language:Batchfile 0.0%Language:Raku 0.0%