pharos-alexandria / convert-tonos-oxia

Convert Modern Greek “tonos” to Ancient Greek “oxia”

convert-tonos-oxia

Convert Modern Greek “tonos” to Ancient Greek “oxia” in UTF-8 encoded plaintext files with Ancient Greek text.

Sometimes in Ancient Greek texts, for example texts from the TLG or acquired through Tesseract, the letters α, ε, η, ι, ο, υ, and ω with acute accent are encoded according to the Greek and Coptic code chart (i.e. with “tonos”), but should be encoded according to the Greek Extended code chart (i.e. with “oxia”).

The script therefore replaces the following:

Tonos              Oxia
U+03AC (940) ά     U+1F71 (8049)
U+03AD (941) έ     U+1F73 (8051)
U+03AE (942) ή     U+1F75 (8053)
U+03AF (943) ί     U+1F77 (8055)
U+03CC (972) ό     U+1F79 (8057)
U+03CD (973) ύ     U+1F7B (8059)
U+03CE (974) ώ     U+1F7D (8061)

It takes two arguments: infile and outfile. Use as follows:

python3 convert-tonos-oxia.py infile.txt outfile.txt
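
The source of convert-tonos-oxia.py is not reproduced here. As a rough illustration only, the replacement listed in the table above could be sketched with Python's built-in str.translate. The names TONOS_TO_OXIA, convert_text and convert_file are hypothetical and not taken from the actual script, which may work differently.

```python
# Hypothetical sketch, not the code of convert-tonos-oxia.py.
# Mapping copied from the table above: tonos code point -> oxia code point.
TONOS_TO_OXIA = str.maketrans({
    "\u03ac": "\u1f71",  # alpha with acute
    "\u03ad": "\u1f73",  # epsilon with acute
    "\u03ae": "\u1f75",  # eta with acute
    "\u03af": "\u1f77",  # iota with acute
    "\u03cc": "\u1f79",  # omicron with acute
    "\u03cd": "\u1f7b",  # upsilon with acute
    "\u03ce": "\u1f7d",  # omega with acute
})


def convert_text(text: str) -> str:
    """Replace the seven tonos letters with their oxia counterparts."""
    return text.translate(TONOS_TO_OXIA)


def convert_file(infile: str, outfile: str) -> None:
    """Read a UTF-8 text file, convert it, and write the result as UTF-8."""
    with open(infile, encoding="utf-8") as src:
        text = src.read()
    with open(outfile, "w", encoding="utf-8") as dst:
        dst.write(convert_text(text))
```

In this sketch, calling convert_file("infile.txt", "outfile.txt") would play the role of the command shown above. All other characters, including unaccented letters and letters that already carry Greek Extended accents, pass through unchanged.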

Note!

Unicode treats each oxia letter as canonically equivalent to its tonos counterpart, and Unicode normalization replaces the oxia code points with the lower (tonos) code points, so it's better not to use this script. Cf. https://wiki.digitalclassicist.org/Greek_Unicode_duplicated_vowels, https://en.wikipedia.org/wiki/Greek_diacritics#Unicode, and https://jktauber.com/articles/python-unicode-ancient-greek/.
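
The effect of normalization can be checked with Python's standard unicodedata module. The short sketch below (the variable names are illustrative) shows that NFC normalization maps the seven oxia code points from the table back to the tonos code points, so any later normalization step would undo the conversion.

```python
import unicodedata

# The seven oxia letters (Greek Extended) and their tonos counterparts
# (Greek and Coptic), in the same order as the table above.
OXIA = "\u1f71\u1f73\u1f75\u1f77\u1f79\u1f7b\u1f7d"
TONOS = "\u03ac\u03ad\u03ae\u03af\u03cc\u03cd\u03ce"

# NFC composes to the canonical precomposed form, which is the tonos letter.
normalized = unicodedata.normalize("NFC", OXIA)
conversion_undone = normalized == TONOS

# Both spellings decompose (NFD) to the same base letter plus U+0301.
same_decomposition = (
    unicodedata.normalize("NFD", OXIA) == unicodedata.normalize("NFD", TONOS)
)

print(conversion_undone, same_decomposition)
```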

About

License: GNU General Public License v3.0


Languages

Language: Python 100.0%