SorkinType / octo-text

Text for testing language support

octo-text

Notes

Octopus: These are meant for use in Octopus if you have access to that testing software.

I think several kinds of observations or considerations are relevant when using these language samples:

The files list the letters that occur in the paragraphs at the bottom. The letters are ordered by frequency of use, with the most used letters appearing first, on the left. The most used letters will affect the paragraph's color most.
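The frequency ordering described above can be approximated in a few lines of Python. This is a minimal sketch, not the script used to build these files: the function name `letters_by_frequency` and the sample string are made up for illustration. It counts upper and lower case separately and ignores combining diacritics, and both choices are assumptions of this sketch.

```python
from collections import Counter


def letters_by_frequency(text):
    """Return the letters in `text`, most frequent first (hypothetical helper)."""
    # str.isalpha() is False for spaces, punctuation, digits and combining marks,
    # so stacked diacritics are dropped and only base letters are counted.
    counts = Counter(ch for ch in text if ch.isalpha())
    # Sort by descending count; ties are broken by code point to keep the order stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [letter for letter, _ in ordered]


# Illustrative string only, not taken from the sample files.
print(" ".join(letters_by_frequency("mama ŋa ɛ")))
```

To apply it to one of the samples, read the paragraph text from the file and pass it in as `text`.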

Less frequently occurring letters may still be important if they are rarely used in other languages.
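One rough way to spot such letters is to compare the letter set of one sample against the others. The sketch below assumes you already have each sample's paragraph text as a string. The helper name `distinctive_letters` and the short strings are hypothetical and are not taken from the files.

```python
def distinctive_letters(text, other_texts):
    """Letters that occur in `text` but in none of `other_texts` (hypothetical helper)."""
    letters = {ch for ch in text if ch.isalpha()}
    seen_elsewhere = {ch for other in other_texts for ch in other if ch.isalpha()}
    # Sorted by code point so the result is deterministic.
    return sorted(letters - seen_elsewhere)


# Illustrative strings only.
print(distinctive_letters("ƙasa ɗaya", ["casa", "day"]))
```

This only tells you that a letter is absent from the other samples you compared against, not that it is rare in those languages in general.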

You might find the African language settings less even-looking than English, Spanish, or Italian, or darker overall, or lighter overall. If so, these may be defects in the design of the letters, the spacing, or both.

You may also want to ask yourself:

• Does the high frequency of use of a specific letter mean it has to be given additional scrutiny?

• Is a specific letter used next to another letter in a way that should influence its design, or make you design an alternate to be used in that combination?

• Does seeing a diacritic used with a specific letter make you want to adjust an anchor or the letter itself?

Notes on what I observed in the samples, starting with some familiar ones:

Italian - This language is the closest to Latin, and it is usually easier to get an even color in it than in others. However, the design of z is notably tested in this language. Generally speaking, this is a good language to compare color against.

English - This language makes heavy use of the letter e and unusually frequent use of the letter w.

Turkish - This language has a high proportion of g and k use, which can make a paragraph look darker than it should if the g and k are not compensated well enough. Turkish should share an apparent grey or color with Italian.

Dutch - This language is known for the doubling of vowels. This can make your text look too light if the vowels are too bright or too loosely spaced.

Ndonga - This language's sentences are unusually long, and like Dutch it has lots of doubled letters. It also makes unusual use of camel case in words, for example the word "mElelo". This does not seem to be a typo; instead it seems to be a consistent feature.

Dinka/Nuer - This language gives us very heavy use of African letters such as ɛ ɣ and ŋ. It also makes substantial use of the 'macron below' diacritic.

Serer - This language lets us test ɓ ƭ ɗ. Overall word length is a bit shorter than in English. This may make a paragraph look a bit light overall, especially if the insides of your i, u, or o are too bright, since these are used a lot and o and i are frequently doubled.

Anii - This language is very helpful for testing ɩ ǝ ʊ ɖ ŋ ɔ ɛ w c. The high frequency of use of ʊ is especially notable.

Puguli - This language lets us test ɩ ɛ ʋ ɔ ŋ ɓ ɲ ƴ ʊ ɗ. The very high frequency of use of the letters ɩ ɛ ʋ ɔ creates a particularly novel texture. We also see Ƴ Ɩ Ɲ Ɓ Ʋ. The frequent use of diacritics and stacked diacritics such as ɛ̃̀ makes it almost Czech-like.

Xhosa - This language has some camel case words like Ndonga, but the main feature is the high frequency of the diagonal letters w y z x k, which may reveal weaknesses in the spacing of these letters or their design.

Fula - This language lets us test the letters ɓ ɗ ŋ c ƴ ɲ. The ƴ and ɓ in particular are notable. Fula has a lot of doubled a, o, i, j, and e in words, so it is a bit Dutch-like. But it also has double ŋ!

Wolof - This language shows us η and ŋ in use together. It is like Turkish in that it has a lot of g and k use and also lots of diagonals like x and y. It is like Fula in that it also exhibits a lot of doubling of letters.

Ewe - This language is among the most useful for testing because it has a lot of African-specific letters in heavy use, such as: ɔ ɖ ŋ ƒ ɣ ʋ ɛ ø Ŋ Ƒ Ɣ Ð Ʋ Ɛ. The strong use of ɔ ɖ ƒ ɣ is notable.

Bambara - This language also makes heavy use of ɔ ɛ. Like Fula, it is especially useful for testing ɲ and ŋ. Also, ε. It uses w more frequently than English does and makes heavy use of m, k, and g. So, like Turkish, it may expose weaknesses in the spacing and design of these letters.

Hausa - Like many of the languages discussed, this one has a lot of diagonal letter use, such as k y w. It also lets us test the African letters ƙ ɗ ɓ Ƙ. Of these, ƙ and Ƙ are particularly useful and notable.
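Several of the notes above mention doubled letters (Dutch, Ndonga, Serer, Fula, Wolof). If you want to check how common doubling is in a sample, a simple count of adjacent identical letters is enough to start with. This is a hedged sketch, not part of the project's tooling: the name `doubled_letters` and the test string are invented for illustration.

```python
from collections import Counter


def doubled_letters(text):
    """Count adjacent pairs of the same letter, such as 'aa' or 'ŋŋ' (hypothetical helper)."""
    pairs = Counter()
    # Walk over every pair of neighbouring characters in the text.
    for first, second in zip(text, text[1:]):
        if first.isalpha() and first == second:
            pairs[first + second] += 1
    return pairs


# Illustrative string only, not taken from the sample files.
print(doubled_letters("maan ŋŋa boot oo").most_common())
```

The comparison is case-sensitive, so "Aa" is not counted, and a run of three identical letters counts as two overlapping pairs.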

Project notes:

Disclaimer: I made these texts to highlight patterns and features I observed. They are not perfectly representative of the patterns of each language as a whole. However, if I saw a pattern repeatedly, I tried to include or feature it. Some of the samples are made from Wikipedia texts. For some of them, I had to make do exclusively with Bible or human rights translations (UNHCR) or a mix of the two.

Sharing: feel free to share the files with colleagues. Feel free to suggest people I should share later updates with.

Thanks: Deep thanks to Denis Jacquerye. He helped me find resources and helped guide the process overall. He is just lovely to work with.

Future: I plan to make paragraphs that support the Google Fonts "Beyond" glyph set, which serves underserved groups in Asia, the Americas, and elsewhere.

I will also add many large and small European languages that are not yet treated, especially if they feature a pattern or diacritic that isn't encompassed already by an existing sample.

Suggestions, Crits, and Requests: Please let me know your thoughts.

Languages

Python 68.3%, Rich Text Format 31.7%