mike-fabian / ibus-table

The tables engines for IBus

Home Page:http://mike-fabian.github.io/ibus-table

Trad/Simp characters marked as Simp only in Unihan_Variants.txt

This is an umbrella issue that I will open (with @mike-fabian's permission, pending) to collect characters that are used in both Trad and Simp Chinese, but are marked as Simp only in Unihan_Variants.txt. Expansion of ibus/ibus#2323.

@mike-fabian I came across another such character a couple of days ago, but I didn't open an issue right away, and now I've forgotten what it was. I'm sure there are more such characters lurking around. So I suggest we keep this issue open so peeps can add such characters when/if they come across them, and you can update Unihan_Variants.txt as you do maintenance on this fork.

I will post a comment in this issue once I find that character again.

https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains

“Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”

commit 4debaa5
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Oct 5 11:30:31 2021 +0200

Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-08-06 Unicode 14.0.0 final”

- *All* our fixes which are now included upstream.
- Because of the new Unihan_Variants.txt, the following 48 characters
  are added to the VARIANTS_TABLE in chinese_variants.py:
  u'䓖': 1, u'了': 1, u'伙': 1, u'借': 1, u'傢': 2, u'冬': 1, u'千': 1,
  u'卜': 1, u'卷': 1, u'吁': 1, u'合': 1, u'回': 1, u'夥': 2, u'姜': 1,
  u'家': 1, u'峃': 1, u'嶨': 2, u'庼': 1, u'廎': 2, u'懞': 2, u'才': 1,
  u'折': 1, u'捲': 2, u'摺': 2, u'旋': 1, u'朱': 1, u'濛': 2, u'灶': 1,
  u'瞭': 3, u'矇': 2, u'硃': 2, u'秋': 1, u'竈': 2, u'籲': 2, u'纔': 2,
  u'蒙': 1, u'蔑': 1, u'蔔': 2, u'薑': 2, u'藉': 3, u'藭': 2, u'衊': 2,
  u'迴': 2, u'霉': 1, u'鞦': 2, u'黴': 2, u'鼕': 2,

  1 = simplified Chinese
  2 = traditional Chinese
  3 = used both in simplified *and* traditional Chinese
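To make a list like the one above easier to review, the newly added entries can be computed by comparing the table before and after regeneration. This is only a minimal sketch: `OLD_TABLE` and `NEW_TABLE` are hypothetical excerpts, not the real contents of chinese_variants.py, and `added_entries` is a made-up helper name.

```python
# Hypothetical excerpts of VARIANTS_TABLE before and after regeneration.
# Values follow the legend above: 1 = simplified, 2 = traditional, 3 = both.
OLD_TABLE = {'国': 1, '國': 2}
NEW_TABLE = {'国': 1, '國': 2, '冬': 1, '鼕': 2, '秋': 1, '鞦': 2}


def added_entries(old, new):
    """Return the entries that are present in `new` but missing from `old`."""
    return {char: category for char, category in new.items() if char not in old}


added = added_entries(OLD_TABLE, NEW_TABLE)

# Entries newly marked as simplified only are the ones worth double-checking,
# because a wrong mark hides them in "traditional Chinese only" mode.
simplified_only = sorted(char for char, category in added.items() if category == 1)
print(simplified_only)
```

Printing only the entries with value 1 narrows the review to the characters that could wrongly disappear for Traditional Chinese users.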

Looking at the above changes coming from the new Unihan_Variants.txt, I have some doubts whether all of them are correct.

For example u'秋': 1, and u'冬': 1,. Aren’t they used in traditional Chinese as well? And u'千': 1, and a few others.

It would be nice if somebody could go through the whole list above and check.

Yes, all the characters that you mentioned are used in Trad. I will check this list.

I believe all of the chars from that list are used in Trad except for these two:
u'峃'
u'庼'

Edit:
u'䓖'
this one is also unused in Trad

Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.

Hi, btw, how can I try the new release on Fedora? I tried doing it the same way I did last time and nothing changed...

Which release do you mean?

https://bodhi.fedoraproject.org/updates/FEDORA-2022-8a9b689712
https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0

which contains:

  • Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py

With 1.15.0, if you use cangjie5, switch to “traditional Chinese only” mode, and type hey, you do not get 冬 (because it is now marked as simplified only).

Typing hey with cangjie5 with ibus-table-1.14.1 in traditional Chinese mode gives 冬:

Peek.2022-01-18.07-27.mp4

Typing hey with cangjie5 with ibus-table-1.15.0 in traditional Chinese mode does not give 冬 anymore:

Peek.2022-01-18.07-56.mp4
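The effect described above can be illustrated with a small sketch of how a mode filter might use the category values. This is not the actual ibus-table code: the function names are hypothetical, and treating characters that are missing from the table as usable in both modes is an assumption made for illustration.

```python
SIMPLIFIED = 1
TRADITIONAL = 2
BOTH = SIMPLIFIED | TRADITIONAL  # 3, matching the legend in the commit message

# Excerpt of the entries listed in the 1.15.0 commit message.
TABLE_1_15_0 = {'冬': 1, '鼕': 2}
# Assumption: before 1.15.0 these characters were not listed at all.
TABLE_BEFORE = {}


def char_category(char, table):
    """Look up a character; unlisted characters count as BOTH (assumption)."""
    return table.get(char, BOTH)


def allowed_in_mode(phrase, mode, table):
    """True if every character of `phrase` is usable in the given mode."""
    return all(char_category(char, table) & mode for char in phrase)


candidates = ['冬', '鼕']  # hypothetical candidate list
print([c for c in candidates if allowed_in_mode(c, TRADITIONAL, TABLE_BEFORE)])
print([c for c in candidates if allowed_in_mode(c, TRADITIONAL, TABLE_1_15_0)])
```

Under these assumptions, 冬 passes the traditional filter while it is unlisted, and is dropped as soon as the table marks it with 1.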

Sorry, I meant the release with your improvements regarding 顥/顯.
Not the one with the weird new Unihan variants where 冬 isn't considered traditional anymore (by the way, what are we gonna do about that?)

We are going to fix it of course.

Sorry, I meant the release with your improvements regarding 顥/顯.

That was an improvement in the cangjie5 table, so it is in the ibus-table-chinese update not in the ibus-table update.

I.e. it is here:

https://bodhi.fedoraproject.org/updates/FEDORA-2022-76daebdbee
https://github.com/mike-fabian/ibus-table-chinese/releases/tag/1.8.4

Hm, well that's the problem. Running that dnf command doesn't do anything:

$ sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee
[sudo] password for igor:
Fedora 35 - x86_64 - Updates 3.6 kB/s | 5.7 kB 00:01
Fedora 35 - x86_64 - Updates 215 kB/s | 2.0 MB 00:09
Fedora Modular 35 - x86_64 - Updates 8.0 kB/s | 5.9 kB 00:00
Fedora 35 - x86_64 - Test Updates 16 kB/s | 4.2 kB 00:00
Fedora 35 - x86_64 - Test Updates 714 kB/s | 2.5 MB 00:03
Dependencies resolved.
Nothing to do.
Complete!

Do you have ibus-table-chinese-1.8.4 already?

I have the version I installed before through a similar command, the one with the 知/佑 improvements. This command doesn't seem to change anything.

What happens if you try

sudo dnf clean all

before trying

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee

It remade all the metadata, but then the result is still the same: dependencies resolved, nothing to do.
By the way, how do I check the version of ibus-table-chinese? Is there a terminal command for that?

Oh I just checked the version.
I have ibus-table-chinese-1.8.4-1.fc35
So I actually have the latest one. Yet it still has the mentioned problem with 顯顥

afmbc brings up 顥 instead of 顯 as the first choice.

OK, I found what happened.

You have to delete

rm  ~/.local/share/ibus-table/tables/cangjie5.cache

while ibus-table is not running, then restart it.

This cache still has the older version of the cangjie5 table.

This cache would have been invalidated automatically if I had increased the serial number of the cangjie5 table, but I forgot to do that when releasing 1.8.4.
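Why the stale cache survived the update can be shown with a tiny sketch of serial-number-based invalidation. This is an illustration only: `cache_is_stale` and the serial values are hypothetical, and the real check in ibus-table may work differently (for example, it might compare for "newer" rather than "different").

```python
def cache_is_stale(cached_serial, table_serial):
    """Treat the cache as stale when its serial differs from the table's."""
    return cached_serial != table_serial


# Hypothetical serial numbers, for illustration only.
serial_in_cache = "20220101"

# The table contents changed, but the serial number was not increased:
print(cache_is_stale(serial_in_cache, "20220101"))  # cache is kept

# With an increased serial number the cache would be rebuilt:
print(cache_is_stale(serial_in_cache, "20220118"))
```

With an unchanged serial number, a check like this cannot tell that the table contents are different, so the old cache keeps being used until it is deleted by hand.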

Like this:

mike-fabian/ibus-table-chinese@95a4df5

Which will be in 1.8.6

Great, now everything is working as expected. Thanks!

What do you think about the test packages for #85 ?

ibus-table-1.15.1 packages with that new feature for Fedora are here:

https://copr.fedorainfracloud.org/coprs/mfabian/ibus-table/builds/

You can get this update with the commands:

sudo dnf copr enable mfabian/ibus-table 
sudo dnf update ibus-table

I opened a new issue about CJK compatibility ideographs in the cangjie5 table:

mike-fabian/ibus-table-chinese#1

Because an issue about these was reported for the wubi table(s): kaio#76

The user who reported the issue for the wubi tables wanted them removed because “The user can never differentiate them visually.” (That depends on the font, I guess in most cases it is right that they are very hard to distinguish or may even look completely identical).

So I wanted to check whether the cangjie5 table has a similar problem.
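One possible way to run such a check is sketched below. It relies on the fact that CJK compatibility ideographs have a canonical decomposition to a unified ideograph, so NFC normalization changes them. `find_compatibility_ideographs` is a hypothetical helper, and the input is assumed to be a plain list of phrases already extracted from the table.

```python
import unicodedata


def is_compatibility_ideograph(char):
    """True if `char` lies in a CJK compatibility ideograph block and
    normalizes (NFC) to a different, unified ideograph."""
    code = ord(char)
    in_compat_block = 0xF900 <= code <= 0xFAFF or 0x2F800 <= code <= 0x2FA1F
    # A few code points in these blocks are unified ideographs themselves
    # and have no decomposition; the normalization test skips them.
    return in_compat_block and unicodedata.normalize('NFC', char) != char


def find_compatibility_ideographs(phrases):
    """Collect all compatibility ideographs used in the given phrases."""
    return sorted({char for phrase in phrases for char in phrase
                   if is_compatibility_ideograph(char)})


# U+F900 is a compatibility ideograph that decomposes to U+8C48 (豈).
sample_phrases = ['\uf900', '\u8c48', '苗栗']
for char in find_compatibility_ideographs(sample_phrases):
    unified = unicodedata.normalize('NFC', char)
    print(f'U+{ord(char):04X} -> U+{ord(unified):04X}')
```

Printing the code points instead of the characters avoids the problem mentioned above, that the two forms can look identical depending on the font.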

Can you please check mike-fabian/ibus-table-chinese#1 ?

Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.

I opened a new issue for that one:

#95

Better to have several issues than one issue collecting characters which can never be closed.

37 new characters were added by updating Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”:

c1c39a3

Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”

  • Because of the new Unihan_Variants.txt, the following 37 characters
    are added to the VARIANTS_TABLE in chinese_variants.py:
    u'䓨': 1, u'沄': 1, u'潕': 2, u'澐': 2, u'罃': 2, u'鮗': 2, u'龻': 2,
    u'鿟': 1, u'鿠': 2, u'鿰': 1, u'鿲': 1, u'鿳': 2, u'鿴': 1, u'鿵': 1,
    u'鿶': 1, u'鿷': 1, u'鿸': 1, u'鿹': 1, u'鿺': 1, u'𣲘': 1, u'𤇾': 2,
    u'𤪤': 2, u'𦥯': 2, u'𧰎': 2, u'𩷓': 2, u'𩷕': 2, u'𩹎': 2, u'𪄳': 2,
    u'𪛞': 1, u'𫇦': 1, u'𬉧': 2, u'𬵨': 2, u'𰀡': 1, u'𰀢': 1, u'𰁜': 1,
    u'𰃮': 1, u'𰯲': 2,

    1 = simplified Chinese
    2 = traditional Chinese
    3 = used both in simplified and traditional Chinese

New issue for this part:

#97

https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains

“Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”

New issue for this part:

#96

We have 3 new issues now for the remaining stuff here and can close this one.

Gotcha!