Trad/Simp characters marked as Simp only in Unihan_Variants.txt
This is an umbrella issue that I will open (with @mike-fabian's permission, pending) to collect characters that are used in both Trad and Simp Chinese, but are marked as Simp only in Unihan_Variants.txt. Expansion of ibus/ibus#2323.
@mike-fabian I came across another such character a couple of days ago, but I didn't open an issue right away and have since forgotten what it was. There are probably more such characters lurking around, so I suggest we keep this issue open so peeps can add them when they come across them, and you can update Unihan_Variants.txt as you do maintenance on this fork.
I will post a comment in this issue once I find that character again.
https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains
“Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”
commit 4debaa5
Author: Mike FABIAN mfabian@redhat.com
Date: Tue Oct 5 11:30:31 2021 +0200
Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-08-06 Unicode 14.0.0 final”
- *All* our fixes which are now included upstream.
- Because of the new Unihan_Variants.txt, the following 48 characters
are added to the VARIANTS_TABLE in chinese_variants.py:
u'䓖': 1, u'了': 1, u'伙': 1, u'借': 1, u'傢': 2, u'冬': 1, u'千': 1,
u'卜': 1, u'卷': 1, u'吁': 1, u'合': 1, u'回': 1, u'夥': 2, u'姜': 1,
u'家': 1, u'峃': 1, u'嶨': 2, u'庼': 1, u'廎': 2, u'懞': 2, u'才': 1,
u'折': 1, u'捲': 2, u'摺': 2, u'旋': 1, u'朱': 1, u'濛': 2, u'灶': 1,
u'瞭': 3, u'矇': 2, u'硃': 2, u'秋': 1, u'竈': 2, u'籲': 2, u'纔': 2,
u'蒙': 1, u'蔑': 1, u'蔔': 2, u'薑': 2, u'藉': 3, u'藭': 2, u'衊': 2,
u'迴': 2, u'霉': 1, u'鞦': 2, u'黴': 2, u'鼕': 2,
1 = simplified Chinese
2 = traditional Chinese
3 = used both in simplified *and* traditional Chinese
Looking at the above changes coming from the new Unihan_Variants.txt, I have some doubts whether all of them are correct.
For example u'秋': 1, and u'冬': 1. Aren’t they used in traditional Chinese as well? And u'千': 1, and a few others.
It would be nice if somebody could go through the whole list above and check.
Yes, all the characters that you mentioned are used in Trad. I will check this list.
I believe all of the chars from that list are used in Trad except for these two: u'峃' and u'庼'.
Edit: u'䓖' is also unused in Trad.
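The result of this review could be applied as a correction on top of the generated entries. The following is only a minimal sketch, assuming the flag values from the commit message (1, 2, 3). The names `NEW_SIMPLIFIED_ENTRIES`, `SIMPLIFIED_ONLY`, and `apply_review` are hypothetical and not part of ibus-table, and only a handful of the 48 entries are shown.

```python
# Flag values as listed in the commit message.
SIMPLIFIED = 1
TRADITIONAL = 2
BOTH = SIMPLIFIED | TRADITIONAL  # 3

# A few of the entries the regenerated table marks as simplified only.
NEW_SIMPLIFIED_ENTRIES = {'冬': 1, '秋': 1, '千': 1, '峃': 1, '庼': 1, '䓖': 1}

# Characters that, according to the review in this thread, are not used in Trad.
SIMPLIFIED_ONLY = {'峃', '庼', '䓖'}


def apply_review(entries, simplified_only):
    """Return a copy of `entries` where reviewed characters are marked as used in both."""
    corrected = {}
    for char, flag in entries.items():
        if flag == SIMPLIFIED and char not in simplified_only:
            # Marked simplified only, but the review says it is used in Trad too.
            corrected[char] = BOTH
        else:
            corrected[char] = flag
    return corrected


CORRECTED = apply_review(NEW_SIMPLIFIED_ENTRIES, SIMPLIFIED_ONLY)
```

In this sketch 冬, 秋, and 千 end up with flag 3, while 峃, 庼, and 䓖 keep flag 1.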
Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.
Hi, btw, how can I try the new release on Fedora? I tried doing it the same way I did last time and nothing changed...
Which release do you mean?
https://bodhi.fedoraproject.org/updates/FEDORA-2022-8a9b689712
https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0
which contains:
- Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py
?
With 1.15.0, if you use cangjie5, switch to “traditional Chinese only” mode, and type hey, you do not get 冬 (because it is now marked as simplified only).
Typing hey with cangjie5 with ibus-table-1.14.1 in traditional Chinese mode gives 冬:
Peek.2022-01-18.07-27.mp4
Typing hey with cangjie5 with ibus-table-1.15.0 in traditional Chinese mode does not give 冬 anymore:
Peek.2022-01-18.07-56.mp4
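To illustrate why 冬 disappears, here is a rough sketch of filtering candidates by the flags in the variants table. It is not the actual ibus-table code. `filter_candidates` and `CANDIDATES` are hypothetical, the candidate list is made up for illustration, and treating characters that are missing from the table as usable in both modes is an assumption.

```python
SIMPLIFIED = 1
TRADITIONAL = 2


def filter_candidates(candidates, variants_table, wanted):
    """Keep the candidates whose flag includes the `wanted` bit.

    Assumption: characters not listed in the table count as usable in both modes.
    """
    kept = []
    for char in candidates:
        flag = variants_table.get(char, SIMPLIFIED | TRADITIONAL)
        if flag & wanted:
            kept.append(char)
    return kept


# Made-up candidate list, for illustration only.
CANDIDATES = ['冬', '鼕']

# Before the update neither character was in the table.
WITH_OLD_TABLE = filter_candidates(CANDIDATES, {}, TRADITIONAL)

# After the update: u'冬': 1, u'鼕': 2 (values from the commit message).
WITH_NEW_TABLE = filter_candidates(
    CANDIDATES, {'冬': SIMPLIFIED, '鼕': TRADITIONAL}, TRADITIONAL)
```

With the old table both characters pass the traditional filter. With the new entries 冬 is dropped because its flag is 1.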
Sorry, I meant the release with your improvements regarding 顥/顯.
Not the one with the weird new Unihan variants where 冬 isn't considered traditional anymore (by the way, what are we gonna do about that?)
We are going to fix it of course.
Sorry, I meant the release with your improvements regarding 顥/顯.
That was an improvement in the cangjie5 table, so it is in the ibus-table-chinese update, not in the ibus-table update.
I.e. it is here:
https://bodhi.fedoraproject.org/updates/FEDORA-2022-76daebdbee
https://github.com/mike-fabian/ibus-table-chinese/releases/tag/1.8.4
Hm, well that's the problem. Running that dnf command doesn't do anything:
$ sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee
[sudo] password for igor:
Fedora 35 - x86_64 - Updates 3.6 kB/s | 5.7 kB 00:01
Fedora 35 - x86_64 - Updates 215 kB/s | 2.0 MB 00:09
Fedora Modular 35 - x86_64 - Updates 8.0 kB/s | 5.9 kB 00:00
Fedora 35 - x86_64 - Test Updates 16 kB/s | 4.2 kB 00:00
Fedora 35 - x86_64 - Test Updates 714 kB/s | 2.5 MB 00:03
Dependencies resolved.
Nothing to do.
Complete!
Do you have ibus-table-chinese-1.8.4 already?
I have the version I installed before through a similar command, the one with the 知/佑 improvements. This command doesn't seem to change anything.
https://bodhi.fedoraproject.org/updates/FEDORA-2021-8808d4f69f
I updated it through this command.
What happens if you try
sudo dnf clean all
before trying
sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-76daebdbee
It remade all the metadata, but then the result is still the same: dependencies resolved, nothing to do.
By the way, how do I check the version of ibus-table-chinese? Is there a terminal command for that?
Oh I just checked the version.
I have ibus-table-chinese-1.8.4-1.fc35
So I actually have the latest one. Yet it still has the mentioned problem with 顯/顥: afmbc brings up 顥 instead of 顯 as the first choice.
OK, I found what happened.
You have to delete
rm ~/.local/share/ibus-table/tables/cangjie5.cache
while ibus-table is not running, then restart it.
This cache still has the older version of the cangjie5 table.
This cache would have been invalidated automatically if I had increased the serial number of the cangjie5 table, but I forgot to do that when releasing 1.8.4.
Like this:
mike-fabian/ibus-table-chinese@95a4df5
Which will be in 1.8.6
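The serial-number check described above can be pictured roughly like this. It is a simplified sketch, not how ibus-table actually stores its cache: the JSON file layout, the `load_table` and `demo` functions, and the serial strings are all hypothetical.

```python
import json
import os
import tempfile


def load_table(cache_path, table_serial, build_entries):
    """Return (entries, rebuilt). Reuse the cache only if its serial matches."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as cache_file:
            cached = json.load(cache_file)
        if cached.get('serial') == table_serial:
            return cached['entries'], False
    # No cache yet, or the serial changed: rebuild and overwrite the cache.
    entries = build_entries()
    with open(cache_path, 'w', encoding='utf-8') as cache_file:
        json.dump({'serial': table_serial, 'entries': entries},
                  cache_file, ensure_ascii=False)
    return entries, True


def demo():
    """Show a stale cache (serial not bumped) versus a rebuilt one (serial bumped)."""
    old_table = {'afmbc': ['顥', '顯']}
    new_table = {'afmbc': ['顯', '顥']}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'cangjie5.cache')
        load_table(path, 'serial-1', lambda: old_table)  # first run writes the cache
        stale, _ = load_table(path, 'serial-1', lambda: new_table)  # serial unchanged
        fresh, _ = load_table(path, 'serial-2', lambda: new_table)  # serial bumped
    return stale['afmbc'][0], fresh['afmbc'][0]
```

In this sketch, updating the table without changing the serial still returns the old first choice from the cache. Bumping the serial, or deleting the cache file by hand, forces a rebuild.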
Great, now everything is working as expected. Thanks!
What do you think about the test packages for #85 ?
ibus-table-1.15.1 packages with that new feature for Fedora are here:
https://copr.fedorainfracloud.org/coprs/mfabian/ibus-table/builds/
You can get this update with the commands:
sudo dnf copr enable mfabian/ibus-table
sudo dnf update ibus-table
I opened a new issue about CJK compatibility ideographs in the cangjie5 table:
mike-fabian/ibus-table-chinese#1
Because an issue about these was reported for the wubi table(s): kaio#76
The user who reported the issue for the wubi tables wanted them removed because “The user can never differentiate them visually.” (That depends on the font, but I guess in most cases it is true that they are very hard to distinguish or may even look completely identical.)
So I wanted to check whether the cangjie5 table has a similar problem.
Can you please check mike-fabian/ibus-table-chinese#1 ?
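One way to look for such characters in a table is to use Python's `unicodedata` module. This is just a sketch of the idea, and the helper names `is_compatibility_ideograph` and `find_compatibility_ideographs` are made up. It relies on the fact that a CJK compatibility ideograph has a canonical decomposition to a unified ideograph, whereas the few unified ideographs that live in the compatibility block have none.

```python
import unicodedata


def is_compatibility_ideograph(char):
    """True if `char` is a CJK compatibility ideograph with a canonical decomposition."""
    name = unicodedata.name(char, '')
    if not name.startswith('CJK COMPATIBILITY IDEOGRAPH'):
        return False
    # Unified ideographs placed in the compatibility block (for example U+FA0E)
    # have an empty decomposition and are not duplicates of another character.
    return unicodedata.decomposition(char) != ''


def find_compatibility_ideographs(chars):
    """Map each compatibility ideograph in `chars` to its unified counterpart."""
    return {
        char: unicodedata.normalize('NFC', char)
        for char in chars
        if is_compatibility_ideograph(char)
    }


# U+F900 is a compatibility ideograph for U+8C48; U+FA0E has no decomposition.
SAMPLE = ['\uf900', '\u8c48', '\ufa0e']
FOUND = find_compatibility_ideographs(SAMPLE)
```

Running something like this over the characters of a table would list the entries that duplicate a unified ideograph, together with the character they normalize to.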
Okay
Okay, I found that character. 栗 li4 is marked as Simp-only, but it's used in Taiwan. There is a city named Miaoli 苗栗 here.
I opened a new issue for that one:
Better to have more issues than one issue collecting characters which can never be closed.
37 new characters were added by updating Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”:
Regenerate engine/chinese_variants.py for Unihan_Variants.txt from “2021-12-01 Unicode 15.0.0 draft”
- Because of the new Unihan_Variants.txt, the following 37 characters
are added to the VARIANTS_TABLE in chinese_variants.py:
u'䓨': 1, u'沄': 1, u'潕': 2, u'澐': 2, u'罃': 2, u'鮗': 2, u'龻': 2,
u'鿟': 1, u'鿠': 2, u'鿰': 1, u'鿲': 1, u'鿳': 2, u'鿴': 1, u'鿵': 1,
u'鿶': 1, u'鿷': 1, u'鿸': 1, u'鿹': 1, u'鿺': 1, u'𣲘': 1, u'𤇾': 2,
u'𤪤': 2, u'𦥯': 2, u'𧰎': 2, u'𩷓': 2, u'𩷕': 2, u'𩹎': 2, u'𪄳': 2,
u'𪛞': 1, u'𫇦': 1, u'𬉧': 2, u'𬵨': 2, u'𰀡': 1, u'𰀢': 1, u'𰁜': 1,
u'𰃮': 1, u'𰯲': 2,1 = simplified Chinese
2 = traditional Chinese
3 = used both in simplified and traditional Chinese
New issue for this part:
https://github.com/mike-fabian/ibus-table/releases/tag/1.15.0 contains
“Update Unihan_Variants.txt to “2021-08-06 Unicode 14.0.0 final” and regenerate engine/chinese_variants.py”
New issue for this part:
We have 3 new issues now for the remaining stuff here and can close this one.
Gotcha!