mike-fabian / ibus-table

The tables engines for IBus

Home Page: http://mike-fabian.github.io/ibus-table

Can you add the function of digital keypad input coding? For the Chinese character stroke input method I wrote.

YQ-YSY opened this issue

@mike-fabian Thanks for your work on ibus-table. Can you add support for entering input codes with the digital (numeric) keypad? It is for the Chinese character stroke input method I wrote.
https://github.com/YQ-YSY/stroke-seq_MB

The Chinese stroke input method is very simple. For example, typing 1234 gives the Chinese character 'wood'. So, if ibus could use the digital keypad keys 0-9 for entering codes, Chinese input would be very fast and would need only one hand. Unfortunately, no engine, including ibus-table, can use the digital keypad keys 0-9 for entering codes; they only use the digital keypad to select one of the Chinese characters in the candidate list.

Interesting. Can you explain this in a bit more detail?
I assume you want an option not to use the number keys 1-9 to select candidates
but use numbers for input instead.
ibus-typing-booster has such an option, if that option is enabled in ibus-typing-booster,
one can still select candidates using F1-F9, with the arrow keys, and with the mouse. But one
can use 1-9 for regular input then.

I guess you want a similar option for ibus-table, right?

I'm glad to receive your reply. I spent a year and a half designing a code table for a Chinese stroke sequence (one hand) input method. But I'm not a programmer. I can't write software.

Here is an example picture of what I imagine:
https://github.com/YQ-YSY/stroke-seq_MB/blob/master/icon/Sample.jpg
The example in the picture is the Chinese character 'beg'; its stroke code is 124134.

I hope all operations can be done with one hand on the digital keypad.
After entering the code, the user presses the + button; the candidate list turns blue and the digital keypad switches to select mode.
The user can then select a character from the list, still with one hand on the same digital keypad, by pressing 0-9.
After the character is selected, the digital keypad automatically switches back to input mode and the candidate list turns black again.

This is the most basic and important function.
I can explain the other extended functions of the digital keypad shown in the picture in more detail if you need.
For example:
digital keypad / button: inputs the Chinese comma.
digital keypad . button: inputs the Chinese full stop.
digital keypad - button: Backspace, so users can easily correct their input.
digital keypad * button: fuzzy query, in case users forget the code.

Most people use the Chinese pinyin input method, because everyone learned pinyin in school as a child.
In the same way, everyone learned how to write Chinese characters in school as a child. So everyone can use this Chinese stroke sequence (one hand) input method without having to learn anything new or memorize codes.

In the national standard for the strokes of Chinese characters, every Chinese character is made up of five kinds of strokes:
1 ------ horizontal, from left to right
2 ------ vertical, from top to bottom
3 ------ falling from the top down to the left
4 ------ falling from the top down to the right, including the dot
5 ------ any kind of turning stroke, for example from horizontal turning to vertical

As you know, the code of the Chinese character 'one' is 1, 'two' is 11, 'three' is 111, 'four' is 25351.
But many Chinese characters have a lot of strokes; the largest number of strokes in one character is 48.
So, to improve the input speed, in the code table I designed,
I use two-digit codes made of 0-9 to represent a component of a Chinese character.

For example:
1234 is the Chinese character ‘wood’, and many Chinese characters contain this ‘wood’.
So, in the code table I designed, 12 is the character 'wood', and it is also the ‘wood’ component inside other Chinese characters.
Therefore:
the Chinese character ‘grove’ is 1212; yes, two woods make a grove (side by side).
the Chinese character ‘forest’ is 121212; yes, three woods make a forest (in a triangle).
Chinese is very intuitive and interesting, isn't it?

In the code table I designed, every Chinese character can be input with no more than 6 codes.
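
The five-stroke numbering described above can be sketched in a few lines of Python. This is only an illustration of the standard stroke codes (1-5) quoted in this comment; the names `STROKE_DIGIT` and `stroke_code` are hypothetical, and the shortened 0-9 component codes of the designed table are not modelled here.

```python
# Stroke classes as numbered in this thread (1-5).
# The dictionary keys are hypothetical labels chosen for this sketch.
STROKE_DIGIT = {
    "horizontal": "1",     # from left to right
    "vertical": "2",       # from top to bottom
    "left_falling": "3",   # from the top down to the left
    "right_falling": "4",  # from the top down to the right, includes the dot
    "turning": "5",        # any kind of turning stroke
}


def stroke_code(strokes):
    """Return the stroke code: one digit per stroke, in writing order."""
    return "".join(STROKE_DIGIT[s] for s in strokes)


# 'wood' and 'four', using the codes given in the thread.
wood = stroke_code(["horizontal", "vertical", "left_falling", "right_falling"])
four = stroke_code(["vertical", "turning", "left_falling", "turning", "horizontal"])
print(wood, four)
```

The code length always equals the number of strokes, which is why long characters motivate the shorter component codes.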

I checked again and ibus-table does already support using select keys different from 1-9.

See for example:

https://github.com/mike-fabian/ibus-table-others/blob/master/tables/latex.txt#L89

SELECT_KEYS = F1,F2,F3,F4,F5,F6,F7,F8,F9

This latex.txt table also uses number keys like 1-9 as input, therefore it is necessary to choose
different select keys.

You write: In the code table I designed, every Chinese character can be input with no more than 6 codes.

I am still confused which file in your git repository is supposed to be the table with the input codes.
I assume the files in the "text" subdirectory are supposed to be the tables with the input codes.

Looking there for "wood", "grove", "forest", I find:

mfabian@taka:/local/mfabian/src/stroke-seq_MB (master)
$ egrep -r '[[:space:]]木|林|森[[:space:]]' text/
text/单字_四合一_29685个.txt:174        木      4       null    null    12      201     1234    300    mù       mu4
text/单字_四合一_29685个.txt:2480       林      8       null    null    1212    200     12341234       300      lín     lin2
text/单字_四合一_29685个.txt:9034       森      12      null    null    121212  200     123412341234   300      sēn     sen1
text/单字_笔顺码_29685个.txt:174        木      4       1234    300
text/单字_笔顺码_29685个.txt:2480       林      8       12341234        300
text/单字_笔顺码_29685个.txt:9034       森      12      123412341234    300
text/单字_六全码_29685个.txt:174        木      4       12      201
text/单字_六全码_29685个.txt:2480       林      8       1212    200
text/单字_六全码_29685个.txt:9034       森      12      121212  200
text/混排_三合一_64653个.txt:174        木      4       12      201
text/混排_三合一_64653个.txt:2480       林      8       1212    200
text/混排_三合一_64653个.txt:9034       森      12      121212  200
text/混排_三合一_64653个.txt:174        木      4       1234    300
text/混排_三合一_64653个.txt:2480       林      8       12341234        300
text/混排_三合一_64653个.txt:9034       森      12      123412341234    300
mfabian@taka:/local/mfabian/src/stroke-seq_MB (master)
$ 

So now I am wondering what the different columns mean.

In the first 3 lines above, the input codes 12, 1212, 121212, seem to be in the 6th column.
In the other files, the input codes seem to be in the 4th column.
The second column always seems to contain the resulting Chinese character.
I have no idea what the other columns contain.

Your input method seems very similar to https://en.wikipedia.org/wiki/Stroke_count_method
This input method is actually already supported by ibus-table, it is in the stroke5.txt table.

Here:

https://github.com/definite/ibus-table-chinese/blob/master/tables/stroke5/stroke5.txt

You use the numbers 1, 2, 3, 4, 5 for the different types of strokes,
stroke5.txt uses nm,./ instead. Otherwise it seems very similar, maybe even the same.

https://github.com/definite/ibus-table-chinese/blob/master/tables/stroke5/stroke5.txt
contains:

### Define the prompts of each valid input char.
BEGIN_CHAR_PROMPTS_DEFINITION
n ㄣ
m -
, ’
. 、
/ |
END_CHAR_PROMPTS_DEFINITION
END_DEFINITION

I.e. / is for a vertical stroke, m for a horizontal stroke, , for falling to the left, . for falling to the right, and n for a turning stroke.

$ egrep -r '木|林|森' stroke5.txt 
### 王林根 Mike Wong <mike_lg_wong@yahoo.com.hk>
m/,.m	林	490
m/,.	木	500
/m,.m	森	500

The 3rd column in a table used by ibus-table is a priority, so for the characters which have the same
input code, the 3rd column determines in which order the characters should be shown
in the candidate list. For example the input m/,.m matches quite a lot of characters, the list starts with:

m/,.m 本 500
m/,.m 極 495
m/,.m 林 490
m/,.m 標 485
...

So when typing m/,.m then 本 will be on top of the list of candidates initially.
I write initially, because stroke5.txt contains:

DYNAMIC_ADJUST = TRUE

that means it learns from user input.
So if the user types m/,.m and then selects the 3rd candidate 林, the priority of that character will be increased. And when the user types m/,.m again, 林 will be on top of the list.
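
The ordering and learning behaviour described above can be sketched as follows. This is a toy model, not ibus-table's actual implementation: the class `CandidateTable` and the rule "raise the selected candidate just above the current maximum" are assumptions made for illustration; only the four sample rows come from stroke5.txt as quoted in this thread.

```python
from collections import defaultdict


class CandidateTable:
    """Toy model of priority-ordered candidates with dynamic adjustment."""

    def __init__(self, rows):
        # code -> {character: priority}
        self._by_code = defaultdict(dict)
        for code, char, priority in rows:
            self._by_code[code][char] = priority

    def lookup(self, code):
        """Return the characters for a code, highest priority first."""
        entries = self._by_code.get(code, {})
        return sorted(entries, key=lambda ch: -entries[ch])

    def select(self, code, char):
        """Assumed learning rule: lift the chosen character to the top."""
        entries = self._by_code[code]
        entries[char] = max(entries.values()) + 1


ROWS = [
    ("m/,.m", "本", 500),
    ("m/,.m", "極", 495),
    ("m/,.m", "林", 490),
    ("m/,.m", "標", 485),
]

table = CandidateTable(ROWS)
before = table.lookup("m/,.m")
table.select("m/,.m", "林")   # the user picks the 3rd candidate
after = table.lookup("m/,.m")
print(before, after)
```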

Thanks for your reply.
Using F1-F9 to select is not very efficient for Chinese input.
I still hope the digital keypad can switch its key function between input mode and select mode.

About what the different files mean:
text/单字_笔顺码_29685个.txt ----- This file contains the codes from the national standard for the strokes of Chinese characters; every character uses only the standard stroke codes 1-5. The code length equals the number of strokes.
text/单字_六全码_29685个.txt ----- This file is the code table I designed; every character uses the codes 0-9. The code length is no more than 6 codes.
text/单字_精简码_5177个.txt ----- This file contains the streamlined codes of the most commonly used Chinese characters from the 6-code table. The code length is only 1-5 codes.
text/混排_三合一_64653个.txt ----- This file is 3 in 1: the three files above merged, with duplicates not removed.
text/单字_四合一_29685个.txt ----- This file is 4 in 1: the three files above merged, plus pinyin, with duplicates already removed.

If you want to test the code table, I suggest using text/单字_四合一_29685个.txt (4 in 1).

About what the different columns mean:
text/单字_四合一_29685个.txt:174 木 4 null null 12 201 1234 300 mù mu4
text/单字_四合一_29685个.txt:2480 林 8 null null 1212 200 12341234 300 lín lin2
text/单字_四合一_29685个.txt:9034 森 12 null null 121212 200 123412341234 300 sēn sen1

1 column (174--2480--9034) ----- This is the character ID in the national standard for the strokes of Chinese characters. It is useful for programmers who have not installed a Chinese font and cannot see the characters; they can use this ID to match the right Chinese character.
2 column (木--林--森) ----- This is the Chinese character.
3 column (4--8--12) ----- This is the number of strokes of the Chinese character.
4 column (null) ----- This is the streamlined code of the Chinese character. Only 5177 characters have a streamlined code; for the others it is null.
5 column (null) ----- This is the character/word frequency for the streamlined code, used to order characters with the same code. Only 5177 characters have a streamlined code; for the others it is null.
6 column (12--1212--121212) ----- This is the 6-code table I designed; every character uses the codes 0-9. The code length is no more than 6 codes.
7 column (201--200--200) ----- This is the character/word frequency for the 6 code, used to order characters with the same code.
8 column (1234--12341234--123412341234) ----- This is the code from the national standard for the strokes of Chinese characters; every character uses only the standard stroke codes 1-5. The code length equals the number of strokes.
9 column (300--300--300) ----- This is the character/word frequency for the national standard stroke code, used to order characters with the same code.
10 column (mù--lín--sēn) ----- This is the pinyin of the Chinese character, shown to the user. (Some characters may have multiple pinyin readings; only one is given here.)
11 column (mu4--lin2--sen1) ----- This is the pinyin of the Chinese character in a form the user can type: pinyin letters plus the tone number. So, when the user forgets how to write a Chinese character but remembers how to pronounce it, he can type the pinyin directly on the letter keys, with no need to switch to a pinyin input method.
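
The 11-column layout described above could be read with a small parser like the following sketch. It assumes the columns are separated by whitespace and that missing values are the literal string `null`, as in the sample lines; the names `CharEntry` and `parse_entry` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharEntry:
    char_id: int               # column 1: ID in the national standard
    char: str                  # column 2: the Chinese character
    stroke_count: int          # column 3: number of strokes
    short_code: Optional[str]  # column 4: streamlined code, or None
    short_freq: Optional[int]  # column 5: frequency for the streamlined code
    six_code: str              # column 6: designed code, at most 6 digits
    six_freq: int              # column 7: frequency for the 6 code
    stroke_code: str           # column 8: standard stroke code (1-5)
    stroke_freq: int           # column 9: frequency for the stroke code
    pinyin_display: str        # column 10: pinyin with tone mark
    pinyin_input: str          # column 11: pinyin with tone number


def parse_entry(line):
    """Parse one line of the 4-in-1 file (whitespace-separated, assumed)."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    short_code = None if fields[3] == "null" else fields[3]
    short_freq = None if fields[4] == "null" else int(fields[4])
    return CharEntry(
        int(fields[0]), fields[1], int(fields[2]),
        short_code, short_freq,
        fields[5], int(fields[6]),
        fields[7], int(fields[8]),
        fields[9], fields[10],
    )


entry = parse_entry("174 木 4 null null 12 201 1234 300 mù mu4")
print(entry.char, entry.six_code, entry.stroke_code)
```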

How about this:

SELECT_KEYS = KP_End,KP_Down,KP_Next,KP_Left,KP_Begin,KP_Right,KP_Home,KP_Up,KP_Page_Up

These are the key symbols my digital keypad produces when Numlock is off. When Numlock is on,
it produces KP_1,KP_2,KP_3,KP_4,KP_5,KP_6,KP_7,KP_8,KP_9 instead.

So you could switch between the mode where the keypad keys select the candidates 1-9 or
produce the input number 1-9 with the Numlock key instead of the + key.

Using Numlock for that purpose is just as good as +, isn’t it? Both + and Numlock are right next to
the digital keypad on my keyboard.
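
The Numlock idea above can be illustrated with a short sketch: the same physical keypad key produces a different key symbol depending on Numlock, so the key symbol name alone tells whether the key press means "select candidate n" or "input digit n". The function `classify` is hypothetical and is not part of ibus-table; the key symbol names are the ones listed in this comment.

```python
# Key symbols produced with Numlock off (used as select keys, candidates 1-9).
SELECT_KEYS = [
    "KP_End", "KP_Down", "KP_Next", "KP_Left", "KP_Begin",
    "KP_Right", "KP_Home", "KP_Up", "KP_Page_Up",
]
# Key symbols produced with Numlock on (used as input digits 1-9).
INPUT_KEYS = [f"KP_{n}" for n in range(1, 10)]


def classify(keysym):
    """Map a keypad key symbol name to ("select", index) or ("input", digit)."""
    if keysym in SELECT_KEYS:
        return ("select", SELECT_KEYS.index(keysym) + 1)
    if keysym in INPUT_KEYS:
        return ("input", keysym[-1])
    return ("other", keysym)


print(classify("KP_End"), classify("KP_1"))
```

With this split no extra mode state is needed in the engine: the Numlock key itself acts as the mode switch.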

Yeah, it seems this way can solve the selection problem.
I want to learn how to use ibus-table to test my code table; I will try to find a Chinese tutorial...

And I will show you how to input a Chinese word.
For example:
The real Chinese word ‘forest’ is 森林, haha, 5 woods.

In the word code table I designed:
6 code ---- is 121212.1212 (a dot separates the characters).
the national standard stroke code for Chinese characters ---- is 123412341234.12341234 (a dot separates the characters).

And I hope the user can use the * button for a fuzzy query or to shorten the code he inputs.
Like this: 12*.12* So the input speed is greatly improved.

stroke-seq_MB/dict --- This is a word collection, divided into multiple files by type,
for example: cities, names, poems...

You can use stroke-seq_MB/dict/core_dict/核心_规范词汇_77826个.txt to test the codes,
because it includes the most commonly used words, including the word ‘forest’ -- 森林.

About what the different columns mean:
'wei'sheng'bu 卫生部 521.371.618052 521.31121.4143125152
1 column ('wei'sheng'bu) ----- This is the pinyin; the user can also input it to get the word. A ' separates the characters.
2 column (卫生部) ----- This is the Chinese text of this word. Here the word means Ministry of Health.
3 column (521.371.618052) ----- This is the 6 code I designed. A dot separates the characters.
4 column (521.31121.4143125152) ----- This is the national standard stroke code for Chinese characters. A dot separates the characters.
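
As a sketch of the word-table layout just described, the following splits one line into its per-character parts. It assumes whitespace between columns, `'` before each pinyin syllable, and `.` between the per-character codes, as in the sample line; the function name `parse_word_line` is hypothetical.

```python
def parse_word_line(line):
    """Split one word-table line into per-character pinyin and codes."""
    pinyin, word, six, strokes = line.split()
    return {
        # "'wei'sheng'bu" -> ["wei", "sheng", "bu"]
        "pinyin": pinyin.strip("'").split("'"),
        "word": word,
        "six_codes": six.split("."),
        "stroke_codes": strokes.split("."),
    }


entry = parse_word_line("'wei'sheng'bu 卫生部 521.371.618052 521.31121.4143125152")
print(entry)
```

Each list has one item per character of the word, so the lengths can be used as a simple consistency check on the table.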

There is no character/word frequency in the word code table,
because there is almost no duplication of stroke codes.
But in the pinyin input method there is a lot of code duplication.
Therefore, the Chinese stroke input method will be faster than the pinyin input method.

Another important question is:
how to input numbers when the user is using the Chinese stroke sequence input method on the digital keypad.

As Sample.jpg shows, I hope:
when the user presses the * button first, he can input a number, not a stroke code.
For example:
input *1234 and press Enter: get the number 1234.
input 1234 and press Enter: get the Chinese character 'wood'.
input 1*4: show the fuzzy query results, listing all characters whose code starts with 1 and ends with 4.
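
The fuzzy query in the last example can be sketched with Python's standard `fnmatch` module, where `*` matches any run of characters. This is only an illustration of the matching rule, not how ibus-table implements its wildcards; the tiny `CODES` table uses the national standard stroke codes quoted earlier in this thread, and `fuzzy_query` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# (stroke code, character) pairs quoted earlier in the thread.
CODES = [
    ("1", "一"),
    ("11", "二"),
    ("111", "三"),
    ("25351", "四"),
    ("1234", "木"),
    ("12341234", "林"),
    ("123412341234", "森"),
]


def fuzzy_query(pattern):
    """Return the characters whose code matches the wildcard pattern."""
    return [char for code, char in CODES if fnmatchcase(code, pattern)]


matches = fuzzy_query("1*4")
print(matches)
```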

Wildcards like * are supported by ibus-table. To input numbers one can switch into direct input mode.

From the point of view of efficiency, pressing 'ctrl + space' to switch into direct input mode just to input numbers is too troublesome, especially for articles with a large amount of data. Can I use the + button to input numbers? For example: input +3.14159265 and press Enter to get the number 3.14159265 (π, Pi).

Is it possible to map the digital keypad buttons to main keyboard keys?
I hope:
---- In both input mode and select mode:
when - is pressed first, it maps to Backspace, so the user can correct his input.
---- Input mode:
when / is pressed first, it maps to the comma.
when . is pressed first, it maps to the full stop.
when numbers are entered first and . follows them, it separates the characters in a word.
when + is pressed first, followed by numbers, and . follows the numbers, it is the decimal point of the number.
---- Select mode:
the / button maps to Page Up in the candidate list.
the * button maps to Page Down in the candidate list.

Combined with using the letter keys to input pinyin and pressing Shift to switch between English and Chinese pinyin,
if all of that works, my table can input four kinds of codes on one 101-key keyboard: stroke, pinyin, English, number.
There is NO NEED to press 'ctrl + space' to switch into direct input mode. It's very efficient.

Last question:
can ibus-table show two lines of prompt information?
One for the candidate list, another for stroke code tips, as my Sample.jpg shows.

The stroke code tips are here: stroke-seq_MB/icon/Code_Table.svg
For example:
when the user inputs 1, the stroke code tips will show the codes 10-19, each of which represents a component of a Chinese character.
So the user does not need to learn the whole code table; he just follows the strokes and continues to input to get the character he wants.

ibus-table shows hints how to continue the input. Here is an example when using the wubi-jidian86.txt
table:

(Screenshot: ibus-table candidate list with the wubi-jidian86.txt table)

The typed letter is "a". 工 is an exact match. To get 㐂 the complete code is "aaab".
For 其 the complete code is "adwu".

I think you should just try ibus-table and see whether it works for your purpose.

Could you create a table with 3 columns separated by tabs where the 1st column contains the code,
the 2nd column the Chinese character produced by that code and the 3rd column some
priority code which gives the order of priority for characters with the same code?

If you do that, I can add an appropriate header to that table and make a test build for Fedora
for ibus-table and ibus-table-chinese including your table, then you could test it out.
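
A conversion to the requested 3-column layout might look like the sketch below. It assumes the input is a line of the 4-in-1 file with whitespace-separated columns, and that the 6 code (column 6) and its frequency (column 7) are the values wanted as code and priority; `to_ibus_row` is a hypothetical helper, and the table header still has to be added separately.

```python
def to_ibus_row(line):
    """Turn one 4-in-1 line into 'code<TAB>character<TAB>priority'."""
    fields = line.split()
    char = fields[1]      # column 2: the Chinese character
    six_code = fields[5]  # column 6: designed code, at most 6 digits
    six_freq = fields[6]  # column 7: frequency, reused here as priority
    return f"{six_code}\t{char}\t{six_freq}"


SAMPLE = [
    "174 木 4 null null 12 201 1234 300 mù mu4",
    "2480 林 8 null null 1212 200 12341234 300 lín lin2",
    "9034 森 12 null null 121212 200 123412341234 300 sēn sen1",
]

rows = [to_ibus_row(line) for line in SAMPLE]
print("\n".join(rows))
```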

You write:

---- In both input mode and select mode:
when - is pressed first, it maps to Backspace, so the user can correct his input.
---- Input mode:
when / is pressed first, it maps to the comma.
when . is pressed first, it maps to the full stop.
when numbers are entered first and . follows them, it separates the characters in a word.
when + is pressed first, followed by numbers, and . follows the numbers, it is the decimal point of the number.
---- Select mode:
the / button maps to Page Up in the candidate list.
the * button maps to Page Down in the candidate list.

If you need very specific stuff which is different from every other input method supported
by ibus-table, then maybe you need to write your very own input method and not use something
generic like ibus-table. ibus-table cannot make lots and lots of special exceptions for every table.

I do think that ibus-table is quite good for your purpose and I suggest to write a table in
a format usable by ibus-table and try it out.

Thank you for your help.
I will create a table with 3 columns for you, but it will take several days...
And I wonder, how many lines can this table contain?

Because, in the stroke code table I designed,
there are 29,685 Chinese characters with 64,653 stroke codes; if pinyin is added, the character table needs 100,000 lines.
If all 70,377 Chinese characters are added, the character table needs 310,000 lines.
And the core dict has 180,659 words and the extended dict has 2,000,000 words.
Every word has 2 stroke codes and 1 pinyin, so the whole word table needs 6,500,000 lines.

I don't know: does ibus-table need all codes in one file, or split into several files?

You don’t need pinyin because ibus-table already has a pinyin mode. It is enough
to add a line like this:

PINYIN_MODE = TRUE

to the header of your table to enable pinyin mode. See for example:

https://github.com/definite/ibus-table-chinese/blob/master/tables/stroke5/stroke5.txt#L73

You need to have the table in one file, don’t split it into several files.

Thank you for your help.
bus_table_seq_3494962_text_word.txt.zip was sent to your Gmail (43.23M).

Can you (or one of your friends) make a test build for Ubuntu?
Because most users in China are using Ubuntu, Fedora, and Debian.

I have made the ibus-table stroke-seq.db and stroke-seq.txt files and already sent them to your Gmail. It works, but it has some problems that I don't know how to solve.