About Cihui

Cihui input method

The Cihui input method is a phrase input method for use with the Macintosh 6.0.x Traditional Chinese system. There is also a simplified-character version that works with the Macintosh 6.0.5 Simplified Chinese system. You may use and distribute it freely, but it may not be resold. All rights are reserved.

In this documentation we shall assume that you are familiar with the Macintosh Chinese system and the conventions used in the other input methods, and we shall not repeat the basic features common to other input methods. Complete novices may want to read the appendix first, which has a summary of the basic features.

The Cihui input method is a pinyin-based input method that allows you to use phrases rather than characters as the input unit, which minimizes the problem of homonyms. Tones are optional. If you use them, you are more likely to eliminate the homonyms. Input still works without any tones; you just run the chance of having more homonyms.

Cihui demo 1

Cihui demo 2

Cihui demo 3
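The effect of optional tones can be illustrated with a small sketch. This is not Cihui's actual code: the tiny dictionary, the names `strip_tones` and `lookup`, and the simplification that a query carries either all of its tones or none of them are assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary: toned pinyin key -> candidates.
# The entries are illustrative only, not taken from the Cihui data.
DICTIONARY = {
    "ma1": ["媽"],
    "ma3": ["馬"],
    "ma4": ["罵"],
    "ma3shang4": ["馬上"],
}

def strip_tones(key):
    """Remove tone digits so toned keys can be compared with toneless input."""
    return re.sub(r"\d", "", key)

def lookup(query):
    """Return candidates for a query typed either with all tones or with none."""
    has_tones = any(ch.isdigit() for ch in query)
    results = []
    for key, candidates in DICTIONARY.items():
        if has_tones:
            matched = key == query               # tones narrow the choice
        else:
            matched = strip_tones(key) == query  # no tones: more homonyms
        if matched:
            results.extend(candidates)
    return results
```

In this toy data, `lookup("ma")` returns three homonyms while `lookup("ma3")` returns only one, which is the trade-off described above.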

Even if you don't normally know the tones, you may still be able to take advantage of tones to make more specific choices. This is possible because of a standard input method feature (standard in all pre-6.0.x simplified-version Chinese input methods, but unfortunately often not implemented in later versions): if you select a character and hit the '?' key, you get the whole input string. So if you select ma and hit '?',

Cihui demo 4

you get the tones and hence eliminate ma's of other tones:

Cihui demo 5

When tones are used, there is no question about where one character ends and the next begins. If they are not used, however, ambiguities may result. An example is xian, which can be a single character or a two-character phrase. If you just type xian, both cases are considered possible candidates:

Cihui demo 6

However, the phrase you want may be buried behind the character list:

Cihui demo 7

In order to clearly specify xian as two characters instead of one, you may use / to separate the two characters. (With pinyin, the normal notation is xi'an rather than a slash. However, we avoid using ' because the input method may also be used with Wade-Giles romanization, which itself needs to use '.)

Cihui demo 8
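The boundary ambiguity, and the way / removes it, can be sketched as follows. The syllable set and the function names are hypothetical; a real table of legal syllables would be much larger, and this is only one plausible way to enumerate the readings.

```python
# Hypothetical, very small set of legal toneless syllables.
SYLLABLES = {"xi", "an", "xian"}

def segmentations(text):
    """Return every way to split text into legal syllables."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in SYLLABLES:
            for rest in segmentations(text[end:]):
                results.append([head] + rest)
    return results

def parse(text):
    """A '/' forces a character boundary; all other boundaries stay open."""
    results = [[]]
    for part in text.split("/"):
        results = [left + right
                   for left in results
                   for right in segmentations(part)]
    return results
```

With this data, `parse("xian")` yields both the one-character and the two-character reading, while `parse("xi/an")` yields only the two-character one.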

Tones can be used to make the choice more specific. We can also go in the other direction, i.e. make things less specific by using abbreviations. The period is the abbreviation symbol, just as it is in English, so sh. means any pronunciation that starts with sh, like shi, shen etc.

Cihui demo 9

An abbreviation can be used for any character of the phrase:

Cihui demo 10

If you abbreviate to the shorter s. then it is even less specific and you will have even more phrases:

Cihui demo 11

Actually the shortest abbreviation is not 's.' but '.', so the following input asks for phrases where the first character is sh-something, but you don't care what the second character is:

Cihui demo 12

However, it should be pointed out that using the '.' abbreviation at the beginning of a phrase is not implemented in version 1.8. The following are four-character phrases that start with the letter z; there are no restrictions on the second and third characters, but the fourth character must start with y.

Cihui demo 13
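The abbreviation rules above amount to prefix matching per character. The sketch below assumes the input has already been divided into one pattern per character, and it does not model the version 1.8 restriction on a leading '.'; the function names are hypothetical.

```python
def syllable_matches(pattern, syllable):
    """'sh.' matches any syllable starting with 'sh'; '.' alone matches anything."""
    if pattern.endswith("."):
        return syllable.startswith(pattern[:-1])
    return pattern == syllable

def phrase_matches(patterns, syllables):
    """A phrase matches when it has one syllable per pattern and each one matches."""
    return len(patterns) == len(syllables) and all(
        syllable_matches(p, s) for p, s in zip(patterns, syllables)
    )
```

For example, the patterns `["z.", ".", ".", "y."]` accept any four-character phrase whose first syllable starts with z and whose fourth starts with y. The shorter the prefixes, the more phrases pass the test, which is why abbreviations lengthen the candidate list.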

In general, when you use abbreviations you increase the size of the candidate list, so they are useful if you are not sure about the pronunciation and don't mind looking through a long list of candidates. However, if your phrase is long enough, it is very likely that the phrase is unique even though you abbreviate every character:

Cihui demo 14

That is certainly easier to type than:

Cihui demo 15

So while abbreviation can be used for a fuzzier search, it can also be used as a shortcut to achieve very fast typing. In fact such an abbreviation (where every character is abbreviated to a single letter) is so common that we have an abbreviation for that abbreviation: if you start out with `, then every letter that follows is regarded as a single-letter abbreviation for one character:

Cihui demo 16

Of course, . can still be used to denote a don't care character:

Cihui demo 17
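The ` shortcut can be read as a simple expansion into per-character abbreviations. The following sketch shows that reading; the function name and the list-of-patterns output format are assumptions, not Cihui's internal representation.

```python
def expand_backtick(text):
    """Expand '`zg' into per-character patterns ['z.', 'g.'].

    A '.' inside the sequence stays a lone '.', i.e. a don't-care character.
    """
    if not text.startswith("`"):
        raise ValueError("not a single-letter abbreviation sequence")
    patterns = []
    for ch in text[1:]:
        patterns.append("." if ch == "." else ch + ".")
    return patterns
```

So typing a backtick followed by n symbols describes a phrase of exactly n characters, each given by its first letter or left open.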

Now we know what a phrase is in the Cihui input method. If you want to enter one phrase at a time, you can set the preference to do so; the space bar is then the conversion key. However, you may want to type in a whole sentence before you choose the characters. Cihui 1.8 does not make any more intelligent choice even when the whole sentence is present. Nonetheless, some users may find it more convenient to enter a sentence before picking the phrases. If the option is sentence mode, then space is just a separator between phrases, and the conversion key is the tab key (or the space key hit twice).

As an example, you may type "pingguo diannao shishi" (In the following examples, for reasons of brevity, tones or abbreviations are not used, but you could of course use them if you wanted to):

Cihui demo 18

Let us try to understand why we have this particular candidate list. The first candidate is the system's guess for all the phrases. If the guess is correct, all you need to do is hit return and you are done. The second candidate is there because both pingguo and diannao are unique (no homonyms), so there is a good chance that they are correct and the one that is wrong is the third phrase; you are therefore given the choice to select the correct portion first. The third candidate assumes that at least the first phrase is correct. The fourth candidate assumes that the problem is in the first phrase. Since the only phrases available are wrong, you have to fall back to entering one character at a time, which is why all the options for ping are displayed.

Now suppose the mistake is indeed in the third phrase and you select candidate 2 (by using the mouse, or by cursor key followed by the return key). You are then left with the following screen, where you get to choose the correct third phrase:

Cihui demo 19

After selecting the third phrase you are done.

But suppose that at the beginning the problem had been in the second phrase instead. Then you would pick the third candidate, and you would be left with:

Cihui demo 20

Since the second phrase is wrong and it is the only phrase available, this must be a new phrase, and you have to pick the characters for it. All the characters for dian have been displayed, so you just pick the right one:

Cihui demo 21

After picking dian, all the characters for nao are shown and you can pick the correct character. Now you have completed the first two phrases and you are left with the choices in the third phrase. Again you may pick one of the existing phrases or go through the characters if it is a new phrase:

Cihui demo 22

It should be pointed out that when the phrase you want does not exist and you have to pick one character at a time, painful as that is, you may not have to repeat it: if the learning option is on, the phrase is entered into the dictionary automatically and will be available in future. So if you type diannao again, you can find the new phrase:

Cihui demo 23

We can see the general philosophy of the process. The system offers all sorts of candidates, which are the longest runs of possibly correct phrases starting from the beginning. If some of them are wrong, you pick out the correct portion starting from the front to get it out of the way, and you are left with the remainder, which has some wrong choices. You do this recursively until the whole sentence is completely resolved. In some sense this is counter-intuitive, since it is natural to try to access the part that is wrong and change it, whereas here you take the part that is correct and move it out of the way to float the mistake to the front. But once you understand the idea it can become quite natural. One drawback of such an interface is that if you enter a new five-character phrase and the first four characters are already correct, you still have to pick the first four one by one before you get to the fifth. I shall attempt to address this issue in version 1.9.
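The front-first process can be modelled with a short loop. The sketch below is a simplified model of the user's steps, not Cihui's code: `guesses` stands for the system's first guess for each phrase, `wanted` for what the user actually means, and the labels in the results are placeholders.

```python
def resolve(guesses, wanted):
    """Model the steps a user takes to turn the guesses into the wanted phrases.

    Correct runs at the front are accepted in one step; a wrong phrase at the
    front has to be chosen explicitly before the rest can be dealt with.
    """
    if len(guesses) != len(wanted):
        raise ValueError("one guess per wanted phrase is assumed")
    steps = []
    pos = 0
    while pos < len(wanted):
        # Count how many leading guesses are already correct.
        run = 0
        while pos + run < len(wanted) and guesses[pos + run] == wanted[pos + run]:
            run += 1
        if run:
            steps.append(("accept", wanted[pos:pos + run]))
            pos += run
        else:
            steps.append(("choose", wanted[pos]))
            pos += 1
    return steps
```

With three phrases where only the last guess is wrong, the model gives two steps: accept the first two phrases together, then choose the third. If the second guess is wrong instead, it gives three steps, since the error has to be floated to the front before it can be fixed.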

One of the biggest problems with phrase input methods is that it is often difficult to break a sentence into phrases; in fact most of the weaker phrase input methods out there fail if you break the sentence up the wrong way. Cihui 1.8 still tries to do what it can. Here is a phrase that is not in the dictionary but that Cihui 1.8 still decodes:

Cihui demo 24

But often it will not do such a good job. Cihui prefers that you separate the phrases by spaces so that it can do a better job. If you insist on entering text as above, Cihui may not do the job as well; furthermore, if learning is on, it will blow up the size of your dictionary. So separate your phrases.

The question remains: how should I separate the phrases? The answer is to separate them in whatever way you prefer, but to do it consistently. Since Cihui is learning the phrases from you, sooner or later it will adapt to the way you use it.

A word of caution about learning. Consider the following example, and notice that the same character appears twice because the two entries have different tones:

Cihui demo 25

Cihui demo 26

When you select the characters, you are also selecting the tone, so if you pick the wrong one (there is no direct information about the tone except by looking at the surrounding characters), your dictionary will learn a phrase with the wrong tone.

Every time the system boots up, a certain amount of memory (the amount is adjustable by changing a resource) is reserved for expansion of the dictionary. Once the memory is used up, no more learning is possible until the next reboot.

The Cihui CDEV

Now let us look at the options in the Cihui CDEV:

Cihui CDEV

The first three options are the same as in other input methods.

Rearrange order means that if there are many homonyms and you pick one, next time that character will appear at the front of the homonym list.
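This behaviour corresponds to the familiar move-to-front technique. A minimal sketch, with a hypothetical function name and an ordinary Python list standing in for the homonym list:

```python
def pick(candidates, index):
    """Return the chosen candidate and move it to the front of the list in place."""
    chosen = candidates.pop(index)
    candidates.insert(0, chosen)
    return chosen
```

After `pick(homonyms, 2)`, the third homonym becomes the first one offered next time, and the others keep their relative order.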

Save on shutdown means that when the system is shut down, all the new entries that the dictionary has learned and any rearrangement resulting from the 'rearrange order' option will be saved permanently. Otherwise, every time you reboot you start over with the same dictionary.

Do sentence means that you want to do more than one phrase at a time, so tab or space bar twice is the converter key. If the option is off, then the space bar is the converter key.

Break into chars means that if a phrase is not found, Cihui attempts to break it into sub-components. For example, if zhongwen is not found, the system tries zhong and then wen. However, by changing the data it is possible to use the same input method for English-to-Chinese input. In such a case, if carpet is not in the dictionary but car and pet are, it still does not make sense to break it up, so you want to turn this option off. It should also be pointed out that in Cihui 1.8 (but not in a future Cihui 2.0) a mixture of pinyin and other arbitrary strings is possible, so you can have "FRANCE" instead of "FA3GUO2" in your dictionary.
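The fallback can be sketched as a two-stage lookup. In this simplified model the input is assumed to be already divided into components, each dictionary key maps to a single result rather than a candidate list, and the function name and sample entries are hypothetical.

```python
def convert(parts, dictionary, break_into_chars=True):
    """Look up the whole input first; optionally fall back to its components."""
    key = "".join(parts)
    if key in dictionary:
        return [dictionary[key]]
    if break_into_chars and all(p in dictionary for p in parts):
        return [dictionary[p] for p in parts]
    return []
```

With pinyin data, `convert(["zhong", "wen"], {"zhong": "中", "wen": "文"})` falls back to the two characters. With English data, passing `break_into_chars=False` keeps carpet from being rendered as car followed by pet.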

Learn phrase means that if a phrase is not in the dictionary and it is broken into sub-components and entered as individual characters, Cihui will try to add the phrase to the dictionary. Note that even if the option is on, there is a phrase length limit (controlled by a value in a resource) beyond which the phrase will not be entered into the dictionary.
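The two limits on learning (the phrase length limit and the memory reserved at boot) can be sketched as below. The class and its accounting are assumptions: the defaults reuse the values $0C and $1000 listed under "For Hackers", the expansion room is treated as a byte count, and each character is assumed to cost two bytes. How Cihui really measures an entry is not stated in this document.

```python
class LearningDictionary:
    """Toy model of a dictionary that learns until its reserved room runs out."""

    def __init__(self, room=0x1000, max_phrase_bytes=0x0C):
        self.entries = {}
        self.room = room                      # assumed: bytes left until reboot
        self.max_phrase_bytes = max_phrase_bytes

    def learn(self, key, phrase):
        """Add phrase under key; return False if either limit forbids it."""
        size = 2 * len(phrase)                # assumed: two bytes per character
        if size > self.max_phrase_bytes or size > self.room:
            return False
        self.entries.setdefault(key, []).append(phrase)
        self.room -= size
        return True
```

Under these assumptions a two-character phrase is learned, while a seven-character phrase (14 bytes) exceeds the limit and is silently skipped.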

The save button means save the content of the dictionary to the file immediately.

The revert button means read the content of the dictionary file into memory, thereby undoing all the changes made after boot-up. This may also be used to update the dictionary in memory after the dictionary has been edited by a utility. In that case, however, the operation may not be possible, because the dictionary file may then be too big to be loaded into memory.

The find operation is used to convert a phrase from Chinese characters back into pinyin. You enter the phrase (or more likely paste it from the clipboard, because if you can enter it then you already know the pinyin and have no need for the operation), then hit the return key or the find button. The pinyin will be displayed in the scrolling list below. An even more useful feature of "find" is that the usual wildcard characters ? and * can be used. ? stands for a single Chinese character and * stands for a string of Chinese characters. So in the example above, all the phrases (up to 100) that have two characters, where the second is ren, will be shown in the list. If you want all the phrases with character X somewhere, you could have used *X*:

Cihui demo 27

If you want to list all phrases with 7 characters, you could have used ???????:

Cihui demo 28
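The wildcard behaviour resembles shell-style globbing, so it can be approximated with Python's standard `fnmatch` module. This is a stand-in for illustration, not the matcher Cihui uses: `fnmatchcase` also gives `[` a special meaning, which the find operation is not documented to do. The function name and the phrase list are hypothetical; the limit of 100 comes from the text.

```python
from fnmatch import fnmatchcase

def find(pattern, phrases, limit=100):
    """Return up to `limit` phrases matching a pattern with ? and * wildcards."""
    matches = []
    for phrase in phrases:
        if fnmatchcase(phrase, pattern):
            matches.append(phrase)
            if len(matches) == limit:
                break
    return matches
```

For instance, `find("?人", phrases)` lists the two-character phrases ending in 人, and `find("*人*", phrases)` lists every phrase containing 人 anywhere.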

You can select one or more phrases (shift-click etc.) in the phrase list from the find operation, and then hit the button to remove them from the dictionary.

Using Cihui with other types of data

So far we have been talking about Cihui as a pinyin input method. Actually, by changing the data (with the help of the Cihui dictionary utility), it is possible to change the input method. For more details, look in the document about the Cihui dictionary utility. First of all, a tone is interpreted as any digit from 0 to 9. It is possible to assign a different meaning to the tone digit. For example, if you don't care about the tones because you can never tell what the value is, you may change the dictionary so that the tone digit is used to represent the first digit of the four-corner code. Now it becomes a pinyin + four-corner input method.

Pinyin is not the only possible romanization for the input. The dictionary utility supports conversion of the dictionary for use with Zhuyin 2nd form, Wade-Giles, Yale and ShuangPin (where only two letters are used for each sound). You can even invent your own romanization scheme and use it here. BoPoMoFo ZhuYin is not supported in the sense that the input window will not show the ZhuYin symbols*; also, unless you use tones, there will be a lot of ambiguities in the character boundaries and Cihui will not do a very good job.

Of course, the input need not be in Putonghua at all. It has been used successfully with Cantonese data for a Cantonese input method.

We can go even further: the input need not be sound-related. The only requirement is that if the input string for character X is ABC1 and for character Y is DEF2, then the input string for phrase XY should be ABC1DEF2. Also, a digit should not be used in a character's input string except as the last symbol. Even if digits are not otherwise used in the input method, an arbitrary digit can always be put at the end to mark the end of a character. So it is possible to use Cihui as a Tsangchi phrase input method (even though there may not be a lot to be gained by that, since Tsangchi does not need phrase input to eliminate homonyms; on the other hand, the abbreviations in Cihui can be useful in such a scheme as a shortcut).
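The concatenation rule makes a phrase's input string mechanically splittable at the digits. A minimal sketch, assuming (as the rule states) that each character's string ends in exactly one digit; the function names are hypothetical.

```python
import re

def split_characters(input_string):
    """Split a phrase's input string into per-character strings at each digit."""
    parts = re.findall(r"[^0-9]+[0-9]", input_string)
    if "".join(parts) != input_string:
        raise ValueError("every character's string must end in exactly one digit")
    return parts

def phrase_input(character_inputs):
    """The input string of a phrase is the concatenation of its characters' strings."""
    return "".join(character_inputs)
```

Here `split_characters("ABC1DEF2")` gives `["ABC1", "DEF2"]`, and `phrase_input` reverses it. A string such as FRANCE, which carries no digits, is rejected by this check because it does not follow the rule.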

Even the "ABC1DEF2 input for phrase XY" rule can be relaxed. If you don't care about learning and the breakup of an undefined phrase into sub-components, then the rule need not be obeyed. If you turn off these options, Cihui may be used as an English-to-Chinese input method. Or you can keep it as a pinyin method but allow English words to be mixed in as part of the input, e.g. FRANCE instead of FA3GUO2. However, there is no way to enter FRANCE into the dictionary from the input method.

If you want to modify Cihui along these lines, please refer to the document on Cihui Dictionary Utility on how it can be done.

Miscellaneous Features

The code for Chinese punctuation and its keyboard map has been removed so that the input method code can stay within the 32K limit. Other than that Cihui 1.8 tries to leave the standard interface module common to all input methods untouched. Any attempt to change the interface (such as fixing the fact that a single click instead of double click is used on the candidate window and makes autoscrolling impossible) would be left to Cihui 1.9.

If you need to type Chinese punctuation marks, you can type comma and period directly. For others you can invent an input string; right now "V" displays all the symbols and "BOPOMOFO" displays all the ZhuYin symbols. You can improve on this according to your specific needs.

When the candidate window is visible, the return key is used to select a single character. Option return can be used to select all the candidates in the option list. Very few will find this useful, but it was useful for me when I was working on making a HyperCard Chinese dictionary.

For Hackers

It is possible to further customize Cihui if you are willing to use ResEdit to edit some resources (hopefully not on your only copy).

With 'iopt' ID 17217 you can change the following values:

  • Offset/current value/meaning
  • $01 $20 separator between phrases
  • $03 $09 convertor key for sentence
  • $05 $60 ` - the start of abbreviation sequence
  • $07 $2E . - the abbreviation symbol
  • $08 $1000 room for expansion for dictionary
  • $0B $0C maximum size of phrase to be learned in bytes
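For readers who prefer to inspect a raw copy of the resource data outside ResEdit, the offsets above can be read and patched with a few lines of code. This is only a sketch: the field names are invented, and the field widths (single bytes, with a big-endian 16-bit word for the $1000 value) are assumptions inferred from how the values are written in the list, not documented facts.

```python
import struct

# Offsets as listed for 'iopt' ID 17217.
# Names and widths are assumptions: "B" = one byte, ">H" = big-endian 16-bit word.
FIELDS = {
    "phrase_separator": (0x01, "B"),
    "sentence_convert_key": (0x03, "B"),
    "abbreviation_start": (0x05, "B"),
    "abbreviation_symbol": (0x07, "B"),
    "expansion_room": (0x08, ">H"),
    "max_learned_phrase_bytes": (0x0B, "B"),
}

def read_field(data, name):
    """Read one value from a bytes-like copy of the resource."""
    offset, fmt = FIELDS[name]
    return struct.unpack_from(fmt, data, offset)[0]

def write_field(data, name, value):
    """Patch one value in a bytearray copy of the resource, in place."""
    offset, fmt = FIELDS[name]
    struct.pack_into(fmt, data, offset, value)
```

As with ResEdit itself, such a patch should only ever be tried on a spare copy.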

With 'iopt' ID 17216 you can change the following values:

  • Offset/current value/meaning
  • $18 $0009 id of input method - if you have two copies of Cihui running at the same time, you need to assign a different ID (preferably a large random number to minimize the chance of conflict with other input methods).
  • $47 $3F ? - the key to ask for the input sequence
  • $49 $3F ? - the key to ask for the input sequence
  • $4E $4200 input window font ID
  • $50 $4200 candidate window font ID

Each input method is identified by a SICN (usually a Chinese character, since a SICN and a 12pt Chinese character are both 16x16 bitmaps, which is convenient). If you want to change it (for example if you have two copies of Cihui), change the SICN resource ID 17216.

Acknowledgments

Most of the data in the dictionary were entered by Lars Fredriksson and Zhang Lin Fredriksson of the Far Eastern Library in Stockholm. They have been heavily involved with the design and testing of the Cihui input method and the correction of mistakes in this document and the utility document. Also Peter Bryder of Lund, Fung Fung Lee and Tin-Fook Ngai of Stanford, Kang-i Sun of Yale, Charles Tang of Dartmouth, and Pai Chou of Berkeley have all been using it and offering much valuable advice.

Appendix: General information for input methods

The following information should apply to almost all input methods because they share the same user interface module, therefore it should apply to Cihui as well.

To install an input method, just drag the file into the system folder. You have to reboot before you can use it. To uninstall it, just delete the file or move it out of the system folder, and reboot if you want to free up the space it takes up in RAM. To check whether it is installed, look in the control panel; you should find your input method there.

Furthermore, if you select the cdev for the input method, you will find the options dialog; the one for Cihui was shown earlier in this document. If you select the Chinese system cdev, you should see the following, with your input method on the list. If you want to use it, you may select it and hit the "current state" button. Hit the start up preference button if you want this input method to be the default input method on startup.

Cihui demo 29

To do Chinese input, first look at the right hand side of the menu bar.

Cihui demo 30

If it is the diamond, then input is in English; otherwise it is in Chinese (or Japanese, or Korean etc., where each language has a different symbol). You can toggle the mode by clicking on it (or you can use command-space to toggle). If the input mode is Chinese, then your keystrokes are captured and displayed in an input window at the bottom instead of going into your document. A typical one is shown below:

Cihui demo 31

The input string that you type is in the input window. If the dynamic option is on, there is also a candidate window showing possible candidate phrases/characters. If you see the character you want in the holding area, you can click on it to get the phrase/character, or you can hit return to get the selected phrase/character. You can change the selection in the candidate window by using the cursor keys. Usually when you are done with inputting you hit a conversion key; from that point on you are selecting the right phrase rather than typing the input string.

Here is a summary of the keystrokes used when you are still in the input mode:

  • legal input string keystroke: used for editing of the input string.
  • cursor key: affects the candidate window and not the input window.
  • return: enter the current selected phrase/character in the candidate window into your document.
  • enter: enter the current input string into your document.
  • delete: used for editing, unless the input string is empty, in which case it cancels Chinese input.
  • clear: if input string is not empty then empty it, otherwise cancel Chinese input.
  • conversion key: go to candidate selection mode.
  • ?: make the input string the same as the one for the selected character/phrase in the candidate window and then go to candidate selection mode.

When you are in the candidate selection mode, the keystrokes have the following meaning:

  • return,enter: enter the current selected phrase/character in the candidate window into your document.
  • delete,clear: cancel candidate selection mode.
  • cursor key: affects the selection in the candidate window; if you do not have cursor keys, you may use the I, J, K, L keys instead.
  • ?: make the input string the same as the one for the selected character/phrase in the candidate window and then go to candidate selection mode.
  • 1,2,3,4,5,6,7,8,9,0(10): enter the Nth character/phrase from the current line into the document.
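The digit shortcut in the last item treats 0 as the tenth position. A minimal sketch of that mapping, with a hypothetical function name and zero-based list positions:

```python
def candidate_index(key):
    """Map the digit keys 1..9 and 0 to zero-based positions 0..9 on the current line."""
    if len(key) != 1 or key not in "1234567890":
        raise ValueError("not a digit key")
    return 9 if key == "0" else int(key) - 1
```

So the key 1 selects the first candidate on the current line and the key 0 selects the tenth.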