Simplified Chinese Input Method (Mac OS X 10.4)

Input Menu

The Simplified Chinese input method (SCIM) is Unicode-based and supports the GB 18030 character set and its GB 2312 subset. When you select an input method or plug-in in the Input menu, a uniform set of options appears in three sections:

[Figure: Input menu]

The first section contains a command to reset preferences, a palette of double-byte punctuation characters, and a new feature that converts Simplified Chinese text to Traditional Chinese text:

  • Reset Default Preferences: Restores all preferences to their default settings.
  • Show Punctuation: Displays double-byte punctuation marks and symbols in the Candidate window. Not available in ITABC.
  • Convert to Traditional Chinese text: Converts selected text from Simplified Chinese characters to Traditional Chinese characters.

The second section contains settings that affect the behavior of the input methods:

  • Use Two Byte Roman Characters: Enters double-width Roman characters, which align with Chinese text. This is useful in certain contexts, such as tables and forms.
  • Show Input Keys: Displays input key sequences in parentheses next to the characters in the Candidate window. This helps when you are learning an input method. Not available in ITABC.
  • Show Associated Words: This feature works like a standard Chinese dictionary, in which all words and phrases are listed under their first character. When Show Associated Words is active, as soon as you enter a character by typing its number in the Candidate window (or double-clicking it with the mouse), the Candidate window reappears with a list of characters that form words or phrases beginning with the character you just entered. Not available in Wubi Xing.

The third section is for input-method utilities and preferences:

  • Find Input Code: See SCIMTool (below).
  • Edit User Dictionary: See SCIMTool (below).
  • Preferences: See Preferences.
  • Generate IM Plug-in: Opens the Input Method Plug-in Converter.
  • Simplified Chinese Input Method Help: Opens the Simplified Chinese section of the Help Viewer (in Chinese).

Preferences

There are four tabs:

  • Typing: Uses a checklist to turn the Show Input Keys and Show Associated Words features on or off.
  • Candidate Window: Allows you to change the direction, font, and size of the text in the Candidate window.
  • Modes: Contains settings for the ITABC and Wubi Hua input methods.
  • Dictionaries: Controls access to user-defined dictionaries in the Show Associated Words feature and the Wubi Xing input method. These dictionaries can be created using the old Simplified Chinese Dictionary Tool utility in OS 9/Classic. Install them in the /System/Library/Components/SCIM.component/Contents/Support/Dictionaries folder and log out. When you log back in, they will appear. Up to four dictionaries can be open at one time.

Input Methods

ITABC

ITABC is licensed from a Chinese developer. See: http://www.znabc.com/

It uses the Pinyin phonetic transliteration system (the letter v is used for ü). You type the input string, then press one of two keys to open the Candidate window. Which key you press makes all the difference:

  • If you press the space bar, you invoke the "ITABC Standard" input method, which remains confined to the GB 2312 character set.
  • If you press the return key, you gain access to all of GB 18030 and also invoke an alternative input method known as Structural Pinyin.

Let's begin with ITABC Standard. Tone numbers are not supported; instead, the number keys are used for shape input combined with Pinyin input. Type the Pinyin string, optionally followed by one or more shape numbers, then press space to open the Candidate window. ITABC Standard is designed for word and phrase input (e.g., type "zhongguo"). Use the apostrophe key to mark syllable boundaries when necessary (e.g., type "xi'an" so that it is read as the two syllables xi + an rather than the single syllable xian).

Abbreviated Input

Abbreviated input is always on in ITABC Standard:

  • Type the letter(s) for the Pinyin initials with an apostrophe between them (e.g., "zh'g" for "zhongguo"). Note that often you don't need the apostrophe (e.g., just typing "zhg" will do) or the whole initial (e.g., typing "zg" will also do), but using the apostrophe with the full initial is more precise, and can save you time by reducing the number of choices in the Candidate window.
  • Hold down Shift and type the first letter of each syllable's Pinyin as a capital (e.g., "ZG" for "zhongguo").
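The abbreviation rules above (full or partial initials, with or without apostrophes, or Shift-typed capitals) can be modeled as a prefix test against the syllables of a stored word. The Python sketch below is a hypothetical illustration of those rules, not ITABC's actual algorithm; the function names are invented for this example, and candidate ranking is ignored.

```python
def _cut_into_prefixes(typed: str, syllables: list[str]) -> bool:
    """True if `typed` splits into len(syllables) non-empty pieces,
    each a prefix of the matching syllable (e.g. "zhg" -> "zh" + "g")."""
    if not syllables:
        return typed == ""
    head, rest = syllables[0], syllables[1:]
    for cut in range(1, len(typed) + 1):
        if not head.startswith(typed[:cut]):
            break  # a longer piece cannot be a prefix either
        if _cut_into_prefixes(typed[cut:], rest):
            return True
    return False


def matches_abbreviation(typed: str, syllables: list[str]) -> bool:
    """Hypothetical model of ITABC Standard abbreviated input."""
    if typed.isupper():
        # Shift-typed capitals: exactly one first letter per syllable ("ZG").
        return len(typed) == len(syllables) and all(
            syl.startswith(ch.lower()) for ch, syl in zip(typed, syllables)
        )
    if "'" in typed:
        # Apostrophes fix the syllable boundaries ("zh'g", "xi'an").
        pieces = typed.split("'")
        return len(pieces) == len(syllables) and all(
            piece and syl.startswith(piece) for piece, syl in zip(pieces, syllables)
        )
    # No separators: try every way of cutting the input ("zhg", "zg", "zhongguo").
    return _cut_into_prefixes(typed, syllables)


zhongguo = ["zhong", "guo"]
for typed in ["zh'g", "zhg", "zg", "ZG", "zhongguo", "zx"]:
    print(typed, matches_abbreviation(typed, zhongguo))
```

In this model every form except "zx" matches "zhongguo", and "xi'an" matches the syllables xi + an but not the single syllable xian, which is why the apostrophe form narrows the Candidate window.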

You can type long strings of Pinyin or abbreviations (up to 40 characters) and have ITABC Standard parse them into words. You will probably find that typing out the full Pinyin works best, because ITABC Standard then guesses much more accurately. When parsing strings, press return to enter each selection.

After ITABC Standard has parsed a string and you have entered it into your text, it remembers the string, which will then appear whenever you type it in any form, abbreviated or whole. Parsing the string 老东西 "laodongxi" (old thing) is a good example. Try it. When you first type the input string and press space to call up the Candidate window, ITABC incorrectly parses the first two characters as 劳动 "laodong" (labor). Press delete to back up over "dong" and change "lao" to 老 (old). Press return and it correctly parses 东西 "dongxi" (thing). Press return again to enter the phrase into your document. Thereafter, typing "LDX" or "ldx" will retrieve it, as will "laodongxi."

Stroke-Shape Input

Stroke-shape [笔形] input is always on in ITABC Standard. It uses the number keys (1-8) to indicate the shapes of the strokes that make up a character:

[Figure: Stroke-shape input chart]

By default, ITABC uses shape input in combination with Pinyin. The chart above is not comprehensive, but it should give you a good sense of how this works. Shape numbers 3, 4, 5, and 6 each cover a set of related forms. Shape numbers 7 and 8 are actually combinations of strokes, and they take precedence over the individual strokes (thus, type 7, not 1-2, and 8, not 2-5-1). For example, 苹 is "ping72": typing "ping7" yields four choices, and "ping72" narrows them down to two.
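Combining Pinyin with shape numbers amounts to filtering the Pinyin candidates by the leading shapes of each character. The Python sketch below shows that filtering under stated assumptions: the lexicon is a tiny hypothetical table in which only 苹 = "ping72" comes from the text, while the PING_* entries and their shape codes are invented placeholders chosen to reproduce the "four, then two" counts above. Real ITABC tables and ordering are not modeled.

```python
import re

# Hypothetical lexicon of (character, pinyin, leading stroke-shape numbers).
# Only 苹 -> "72" comes from the text; the PING_* entries are placeholders.
LEXICON = [
    ("苹", "ping", "72"),
    ("PING_B", "ping", "72"),
    ("PING_C", "ping", "71"),
    ("PING_D", "ping", "74"),
    ("PING_E", "ping", "13"),
]

# Pinyin letters (v stands for ü), then zero or more shape numbers 1-8.
INPUT_PATTERN = re.compile(r"([a-z]+)([1-8]*)")


def candidates(typed: str) -> list[str]:
    """Characters whose pinyin matches and whose shape code starts with
    the typed shape numbers."""
    match = INPUT_PATTERN.fullmatch(typed)
    if match is None:
        raise ValueError(f"expected Pinyin followed by shape numbers: {typed!r}")
    pinyin, shapes = match.groups()
    return [char for char, py, code in LEXICON
            if py == pinyin and code.startswith(shapes)]


print(candidates("ping7"))   # four entries
print(candidates("ping72"))  # two entries
```

Each extra shape number can only shrink the list, which is why shape numbers are optional: they are a filter on Pinyin, not a separate code.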

ITABC shortcuts

  • To directly enter Chinese numbers, type small "i" and then the desired number key.
  • To directly enter formal Chinese numbers, type capital "I" and then the desired number key.
  • To enter a word or phrase from the User Dictionary into documents, type small "u" (for "user") and then the input key sequence you defined for it.
  • Type small "v" with a number for access to special Selection windows. Note that these palettes contain double-byte characters, intended for use in double-byte text. They are used mainly in tables, charts, lists, and such, when you would like to maintain proper spacing among Chinese characters: v1=punctuation, symbols; v2=numbers; v3=basic Latin alphabet and numbers, punctuation, symbols; v4=hiragana; v5=katakana; v6=Greek alphabet, vertical-text punctuation; v7=Cyrillic alphabet; v8=Pinyin vowels with tones; v9=more punctuation, symbols.
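The shortcuts above all work the same way: the first letter selects a mode and the rest of the input is interpreted in that mode. The Python sketch below models that dispatch. It is a simplified, hypothetical illustration: only single digits 1-9 are handled for "i" and "I" (zero and multi-digit numbers are not modeled), the palettes are represented by their descriptions from the list above rather than real windows, and the user-dictionary entry is invented.

```python
# Simplified model of the ITABC prefix shortcuts described above.
DIGITS = "123456789"
SMALL_NUMERALS = "一二三四五六七八九"    # "i" + digit
FORMAL_NUMERALS = "壹贰叁肆伍陆柒捌玖"   # "I" + digit (formal numerals)
PALETTES = {
    "1": "punctuation, symbols",
    "2": "numbers",
    "3": "basic Latin alphabet and numbers, punctuation, symbols",
    "4": "hiragana",
    "5": "katakana",
    "6": "Greek alphabet, vertical-text punctuation",
    "7": "Cyrillic alphabet",
    "8": "Pinyin vowels with tones",
    "9": "more punctuation, symbols",
}
# Hypothetical user-dictionary entry: key sequence -> stored phrase.
USER_DICTIONARY = {"scim": "简体中文输入法"}


def expand_shortcut(typed: str) -> str:
    """Return what an ITABC-style shortcut would produce (sketch only)."""
    prefix, rest = typed[:1], typed[1:]
    if prefix in ("i", "I") and len(rest) == 1 and rest in DIGITS:
        numerals = SMALL_NUMERALS if prefix == "i" else FORMAL_NUMERALS
        return numerals[int(rest) - 1]
    if prefix == "v" and rest in PALETTES:
        return f"palette v{rest}: {PALETTES[rest]}"
    if prefix == "u" and rest in USER_DICTIONARY:
        return USER_DICTIONARY[rest]
    raise KeyError(f"no shortcut modeled for {typed!r}")
```

For example, expand_shortcut("i3") gives 三, expand_shortcut("I3") gives 叁, and expand_shortcut("uscim") returns the stored phrase.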

ITABC Preferences

The ITABC section of the Modes tab in Preferences has four parts:

  1. Input Mode: Toggles between the ITABC "Standard" mode, discussed in detail above, and the ITABC "Double" mode. ITABC Double uses a combination of standard and "simplified" Pinyin, as well as abbreviated input and stroke input, as follows:
    • Initials: ch=e; sh=v; zh=a
    • Finals (some keys produce different things depending on the initial): ai=l; an=j; ang=h; ao=k; ei=q; en=f; eng=g; ia=d; ian=w; iang=t; iao=z; ie=x; in=c; ing=y; iong=s; ong=s; ou=b; ua=d; uan=p; uang=t; ui=r
  2. Pure Shape Input: Allows you to use shape input by itself, using the number keys and the numeric keypad. You can enter up to six shapes per character. For characters that require six shapes or fewer, type the shapes in standard stroke order. Remember that shape numbers 7 and 8 are combinations of strokes but count as only one shape (see the chart above). For more complex characters, you must divide the character into two parts (right-left, upper-lower, or inner-outer) and type three shapes for each part.
    • When Pure Shape Input is active, the Candidate window is dynamic. The numbers enclosed in circles indicate the next stroke to be typed (use the arrow keys or the mouse to navigate in the window).
  3. Adjust Word Frequency: Causes the characters you use most frequently to be listed first in the Candidate window. Otherwise, characters are listed according to standard GB order (the most common are listed first).
  4. Information Window: Turns one of the most prominent features of ITABC on and off. When active, it appears beneath the Candidate window. It can always be called up by pausing the mouse over a character in the Candidate window (click on it to make it disappear). It displays the GB and Unicode code points, the pure shape/stroke input code, the Pinyin pronunciation(s), and, last, a new category, called 拆白 Chaibai:

[Figure: Chaibai]
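The ITABC Double key assignments listed under Input Mode above can be applied mechanically: split a Pinyin syllable into initial and final, then replace each with its key. The Python sketch below uses only the table from the text. Because that table is partial (for example, the final "uo" in "guo" is not listed), the sketch raises an error for unlisted finals instead of guessing; it also assumes that single-letter finals keep their own key and does not model syllables without an initial consonant.

```python
# Key assignments copied from the ITABC Double table above.
INITIAL_KEYS = {"zh": "a", "ch": "e", "sh": "v"}
FINAL_KEYS = {
    "ai": "l", "an": "j", "ang": "h", "ao": "k", "ei": "q", "en": "f",
    "eng": "g", "ia": "d", "ian": "w", "iang": "t", "iao": "z", "ie": "x",
    "in": "c", "ing": "y", "iong": "s", "ong": "s", "ou": "b", "ua": "d",
    "uan": "p", "uang": "t", "ui": "r",
}
CONSONANTS = set("bpmfdtnlgkhjqxrzcsyw")


def to_double(syllable: str) -> str:
    """Encode one Pinyin syllable as two ITABC Double keystrokes (sketch)."""
    for initial, key in INITIAL_KEYS.items():
        if syllable.startswith(initial):
            head, final = key, syllable[len(initial):]
            break
    else:
        if not syllable or syllable[0] not in CONSONANTS:
            raise ValueError(f"zero-initial syllables are not modeled: {syllable!r}")
        head, final = syllable[0], syllable[1:]
    if len(final) == 1:
        # Assumption: single-letter finals (a, e, i, o, u, v) use their own key.
        return head + final
    if final in FINAL_KEYS:
        return head + FINAL_KEYS[final]
    raise KeyError(f"final {final!r} is not in the table above")


def phrase_to_double(syllables: list[str]) -> str:
    """Concatenate the two-key codes of each syllable."""
    return "".join(to_double(s) for s in syllables)
```

Under these assumptions, "zhang" becomes "ah" and the earlier example 老东西 (lao dong xi) becomes "lkdsxi". Converting from Pinyin to keys is unambiguous even where one key serves two finals (ong and iong both use s), because the initial already tells the input method which final is meant.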

Structural Pinyin

Now we can turn to using the return key to call up the Candidate window in OS X 10.4. With the return key, Pinyin + shape input works for individual characters (not words and phrases) across the entire GB 18030 character set, a total of 27,496 characters. In addition, the return key invokes the Jiegou Pinyin [结构拼音] input method, which has been translated as "Structural Pinyin." In Structural Pinyin, you type the standard Pinyin readings of the graphic and/or phonetic components of a character, following its structure: usually left-right, top-bottom, or inner-outer. These readings are listed in the Chaibai ("components") category of the Information window. The character selected above breaks down into two common graphic/phonetic components: 魚 yu and 日 ri (strictly speaking, the second component is 曰 yue, but every 曰 component is read as 日 ri in this input mode). If you type "yuri" and press return, you get the Candidate window shown above. The small green triangular warning signs (!) mark characters that are not in the GB 2312 character set. The purpose of Structural Pinyin is to let you use Pinyin to enter obscure characters whose pronunciation you don't know. Non-native users get a bonus, since it also works for more common characters!
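Structural Pinyin lookup can be thought of as two steps: split the typed string into known component readings ("yuri" into yu + ri), then look up characters indexed under that sequence of readings. The Python sketch below illustrates those two steps with a hypothetical two-entry index; the ("yu", "ri") key follows the 魚 + 曰 example in the text, but the values are placeholder labels, not data from ITABC's actual tables.

```python
# Hypothetical component index: component readings -> matching characters.
# The ("yu", "ri") key follows the 魚 + 曰 example in the text (曰 is read as
# ri in this mode); the values are placeholder labels, not real table data.
COMPONENT_INDEX = {
    ("yu", "ri"): ["<魚 + 曰 character>"],
    ("mu", "mu"): ["<木 + 木 character>"],
}


def segmentations(typed: str, readings: set[str]) -> list[tuple[str, ...]]:
    """Every way to split `typed` into a sequence of known component readings."""
    if typed == "":
        return [()]
    results = []
    for cut in range(1, len(typed) + 1):
        head = typed[:cut]
        if head in readings:
            for tail in segmentations(typed[cut:], readings):
                results.append((head,) + tail)
    return results


def structural_lookup(typed: str) -> list[str]:
    """Characters whose component readings, in order, spell out `typed`."""
    readings = {reading for key in COMPONENT_INDEX for reading in key}
    found = []
    for parts in segmentations(typed, readings):
        found.extend(COMPONENT_INDEX.get(parts, []))
    return found
```

Trying every segmentation, rather than only the first one found, matters when a string can be divided into readings in more than one way.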

Troubleshooting

There are three files in the Library/Preferences folder of your home folder (~/Library/Preferences/) that store data for ITABC:

  • user.rem: Among other things, the user's word-usage frequency data is stored here.
  • tmmr.rem: When you add a word to ITABC Standard, it is stored here.
  • itabcx.rem: This is for the return-key input method.

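If ITABC misbehaves, you can move any of these files to the Trash; new ones will be created the next time you use ITABC. The data stored in a trashed file, such as your word-usage frequencies or the words you have added, is lost.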
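Moving the files aside, instead of deleting them, lets you restore your learned words if the reset doesn't help. The Python sketch below does that for the three files named above. The function name and the backup-folder location are hypothetical choices for this example, and the sketch assumes a Python 3 interpreter, which Mac OS X 10.4 does not include.

```python
import shutil
from pathlib import Path

# File names taken from the list above.
ITABC_FILES = ("user.rem", "tmmr.rem", "itabcx.rem")


def back_up_itabc_files(prefs_dir: Path, backup_dir: Path) -> list[str]:
    """Move any ITABC data files from prefs_dir into backup_dir.

    Returns the names of the files that were actually moved; files that
    don't exist are skipped."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in ITABC_FILES:
        source = prefs_dir / name
        if source.exists():
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return moved
```

A typical call would be back_up_itabc_files(Path.home() / "Library" / "Preferences", Path.home() / "itabc-backup"), run while no application is using ITABC. To undo the reset, move the files back.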

Wubi Xing

Wubi Xing assigns its character components (radicals) to 25 keys on the keyboard. See Joe Wicentowski's Wubizixing tutorial.

Here are a few tips for using the Apple input method:

  • To look up the correct key sequence for a particular character, use Find Input Code.
  • To display the input key sequences for each character, use Show Input Keys.
  • Each key, typed on its own, enters one common character directly.
  • You do not always have to type the entire key sequence.
  • You can input compounds, such as "buje" for 聪明 congming (intelligent).
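The "buje" example follows a simple rule for two-character compounds: take the first two keys of each character's code. The Python sketch below applies that rule. The text only confirms the first two keys of each code ("bu" for 聪, "je" for 明); the remaining keys are the Wubi 86 codes as we understand them and do not affect the result. Rules for longer compounds are not covered here.

```python
# Wubi codes for the characters in the text's example. Only the first two
# keys of each ("bu" for 聪, "je" for 明) are confirmed by the text.
WUBI_CODES = {"聪": "bukn", "明": "jeg"}


def two_char_word_code(word: str, codes: dict[str, str]) -> str:
    """Compound code for a two-character word: the first two keys of each
    character's code, e.g. 聪明 -> "buje"."""
    if len(word) != 2:
        raise ValueError("this sketch only handles two-character words")
    first, second = (codes[ch] for ch in word)
    return first[:2] + second[:2]
```

Because every two-character compound then takes exactly four keys, a whole word costs no more keystrokes than a single fully coded character.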

Wubi Hua

Wubi Hua uses 5 strokes, each assigned to a key (1-5) on the numeric keypad. See Joe Wicentowski's Wubi Hua tutorial. In the Apple input method, the 0 key marks the end of a character that takes fewer than five strokes to enter, and the 6 key is a wildcard.
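One way to picture the 0 and 6 keys is as a pattern matched against a character's stroke sequence: 6 matches any single stroke, and a trailing 0 says the character has no further strokes. The Python sketch below encodes that reading of the text, which is an interpretation rather than a description of Apple's implementation. It assumes the usual stroke numbering (1 = horizontal, 2 = vertical, 3 = left-falling, 4 = dot, 5 = turning), and the function name is hypothetical.

```python
def wubi_hua_match(typed: str, strokes: str) -> bool:
    """Hypothetical model: does a typed key sequence select a character whose
    stroke sequence (digits 1-5) is `strokes`?

    6 is a wildcard for any single stroke; a trailing 0 means the input is
    complete, so the character must have no further strokes."""
    complete = typed.endswith("0")
    keys = typed[:-1] if complete else typed
    if len(keys) > len(strokes) or (complete and len(keys) != len(strokes)):
        return False
    return all(k == "6" or k == s for k, s in zip(keys, strokes))


# Stroke sequences: 十 = horizontal, vertical; 土 = horizontal, vertical, horizontal.
SHI, TU = "12", "121"
print(wubi_hua_match("120", SHI), wubi_hua_match("120", TU))
```

Here "120" selects 十 but not 土, because the 0 rules out characters with a third stroke, while "12" alone would still match both.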

The Wubi Hua section of the Modes tab in Preferences has two parts:

  1. Define Input Keys: Allows you to assign letters to each stroke. To change the assignments, click on them and then press the new key (a to z) on the keyboard.
  2. Sorting: Toggles between "GB2312" order, where the most common characters appear first, and "Frequency" order, where the characters you use most often appear first.
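The choice between GB order and frequency order (this Sorting option, and Adjust Word Frequency in ITABC) comes down to which sort key the Candidate window uses. The Python sketch below shows both orderings with hypothetical data: the GB ranks and usage counts are invented for illustration, and the function name is not part of the input method.

```python
from collections import Counter


def order_candidates(candidates: list[str], gb_rank: dict[str, int],
                     usage: Counter, by_frequency: bool) -> list[str]:
    """Order candidates by GB rank, or by the user's own usage counts."""
    if by_frequency:
        # Most-used first; characters with equal counts keep GB order.
        return sorted(candidates, key=lambda c: (-usage[c], gb_rank[c]))
    return sorted(candidates, key=lambda c: gb_rank[c])


# Hypothetical data: candidates for one key sequence, their GB ranks, and
# how often this user has chosen each of them.
gb_rank = {"十": 0, "土": 1, "王": 2}
usage = Counter({"王": 3, "土": 1})
print(order_candidates(["王", "十", "土"], gb_rank, usage, by_frequency=False))
print(order_candidates(["王", "十", "土"], gb_rank, usage, by_frequency=True))
```

Falling back to GB rank for ties keeps the frequency order stable for characters you have never chosen.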

SCIMTool

Find Input Code

You can enter up to two characters into the Find Input Code window:

[Figure: Find Input Code window]

Edit User Dictionary

The User Dictionary data is stored in a file named "user.rem" in the Library/Preferences folder of your home folder (~/Library/Preferences). The User Dictionary lets you create your own custom abbreviations or shorthand translations:

[Figure: User Dictionary window]

To enter a word or phrase from the User Dictionary into documents, type small "u" (for "user") and then the input key sequence you defined for it.