East Asian Languages and Chinese Characters

8. Appropriateness to East Asian Languages

The best arguments for Chinese characters revolve around what many see as their "appropriateness" to Chinese language and by extension to the Sinitic vocabularies of other East Asian languages. Chinese itself, with its alleged "monosyllabic" structure, is regarded as uniquely suited to a form of representation whose units are one syllable long. Since the serviceability of a writing system is measured by how well it fits the language, what more could be asked? Also, by focusing on meaningful units, the characters are said to eliminate a major deficit in the Sinitic parts of East Asian languages, namely, their poorly differentiated phonetic structures. Because of its many homonyms, Chinese vocabulary -- by this argument -- cannot be reliably distinguished through speech or through a phonetic writing system based on speech. But since Chinese characters "transcend" speech, users distinguish by sight words that cannot be distinguished by sound. Finally, this same supralinguistic quality allegedly enables characters to bridge the differences between China's many "dialects," enabling people all over China to read the "same language." The conclusion drawn from these arguments is that what counts is not the writing system per se, but how well that system matches the concrete reality of the language, in which case Chinese characters are said to score high.

The above can be called the "enlightened" view of Chinese writing, held by many linguists, East Asian and Western, who have taken the trouble to analyze the character writing system in terms of what it is asked to accomplish.1 Unfortunately, these arguments, while valid on one level, share the same basic flaw of confusing the remedy for a problem with its cause. In the first place, I shall argue below that Chinese is not "monosyllabic," perhaps even less so than English. Multisyllable words are the norm in Chinese, and the only reason it appears otherwise is the morphosyllabic writing system, which enforces an artificial analysis of a word's constituents while masking or preventing the emergence of phonetic interaction across syllable boundaries. Similarly, claiming that Chinese characters are useful because they distinguish homonyms is, quite simply, putting the cart before the horse. Homonyms are a problem in Chinese and Chinese-based vocabulary because the characters let people coin words that cannot stand on their own phonetically or that are not words at all, but written abbreviations of words. Lacking any incentive to write the full representation of a word that can be understood visually through some fraction of its components, Chinese writers over time evolved a set of conventions that worked for the written medium but ignored the conflicting requirements of speech. Phonetic ambiguity was the result.

By the same token, the "unity" that Chinese characters allegedly impart to the language by allowing speakers of different "dialects" to read a common written language turns out to be an illusion. These so-called Chinese dialects have less in common than the Romance languages of Europe, meaning that speakers of nonstandard Chinese (some 30 percent of the Han population) are not reading their own language or even a common language, but what is to them a Mandarin-based second language written in Chinese characters. Granted, the characters allow non-Mandarin speakers to read segments of written Mandarin in their own regional pronunciations. But, far from unifying Chinese, this practice only perpetuates differences that would have been leveled out long ago under the influence of a phonetic script. Again, the cause of a problem is mistaken for its cure. I shall argue in this chapter that the "appropriateness" of Chinese characters to Chinese is solely a function of the effects this writing system has had on the language. Or, put another way, the only good thing to be said for the characters from a linguistic point of view is that they "solve" certain problems that their own use has created.

"Monosyllabic" Chinese

There is a popular notion that the words of Chinese are made up of single-syllable units. This belief owes its currency to three factors: (1) The classical style of writing, which still predominated earlier in this century when Western scholars first became interested in Chinese, was until recently given more weight in the training of China specialists than the colloquial language itself. In classical Chinese (a written language that has no spoken counterpart), a one-syllable-one-word paradigm really was approximated. (2) Chinese dictionaries are for the most part still arranged by characters, leading users to assume that these single-syllable graphic forms correspond to what one normally finds in dictionaries, namely, words. (3) There is a lay misconception that if characters are more than letters and have meaning, then they must represent words, and that these "words" are all one syllable long. Noting that Mandarin has fewer than 1,300 distinct syllables, various authors have gone on to associate these two "facts" about the language and have concluded erroneously that Chinese have restricted vocabularies, cannot understand each other in speech, and have trouble with abstractions (Gleitman and Rozin 1973b:497; Bloom 1981; Logan 1986; Tezuka 1987).

Thus the allegation that Chinese is monosyllabic is based not on the language as it is spoken (and, presumably, internalized by its speakers), but rather on the way the language was and is conventionally written. By identifying the syllable-sized units of written Chinese with words instead of with morphemes, people began to believe mistakenly that the language itself is monosyllabic. According to Zhou, monosyllabic words account for just 12 percent of the contemporary Chinese lexicon (1987b:13). DeFrancis reckons about 5 percent of the two hundred thousand words in a modern dictionary are monosyllabic (1984a:187). These figures apply to the lexicon as a whole. For running text, DeFrancis estimates Chinese "as only 30 percent monosyllabic as against 50 percent for English material written in a style comparable to that of the Chinese" (1943:235). Zheng gives a higher figure of 40 percent monosyllabicity for Chinese texts (1957:50), while I find English text nearly 60 percent monosyllabic. Clearly, the notion that Chinese, absolutely or even relative to other languages, is made up of monosyllabic words is untenable.

In his book The Chinese Language: Fact and Fantasy, John DeFrancis devotes a chapter to exposing what he properly calls "the monosyllabic myth," which some scholars have mistakenly applied to Chinese and to Sinitic words in other Asian languages. Although the concept is no longer defensible, the term "monosyllabic" is susceptible to another interpretation that is more consistent with the facts. Looking not at words but at the morphemes of Chinese, we find that they do by and large correspond to single syllables, and in this special, restricted sense the language can be considered more or less monosyllabic (Hockett 1951:44; Li Fang-kuei 1973:2; French 1976:103; Ohara 1989:85). Sinitic words are not monosyllabic, but the fact that most of their morphemes are has had an important impact on the formation of vocabulary.

No language can get by today with only a few thousand monosyllabic words. However, if each of the monosyllabic morphemes of a language has its own unique graphic sign that shields the morphemes (in some cases artificially) from attrition and draws attention to their existence as units, then there is no need for words to exceed two syllables in length, since, mathematically, the format can accommodate millions of word-length expressions. By focusing attention on the morpheme and making possible the preservation of a one-syllable-one-morpheme relationship, Chinese characters enabled the language to evolve in such a way that its concepts can be and usually are expressed in one- and especially two-syllable words. There was no need for a more complex morphology to come into play, since such words find their natural application in writing or in the discourse of groups sensitive to a particular context.
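The arithmetic behind "millions of word-length expressions" can be sketched as follows. This is only an illustration, not a count of actual words: the inventory sizes are assumed values standing in for "a few thousand" morphemes, the function name is hypothetical, and the result is an upper bound, since most combinations are never used as words.

```python
def two_unit_capacity(n_morphemes: int) -> int:
    """Upper bound on distinct forms made of one or two morphemes (order matters)."""
    return n_morphemes + n_morphemes ** 2


# Assumed inventory sizes, chosen only to illustrate "a few thousand" morphemes.
for n in (2_000, 3_000, 5_000):
    print(f"{n:>5} morphemes -> up to {two_unit_capacity(n):,} one- and two-morpheme forms")
```

Even the smallest assumed inventory allows roughly four million possible forms, which is the sense in which a two-syllable format leaves ample room for new vocabulary.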

Not only do Chinese characters make possible a lexicon of one- and two-syllable words, they strongly inhibit the formation of words that exceed this length. This is not to deny the existence of multisyllable words entirely. The process of compounding has its own dynamic that involves more than the need to create structural distinctions. Two-syllable words are expanded and further defined by morphologically productive affixes,2 or they become fused into longer expressions as aphorisms or compounds. Yet despite what would seem like natural causes for their development, multisyllable terms are still relatively scarce. According to Chen Mingyuan, words with three or more syllables account for just 2 percent of the text in contemporary Chinese writings, whether the subject is science and technology or everyday topics (1980:69). Hai Ying gives a figure of 3 percent (1980:150). A similar impression is gained by inspecting the regular columns of words in Chinese character dictionaries and even in hangul dictionaries of Korean, where the progression of two-syllable words is only occasionally interrupted by longer entries. How can this be explained?

One reason may be the Chinese propensity for symmetry and balance. But this phenomenon could as easily have resulted from the influence of the language's morphology and syntax on behavior. Another factor is visual redundancy. There is already a great surplus of graphic information in a written two-character expression, so why use more than necessary? A third explanation invokes principles of semantics. Given the autonomy of thousands of single-syllable, meaning-bearing elements that the use of Chinese characters has made possible, a combination of two such units is the most natural semantic configuration, encompassing both the root-modifier format and the fusion of complementary or antithetical concepts. Extending these basic patterns by the addition of a third or fourth morpheme has more to do with the requirements of syntax than semantics.

Incredibly, another reason for the ubiquity of the two-syllable format may be a shortage in the modern language of genuine one-syllable words! I have argued that the number of syllables needed for high-level vocabulary in Chinese is fewer than in European languages because the syllables are given an additional (and from a strictly phonetic point of view artificial) level of redundancy through the character script. This redundancy, however, applies only to the language as it is written, which may be the usual habitat for that segment of the lexicon but is hardly so for the bulk of everyday concepts that must be communicated verbally. And as differentiated as the written forms of Chinese syllable-morphemes are, the phonetic qualities that separate them are few indeed. Statistics compiled by Gao and Yin show 1,280 spoken syllables for standard Mandarin compared to 4,030 for English (1983:70). Equally important, this difference arises not because of a relative shortage of phonemes, but from restrictions on the use of these phonemes within the syllable (there are, for example, no consonant clusters and only three consonant endings), which makes the Mandarin syllable appear even less differentiated. Because there are fewer phonetic distinctions within the syllable, basic concepts, which are the logical candidates for single-syllable expressions, are also represented by compounded two-syllable words to a surprising degree, just to ensure phonetic intelligibility.

What conclusions can be drawn from the foregoing? Most basically, that the Chinese language is not monosyllabic, and hence the argument that single-syllable graphic units are its most appropriate form of representation is wide of the mark. Absurd as it sounds, it would be far easier as things stand now to argue for a writing system that uses bisyllabic units. What is monosyllabic about Chinese is its morphology, but this can be directly attributed to the effect Chinese characters have had on the structure of morphemes. Claiming for this reason that characters are more suitable than a phonetic script to write the language is equivalent to praising heroin because it "happens" to satisfy a user's addiction. If there were no need to ascribe meaning to every syllable, a polysyllabic morphology would have emerged long ago.

Morphemes versus Words

One need not subscribe to the thesis presented here -- that the Chinese writing system, more than any "inherent" typological factor, is responsible for the language's monosyllabic morphology -- to appreciate that Chinese look at their language not in terms of words at all, but in terms of morphemes. This apparently innocuous difference has had profound effects on the structure of the Sinitic lexicon and, as we will see in later chapters, on the ability of East Asians to mechanize writing and make other adjustments required by modern times. What is involved here is an entirely different mindset. One need only consider how few Westerners know the term "morpheme," which has no direct relationship to their alphabetic writing systems, to appreciate the fact that until recently Chinese did not even have a word for "word." Just how poorly this latter concept is held is evidenced in the habitual use by Chinese -- including some with doctorates in linguistics -- of zì (written character) for cí (word), even in referring to units of the spoken language. Since the focus of standard Sinitic (although not the nonstandard Chinese "dialects") is clearly more on morphemes than on words, Chinese characters, which represent morphemes, are regarded by many as the most appropriate way to write the language.

In fairness, it must be acknowledged that "word" has been one of the trickiest terms for linguists working with any language to define. Structural linguistics, with its outside-in view of language, has failed to provide any commonly accepted definition of the term, which surprises most people who feel intuitively when they use the term "word" that they and their listeners know what it means. Linguists, with some embarrassment, have ended up accepting a definition of word that is anathema to this speech-oriented discipline, namely, that a "word" is something one finds written between two blank spaces. But this empirical observation makes a lot of conceptual sense. What at any given time is a word in a language is not something linguists can ascertain on the basis of phonological characteristics alone, but is rather a social convention that must be made or discovered. This discovery process is precisely what writing systems that have word division force on literate users of the language. Words have to be "coined," that is, willfully manufactured and then ratified through a concrete mechanism that shows that the neologisms enjoy widespread acceptance. Word division in writing provides this mechanism.

However, this is only part of the story. Spoken languages, like any open-ended system, are constantly changing as different speakers seek to adapt their linguistic habits to a dynamic physical and psychological environment. This is as it should be. Concepts serviceable today eventually lose their relevance or validity, and it makes no sense at all to pretend that linguistic conventions once agreed on can or even should continue in perpetuity. Obviously, they do not, or I would be speaking some form of Proto-Indo-European, and my southern and northern Chinese colleagues would understand each other. However, no language is worth much (or even imaginable) if its conventions -- including what it recognizes as concepts -- are not shared by a wide body of users long enough for them to act on these shared assumptions and create a culture in which to live. Some balance must be reached between linguistic growth and conceptual chaos. Although any conventional writing system will help formalize a language, only those systems that incorporate word division can exercise a stabilizing effect on the flux between what different speakers of the language at different times regard as its finished concepts.

Morphemes, by contrast, are relatively easy to define: they are the smallest meaningful units of sound. But they are not sufficiently distinct in meaning or stable, and they cannot stand by themselves in transmitting information (Xie Kai 1989:17). Users still have to combine morphemes into words, and although this process of word formation occurs in Chinese as in any language, there are important differences. On the one hand, because Sinitic morphemes are identified by their own unique signs, they tend to remain "morphemes" longer than they should. In non-Sinitic lexicons, when two or more morphemes combine to form a word, the rationale for selecting the particular morphemes can often be inferred later from the meaning of the word and what users know about how the particular sounds relate to the meanings of other words. This is especially true if the language is written in an alphabetic system where spelling tends to be conservative. Eventually, however, the original motivation is lost to all but a small body of professional etymologists, the remaining users having better things to do with their time and language than to contemplate why a word means what it does. Character-literate East Asians, for their part, are denied this luxury; on some level they are forced by the nature of their writing system to associate meaning with every syllable long after semantic change has erased the original connection -- assuming the connection was logical to begin with -- and to this extent fail to grasp the totality of the new concept.

On the other hand, the absence of word division in Chinese writing, the need for which is obviated on the textual level by the fact that the characters are already providing a semantic analysis of the discourse, means there is no reinforcement of or check on what users do regard as words. This phenomenon is usually presented in positive terms by proponents of Chinese characters as "word-building power," whereby one can combine Chinese "characters" (morphemes) into an unlimited number of new concepts. It also lets some Chinese believe that one need master only a few thousand characters to grasp the whole of the language, unlike foreigners who must learn tens of thousands of units.3 The problem with this morpheme-dominant practice of word formation is that "words" are produced that are not words at all, in the usual sense of rating an entry in a dictionary or even being known to a significant minority of users. Writers assume that if they choose appropriate characters, readers will probably get the idea, more or less, of what they intend.

Not surprisingly, these same habits are reflected in the composition of dictionaries. Students of alphabetically written languages can generally expect to open a dictionary and find unknown words that they encounter in speech or writing. However, fantastic as this may seem, the student of an East Asian language (including Vietnamese, which has not shaken its Chinese-style fixation on morphemes) beyond a certain level can usually count on the unknown combination not being in a dictionary, neither a bilingual dictionary nor one in the target language. The situation is so perverse that I sometimes feel guilty when I do find a combination I am looking for. More often than not, if the word is there at all, it is only because it was coined as a translation of a borrowed Western concept. Usually I end up doing what most East Asians do, and piece together the meanings of the two morphemes for a general idea of what is meant and try to convince myself that I understand it even if I do not.

If words are a language's finished concepts, it is difficult to see how anything that subverts the role of words could be beneficial to a language and its users. Yet, as we have seen, Chinese writing does this in two ways: by encouraging users to focus on a word's parts instead of on the whole and by allowing people unlimited license to make up "words" with no social sanction. The result is a collection of relatively amorphous units (morphemes) that dominate the written language and to a great extent the psychology of its users, and a reduced role for actual words in the language. Again, one can claim for this reason that the characters are more "appropriate" to the language in its present state, although the declaration seems rather vacuous.

The Homonym Problem

One of the most commonly cited -- and misunderstood -- justifications for Chinese characters is that they "eliminate" the so-called homonym problem in Chinese and the Sinitic lexicon in general. The thesis runs as follows: Chinese and Chinese-based vocabulary, more than that of other languages, include many words that sound the same. Not only is the number of syllable types in Chinese and in the Sinitic parts of Japanese and Korean small, the "monosyllabic" structure of these languages makes it inevitable that the same sounds and sound combinations will carry an unusually high number of meanings that cannot be reliably distinguished by phonological features (written or spoken). Fortunately, Chinese characters, being tied to meaning, are available to disambiguate this phonetic homogeneity. Words that sound alike at least do not look alike, meaning that East Asian languages, thanks to this "visually oriented" writing, are free to acquire vocabulary despite their phonetic handicap. Once again, Chinese characters save the day.

Plausible as this argument sounds, the statistics and rationale behind it as it applies to Chinese are spurious, and I include it here only because it is raised so often in the procharacter literature by East Asians who do not distinguish morphemes from words, and by nonspecialists in the West who accept their arguments at face value. The usual ploy is to consult the index of a large character dictionary, note the number of single-character entries under a given syllable -- which can be in the dozens -- and assert that the languages obviously need to be written with Chinese characters because phonetic representation would make the meanings of these sounds indistinguishable. However, as we have already noted, the number of single-syllable words in Chinese is less than in many alphabetically written languages. Even for sounds like Chinese yì and shì, where the inventory of characters is especially large, single-syllable morphemes that can stand alone as words are few. Almost all of these entries are bound or semibound morphemes that do not appear as isolated units in the spoken language.

What must be counted if statistics are to be meaningful are homophonous words. Using pinyinized Chinese, that is, Chinese written in a style appropriate to the phonetic writing system where the units are or should be words instead of syllable-size morphemes, WenWu found 11.6 percent of Chinese words to have homonyms, compared to 3.1 percent for English (1980:120). Zhou reports that in a Chinese dictionary of 60,000 words, some 4,000 or about 7 percent of its entries have homonyms; for a 120,000-word dictionary, the homonyms increase to about 6,000 or 5 percent (1987:13). Although high by Western standards, the figures are hardly alarming, since nothing has been said yet about frequency, the effects of context, or the phenomenon of "related meanings" in alphabetically written languages, which skews the comparison. In practical terms, Zhou calculates that the homonym problem in modern standard Mandarin reduces to about 1 percent. In an earlier study, Chen Wenbin counted 2,196 homophonous Chinese words from a corpus of 30,000.4 Of that number, only 82 (39 sets of) polysyllabic words and 164 (70 sets of) monosyllabic words required differentiation.
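The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text. The helper name and variable names are hypothetical, and adding Chen Wenbin's polysyllabic and monosyllabic groups together assumes that the two groups do not overlap.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as reported in the text (Zhou; Chen Wenbin).
zhou_small = percent(4_000, 60_000)    # entries with homonyms, 60,000-word dictionary
zhou_large = percent(6_000, 120_000)   # entries with homonyms, 120,000-word dictionary
chen_all = percent(2_196, 30_000)      # all homophonous words in Chen's corpus
chen_needing = 82 + 164                # assumption: the two groups do not overlap
chen_residual = percent(chen_needing, 30_000)

print(f"Zhou, 60,000 words:          {zhou_small:.1f}%")
print(f"Zhou, 120,000 words:         {zhou_large:.1f}%")
print(f"Chen, all homophonous words: {chen_all:.1f}%")
print(f"Chen, needing distinction:   {chen_residual:.1f}%")
```

On these counts, Chen's corpus starts at about the same 7 percent level as Zhou's smaller dictionary, and the share of words that actually require differentiation falls below 1 percent, the same order of magnitude as Zhou's practical estimate.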

These figures are a far cry from the impression one gets hearing about thirty-nine different Chinese "words" pronounced shì, forty-nine pronounced yì, and so forth. Another factor that makes the homonym "problem" in Chinese seem worse than it actually is relates to the etymology of homonyms in general and the impossibility of distinguishing them from their close cousins: polysemantic words. According to Sampson, the distinction "is essentially a historical one: when a given phonological shape is used for more than one meaning we say that we have distinct homophonous words if we know that at earlier stages the words were entirely separate, but we have a single polysemous word if the various meanings can be shown to have developed out of one original sense" (1985:155). Although polysemy exists in Chinese, particularly among its monosyllabic words, the incidence among polysyllabic Chinese words is lower than in Western languages because of restraints imposed by the character writing system. There is a limit to the meaning that can be logically imputed to the sum of two or more character-designated morphemes. Moreover, as meanings drift through time, Chinese tend to assign (or fashion) new characters for the changed sense, which technically yields "homophony" instead of polysemy. It seems likely that if all the meanings of polysemantic words in English or other alphabetic languages were counted and added to the number of words that pass as homonyms in those languages, the total would approximate the number of "homonyms" in Chinese; it would at least make the problem seem less formidable.

These points are raised to demonstrate that the so-called Chinese homonym problem involves much more than counting homographic dictionary entries and making cross-language comparisons on that basis. One can even question the assumption that homophony itself is bad. As Shi Xiaoren (1983:58) and Ao Xiaoping (1984:21) point out, there is nothing intrinsically wrong with the phenomenon. It is an economy measure common to all languages, and it would not happen if people did not feel that using longer units or a greater number of phonemes was more difficult than sharing meanings over a smaller number of representations. The question is how much homophony is desirable, a certain amount of it evidently being indispensable. I suspect that what lies at the bottom of the incessant carping about how Chinese, because of its "homonym problem," could not be understood if written phonetically is a deep-seated realization that if the characters did disappear, users would be forced to adjust to a new and unwanted regimen. They would have to use words that are words and abandon the undisciplined, self-indulgent practice of creating them arbitrarily.

I am more sympathetic to analogous claims about phonetic ambiguity in the Sinitic parts of Japanese and Korean, which can be attributed to special circumstances surrounding their adaptation. For nearly two millennia non-Chinese languages on China's periphery have shared Sinitic vocabulary freely, in a manner known to all of the world's languages. Until recently, the direction of this "borrowing" had been largely from Chinese to Japanese, Korean, and Vietnamese, although the latter languages -- most notably Japanese -- have reversed the process and for the last century and a half have been coining new terms from Sinitic morphemes that are adopted by all four languages.5 As a result of this borrowing, more than 40 percent of Japanese, 50 percent of Korean, and at least one-third of the words in Vietnamese are based on Sinitic morphemes, according to Liu (1969:67). These figures apply to everyday vocabulary and are lower than other researchers' counts that take in a wider corpus. For example, Sokolov claims 60 percent for Japanese, with the range for actual use varying between 10 and 80 percent, depending on the topic (1970:98). Ho Ung claims 60 percent (1974:44), and Oh claims 90 percent for some types of Korean materials (1971:26). Helmut Martin notes that in formal Vietnamese the ratio of Sinitic words can reach 50 percent; for newspapers it goes much higher (1982:32).

In general, the share of Chinese-style words in these non-Chinese languages increases with formality and difficulty of content, which is to say, Sinitic terms dominate those environments where style and subject matter make them the least predictable. One would think that the emphasis would be on maintaining phonetic distinctions between these word forms, but the opposite is more nearly true. Since most of the terms refer to higher-level concepts, the expectation was they would be identified through writing, where phonetic characteristics matter less. Accordingly, there was less pressure to avoid homonyms and near homonyms. Another, more important reason for the homophony can be traced to the dynamics of borrowing. When a language "borrows" terms from another, it typically adapts the words' sounds to its own phonology, which is never a perfect match. The borrowing language cannot add distinctions to the sounds of the terms it is borrowing, but it can and does ignore phonological distinctions that its own system is not equipped to handle. In the case of international Sinitic, this means dropping the tonal features that help distinguish one Chinese syllable from another.6

Just what this meant for the Sinitic vocabulary of Korean and Japanese is evident in the following figures. From an inventory of thirty-six initial and six syllable-final consonants totaling 3,877 different syllable types in sixth century A.D. Chinese, the number of syllables in modern standard Mandarin fell to 1,280, distinguished by twenty-two initial consonants, two final consonants (three, including the Beijing dialect's -r), and four phonemic tones. Korean speakers, for their part, have 1,096 syllables at their disposal (Yi Kang-ro 1969:44), which increases to 1,724 if we count written syllable types, hundreds more than in Mandarin even with the tones. This inventory seems to give Korean an advantage, until we realize that only four hundred or so different syllables are used for Sino-Korean. If this were not bad enough, most of this vocabulary is expressed in Korean as two-syllable compounds, even more than in Chinese, because of the availability of indigenous single- and multi-syllable words to handle the day-to-day concepts. The result is significantly more homonyms. Nam counted 22,983 Sinitic homonyms and 4,077 of mixed origin among the 91,825 entries in the Hangul Society's Kukŏ sajŏn (Korean Language Dictionary) (1970:11). Pure-Korean homonyms numbered only 3,120.
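Nam's counts can be put in proportion with a short calculation. This is a sketch under stated assumptions: it treats the three homonym categories as non-overlapping and as all drawn from the same 91,825 entries, which the text implies but does not state outright. The names used are hypothetical.

```python
TOTAL_ENTRIES = 91_825  # entries in the dictionary, as reported in the text

# Homonym counts by origin, as reported in the text (Nam).
homonyms = {"Sinitic": 22_983, "mixed": 4_077, "pure Korean": 3_120}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: the three categories do not overlap, so they can be summed.
all_homonyms = sum(homonyms.values())

print(f"All homonyms: {all_homonyms:,} ({percent(all_homonyms, TOTAL_ENTRIES):.1f}% of entries)")
for origin, count in homonyms.items():
    print(f"{origin:>11}: {percent(count, all_homonyms):.1f}% of homonyms, "
          f"{percent(count, TOTAL_ENTRIES):.1f}% of entries")
```

Under these assumptions roughly a third of the dictionary's entries are homonyms, and about three-quarters of those homonyms are Sinitic.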

For Japanese the situation is even worse. Not only were Chinese tonal categories leveled, the phonetic reduction that occurred when these words were borrowed and their subsequent erosion through time have left just 319 sounds (on readings, including bisyllabic morphemes ending in tsu, chi, ku, and ki) for the 4,775 character-morphemes listed in Nelson's dictionary. Even this figure understates the problem, because many of these sounds have one character only, while others accommodate more than one hundred. Samuel Martin noted that the Japanese syllable kō corresponds to "at least 38 different (Chinese) syllables, some of which already represented more than one morpheme in classical Chinese" (1972:99). More than 180 characters are identified with this sound alone. Even with compounding the numbers are still formidable. Korchagina counted twenty-four words pronounced kōkō, twenty-three pronounced kōshō, eighteen kōtō, and fourteen kōchō in a modern Japanese-Russian dictionary (1977:43), adding that "the allegation of certain linguists that homonyms are an imaginary problem that exists only for linguists can hardly be applied to the Japanese language" (1975:52).
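A rough calculation shows why the average load per reading understates the problem. The sketch below uses only the figures given above and treats "more than 180" characters for kō as a lower bound of 180. The variable names are hypothetical.

```python
ON_READINGS = 319            # distinct on readings, as reported in the text
CHARACTER_MORPHEMES = 4_775  # character-morphemes in Nelson's dictionary, per the text
KO_CHARACTERS_MIN = 180      # "more than 180" in the text, taken here as a lower bound

# Average number of character-morphemes sharing a single reading.
mean_per_reading = CHARACTER_MORPHEMES / ON_READINGS

# How far the reading ko exceeds that average, at minimum.
ko_vs_mean = KO_CHARACTERS_MIN / mean_per_reading

print(f"Mean character-morphemes per reading: {mean_per_reading:.1f}")
print(f"ko carries at least {ko_vs_mean:.0f} times the mean load")
```

The mean of about fifteen character-morphemes per reading hides a very uneven distribution: kō alone carries at least twelve times that load, while many other readings have a single character.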

Other sources of homonyms are attenuated classical expressions in the modern colloquial language and extensive abbreviation -- a practice that Zhou called the "monosyllabification of polysyllabic words" (1961:300). These abbreviations appear in technical terms and other types of new vocabulary that are shortened for convenience after the concepts take root in society, in names for organizations and institutions where the first or most significant characters for each word in the name are singled out to represent the whole, and, especially in Chinese, in the use of pithy, shortened slogans generally of a political nature. Although abbreviations make sense from the point of view of the reader, who, thanks to the characters, is inundated with a surplus of graphic information, the same morphemes that make up these abbreviations lose most of their redundancy, both absolutely and with respect to other expressions in the language, when spoken aloud. What began as graphically and phonetically distinct words collapse into homonyms or near homonyms ("paronyms") as reductions are made based on the requirements of writing that have no direct connection with the information-bearing requirements of speech.

This brings us to the heart of the problem. If a word's intelligibility is a function of its distinctiveness and predictability, then Sinitic vocabulary, because of the way it is formed and expressed, falls short in both respects, transforming what began simply as an abundance of homonyms into a genuine homonym "problem." With respect to distinctiveness, historical factors, the mechanism of borrowing, and most important, the use of a writing system in which graphic redundancy does not translate into anything remotely equivalent in speech have created an enormous number of terms with the same "external" phonetic characteristics or, what is just as bad, terms that differ in sound only minimally, by squeezing half or more of the languages' words into some 10 percent of the phonetic forms available to represent them. Homonyms are only the most noticeable effect of a phenomenon endemic to the Sinitic corpus as a whole, that is, its lack of phonetic distinctiveness overall.

The other factor -- predictability -- scarcely fares better. Goodman has shown that readers' ability to predict words from context can be as important for understanding as what actually appears in print (1976b). Cryptanalysis throughout much of its history was based on this same principle: that context severely constrains what can or cannot appear at any given point in a discourse and still make sense. If a printed form has a dozen or more meanings (or is missing from the text entirely), readers can often figure out what is intended on the basis of expectations induced by the surrounding text. In Chinese and Chinese-style writing, however, certain factors work against this. Since Sinitic terms are able to function in different grammatical environments without overt changes to their form, readers are less able to use this feature to predict what types of words can appear (Korchagina 1975:48; Yi Ul-hwan 1977:65). Guesswork is further constrained by a shortage of what can be called "serial redundancy." By comparison with alphabetic writing, Chinese character texts focus a disproportionate amount of their informational cues on individual graphemes, making it possible (or, from the standpoint of aesthetics, necessary) for writers to cut back the number of units introduced in the whole text, classical Chinese and modern newspapers being extreme examples. The result is that the information value of each remaining unit rises and the units become less predictable.

If Sinitic vocabulary lacks distinctiveness and suffers more than comparable terms in Western languages from shortage of context, what of the remaining determinant of a word's predictability, its familiarity to users? Here is the major cause of the problem that passes, with only partial justification, as the result of a surplus of homonyms. Readers of all-hangul Korean texts, for example, who because of the absence of Chinese characters are forced to rely entirely on phonetic information and context, are not encumbered so much by homophony per se (i.e., confusing one word with another) as they are by the inability to identify any meaning at all for the string of symbols given. In some cases this phenomenon can be dismissed as insufficient exposure to the word in phonetic form, whether spoken (where the vocabulary appears less frequently) or in texts, where it normally appears in characters. In this case, the user knows the word but is not used to its phonetic representation. Elsewhere, the sequence may not be a word at all, in the usual sense of being known to a majority or even a significant minority of educated users.

One can argue that none of this matters as long as the representation is in Chinese characters -- but that is my whole point. Homonyms, near homonyms, and the shortage of grammatical and stylistic conventions for distinguishing them in the beginning had nothing to do with the features of the languages themselves and everything to do with the way these languages came to be written. As I have pointed out, the ability of characters to designate most concepts without reference to sound7 has enabled the morphemes that they represent to be combined into words on the basis of their semantic values alone. There was no need to take phonetic intelligibility into account when the expectation was that discrimination would be accomplished through Chinese characters. According to Sokolov, "In creating Chinese or Chinese-style words little or no consideration was given to the need for distinguishing the words by sound." Rather, they were formed with the tacit understanding that their use would be restricted primarily to the written medium. The characters allowed phonetically deficient words to come into the language, and as long as these terms exist, there will be a need for characters (1970:97-98).

Neverov points to the high combinatory potential of Sinitic morphemes, which facilitated word formation and made this portion of the lexicon the first choice for a quick solution to the problem of introducing Western concepts. In forming these words, attention was paid only to the accuracy of the result; pronunciation played no role at all (1977:240). Korchagina's argument -- that because characters can be used without ambiguity, the usual pressures leading to homonym discrimination do not come into play -- comes closest to the present thesis. "In this way, the characters themselves ought to be regarded as the indirect source of homonyms in the Japanese language" (1977:44).

How the source of a problem can be regarded by supporters of the character script as that problem's solution escapes all logic. Rather than praising Chinese characters for their "appropriateness" to East Asian languages, it would be better to blame them for what they have done.

Transitivity across Languages

Next to homonym discrimination, the advantage most commonly claimed for Chinese writing is its supranational, supradialectal function, which allegedly enables speakers of different East Asian languages and "dialects" to communicate without knowing each other's speech. According to this argument, character-literate Chinese, Japanese, and Koreans can read materials written in any of the three languages by virtue of the characters' functional independence from sound. Although the symbols may be pronounced differently, they mean the same thing to any East Asian who has learned the system, it is claimed. Members of this "Chinese character cultural sphere" are thus better equipped than users of "sound-based" alphabetic systems in the West to exchange information and cope with the demands of today's international society. What is true of countries within East Asia, by this argument, also holds true within China for the same reason. Chinese characters, being tied to meaning more than to sound, are said to transcend "dialectal" variation inside China, thereby "unifying" the language and its speakers. Finally, literate Chinese, because of the ability of characters to mask differences in sound, are also said to be able to read Chinese written millennia ago based on what they know of the language today. In sum, what seems like a complicated and cumbersome system on one level is believed by some to make sense from a broader perspective.

I will try to show that these claims for the most part are fanciful fabrications, and that most of the success that the characters have in bridging different languages and "dialects" is also achieved with alphabetic writing. Let us begin with the former assertion: that Chinese characters allow literate users of Chinese, Japanese, and Korean to read each other's languages. It is tempting, though poor scholarship, to dismiss this claim up front by pointing out that if such were the case, there would be no need for governments to maintain separate pools of Chinese, Japanese, and Korean translators at enormous expense or to separately recruit specialists whose function is to read newspapers and technical works in these languages. Similarly, I and many of my colleagues in academe whose interests lie primarily in one of these three languages could happily have saved the years of effort it took to acquire a reading knowledge of the others. Although isolated words and segments of character text sometimes achieve the cross-language transitivity claimed for the system as a whole (such as occurs with the "international" vocabulary shared by alphabetically written European languages), anyone who has taken the trouble to learn more than one of these East Asian languages will find the notion of literacy in one equating to literacy in another simply laughable.

More than any actual performance factor, what gives credence to this claim, I suspect, is the tendency of Westerners to lump whatever differs from their own culture into a common bin, abetted by certain East Asians' naive or willful assertion that characters are characters, and what can be understood in China can be understood everywhere else in East Asia. In fact, nothing could be further from the truth. As described in Chapter 4 of this book, Vietnam long ago left the "Chinese character cultural sphere" and is using an alphabetic script. Chinese characters today have the same status in Vietnam as they have in the United States, namely, as decorative items and as a script for the country's Chinese-speaking minority. They have no present role in the language or in the linguistic psychology of its users. This fact is bemoaned by advocates of the character script in other Asian countries, but it is not something I have ever witnessed the Vietnamese themselves to be concerned about. For someone long inured to the vagaries and outright nuisances that accompany the use of Chinese characters, it is almost surrealistic to observe people of the same Confucian culture going about their lives using their language instead of being absorbed by it.

But there it is nonetheless: an East Asian society rebounding from decades of colonial rule, war, and socialist economics, blissfully unaware of its "benighted" status in the eyes of East Asian traditionalists. Vietnamese is able to borrow the international Sinitic terms coined elsewhere in East Asia just as alphabetically written Western languages share new vocabulary with each other. But the similarities between Vietnamese and character-based East Asian languages stop there. Words are spelled in Vietnamese, not drawn. Dictionaries, personal names, book titles, company listings, products, and geographical locations are cataloged in alphabetical order and are immediately accessible to any literate speaker. Text is composed on a computer screen directly; there is no dancing between an intermediate form of representation and units that may or may not correspond to what one actually wants to write. If Vietnamese are suffering through their non-use of Chinese characters from cultural deprivation or any linguistic maladies occasioned by an alleged breakdown in "transitivity," someone had better tell them. It won't be me.

What of the other areas of East Asia where Chinese characters form part of the repertoire of literate speakers? The most obvious problem with the transitivity thesis is that the character "system" used in the different countries is not the same, not even in its externals, owing to independent reforms. According to Virginia Chen, of 2,295 characters simplified in China, 309 in Japan, and 502 in Singapore, "only 178 original characters were simplified in all three countries. Of these 178 characters, only 48 were simplified in identical manner" (1977:64). Seventy of Japan's simplified characters have no counterpart in China, and only sixty of them have the same forms as China's. In Singapore, seventy-eight characters were simplified differently from their People's Republic of China equivalents. In Taiwan and South Korea none of these changes -- neither Japan's nor China's -- found their way into the standard inventory.
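To make the proportions in Chen's figures explicit, the following sketch computes how small the common core is relative to each country's reform. It uses only the counts quoted above; the names are illustrative and nothing here goes beyond division.

```python
# Counts quoted from Chen (1977:64).
simplified = {"China": 2295, "Japan": 309, "Singapore": 502}
SHARED_ORIGINALS = 178   # original characters simplified in all three countries
IDENTICAL = 48           # of those, simplified in the identical manner

# Share of each country's simplified set that belongs to the common core.
shares = {country: SHARED_ORIGINALS / count
          for country, count in simplified.items()}

# Share of the common core that ended up with one and the same form.
identical_share = IDENTICAL / SHARED_ORIGINALS

for country, share in shares.items():
    print(f"{country}: {share:.1%} of its simplified characters are in the common core")
print(f"identical forms within the common core: {identical_share:.1%}")
```

Read this way, barely a quarter of the characters that all three reforms touched came out looking the same, and the common core is itself under 8 percent of China's simplified set.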

These variations in the forms of characters used by different East Asian countries are apparent even to Westerners not trained in the languages or writing systems. But there is more to the problem. Long traditions of independent use, particularly in Japan, have led to characters being used in one country that have little or no application to the language of another, or to the same characters used with different meanings. The effect of these absolute discontinuities is amplified by practical differences, resulting from government-backed limitations in some countries on the number of characters in use and the availability of hangul in Korea and kana in Japan, which have erased hundreds of "shared" characters from the inventory of most of their potential users. The results of these differences are striking. Highly educated Chinese on both sides of the Taiwan Strait, unless they have learned the other's system, stumble badly when trying to read each other's writing and often can make no sense of a passage at all. Japanese and character-literate Koreans fare even worse than mainland Chinese with materials printed in Taiwan, have virtually no capability with materials printed in the People's Republic of China, and enjoy less success with connected discourse written in each other's language than a literate English speaker has with French.

Even if the forms of the characters did not vary, individual tokens were shared more widely, and they had the same primary meanings in different languages, Chinese characters could not enable East Asians actually to read each other's languages because the languages themselves are different, in both grammar and morphology. Typologically, Chinese has less in common with Japanese and Korean than it has with English. And although Korean and Japanese may have some kind of genetic affiliation, they are communicably as different now, for example, as English is from German. Reading connected discourse in any of these languages is a function of linking the meanings of words (a large percentage of which are indigenous) according to unique grammars, and there is no way Chinese characters or any system of writing can mask these differences.

Assuming a character-literate East Asian in one country had made the effort to learn the different character forms used in another, it is true that he or she would be able to understand segments of discourse written in the other language. But this phenomenon -- whatever its actual utility -- has less to do with the writing system itself than with the fact that the languages share a lot of common vocabulary. The proof lies in the extremely poor cross-language transitivity achieved by the characters when they are used to represent indigenous words in Japanese (kun) as opposed to borrowed Sinitic terms (on). In other words, Chinese characters give literate East Asians approximately the same facility with each other's languages as Westerners enjoy with cognate vocabulary written alphabetically in their languages, namely, a glimpse into the meaning of a text, which, depending on the reader's background, familiarity with the subject, and ability to reconstruct different character forms, may or may not be enough for some rudimentary understanding.

Ironically, Chinese characters, through their artificial support of moribund Sinitic morphology, their incompatibility with nontraditional word forms, and their reinforcing the notion that writing must be based on syllable-sized units, may be inhibiting cross-language transitivity by restricting the importation of international vocabulary that would otherwise be expressed in an alphabetic system shared by all. Despite complaints from cultural "purists," new terms based largely on English sounds are being borrowed individually into Japanese, Korean, and even Chinese on a scale that decades ago few could have imagined. These words now number in the tens of thousands, but because of the way the writing systems are constituted, they remain entirely opaque in one East Asian language to literate users of another. Rather than promoting cross-cultural communication, the character-based writing systems increasingly are standing in its way, making the languages themselves less relevant to a significant number of their own users.

Unification of Chinese "Dialects"

If transitivity of Chinese characters across languages turns out to be something less than what the system's advocates claim, what about the Chinese "dialects"? Surely one cannot deny the unifying effect Chinese characters have on disparate speech forms within China? Well, as with many other features attributed to Chinese characters, this claim will not hold up to a rigorous analysis either. Unless one trivializes the claim by reducing it to "psychological unity" or, as I shall discuss below, "unity by default," Chinese characters are not much better at bridging linguistic diversity inside the world's most populous country than they are at unifying languages outside China, and for the same reason: what many call "dialects" of Chinese are not dialects at all, but different languages with less in common than the Romance languages of Europe.

Before getting deeper into this discussion, however, I need to emphasize that for some eighty million or more people living in China the "trans-dialectal" feature claimed for Chinese writing cannot apply even in theory, because they speak non-Chinese languages written in alphabetic or indigenous systems.8 Although they are relatively few in number, non-Han peoples dominate half of China's geography and because of their history and culture are far more likely to dissociate themselves from Beijing's laws and standards than Han non-Mandarin speakers living in the south. The irrelevance of Chinese writing to those very people who from the central government's point of view are most in need of it makes the argument that "Chinese characters unify the country" seem rather silly.

If we ignore this inconvenient phenomenon and focus on the speech of China's Han population, we find a collection of at least seven or eight mutually unintelligible varieties that in any other context would be called "languages," but which are "dialects" in China, in part for political reasons and in part because of a problem with the translation of the Chinese term fāngyán. The political motivation for claiming that these distinct varieties constitute a single language is fairly obvious: it is easier to govern a country in which the majority believe they are speaking one "language" (whatever the linguistic reality) composed of several "dialects" instead of several related languages. The terminological problem, however, is genuine. For millennia, Chinese used the word fāngyán ("local speech") to refer both to nonstandard forms of Chinese and to non-Chinese languages spoken within or around China. No distinction was made between a language and a dialect; there was standard Chinese spoken in the political capital and fāngyán spoken elsewhere. Later, under the influence of Western linguistics, Chinese began using the word yǔyán to translate "language" and fāngyán as a standard translation for what is known in the West as "dialect." But since nonstandard forms of Chinese were already called fāngyán, these mutually unintelligible non-Mandarin varieties became "dialects" of a Chinese "language."

Recognizing the problem, DeFrancis (1984a:53-67) and Mair (1991) proposed translating fāngyán respectively as "regionalect" or "topolect." This solves the technical question, but it leaves nonspecialists with the impression that Chinese is a "special case," when there is nothing special about it. Here is the reality. On the basis of linguistic criteria such as the development of Ancient Chinese voiced initial consonants, palatalization of velars, tonal registers, and certain morphological conventions, supported by the degree of intelligibility and native speakers' own intuitions, Chinese and Western linguists distinguish seven or eight major varieties of Chinese.9 There is "North," or Mandarin, spoken in the northern, central, and southwestern parts of China with some 679 million native speakers;10 Wu spoken by 81 million people on the east coast focusing on Shanghai; Northern and Southern Min spoken by 39 million people in Taiwan, Fujian province, and throughout Southeast Asia; Yue or Cantonese, used by 48 million speakers in the south; and three transitional varieties including Gan (23 million), Xiang (46 million), and Hakka (35 million), spoken respectively in Jiangxi, Hunan, and widely scattered pockets throughout the south. How are these varieties to be classified?

To answer this question at least four factors must be taken into account: the degree of mutual intelligibility, the underlying linguistic causes for the intelligibility or lack of it, how the Chinese situation fits into taxonomies used elsewhere in the world, and how Chinese speakers themselves feel about the problem. The first factor -- degree of intelligibility between the major varieties of Chinese -- can be dealt with easily: there isn't any. One of my strongest early impressions as a student of Chinese in Taiwan was that "Chinese" did not always work. No matter how hard I studied the "national language,"11 there were large groups of people who could not understand me and others who could exclude me from a conversation by switching to some other variety that did not seem like Chinese at all. The situation did not change as my Mandarin improved, until I was finally led some twenty years later by curiosity and frustration deliberately to study Southern Min, an experience that reminded me uncannily of my high school days as an English-speaking student of Latin.

Before this, however, I had wised up to the reality of "Chinese," befriended a series of Wu speakers, and begun to have some fun of my own learning that variety and using it to annoy Mandarin and Min speakers who had no idea what we were saying. And although these experiences prepared me intellectually for my first known encounter with Cantonese (Yue), it was still upsetting to discover that nothing I had learned of the other varieties of Chinese would serve me here. The fāngyán was incomprehensible, as it is to all Mandarin, Min, Wu, and other native Chinese speakers born outside a Cantonese-speaking area, as evidenced, for example, by the Mandarin-speaking Chinese who uses English to order from a Cantonese-speaking Chinese waiter in the United States.

Dialects or languages?

Let's look at another aspect of intelligibility. Early in my studies I discovered that the Taiwanese who could understand the Beijing Mandarin I was learning in school and who professed to speak the "standard language" spoke it in a funny way. Though we understood each other, my interlocutors failed to make certain phonemic distinctions that I had been taught to expect and occasionally used grammar that did not accord with what was in my textbook, although it was easy to figure out. When I tried these street forms in the classroom, I was "corrected" and informed they were not standard Chinese. A more advanced student with a bigger heart told me (to the enormous discomfort of our Beijing-born teacher) that these forms were not wrong but the difference between the Southern Mandarin spoken in Taiwan and the northern variety that passes for the national standard. But at least I was being understood! My first exposure to Southwestern (Sichuan) Mandarin was trying but also manageable. Although colleagues report they have encountered backwoods Mandarin varieties that are unintelligible to standard Mandarin speakers, these cases are exceptional. In the aggregate, Mandarin-speaking China looks very much like the mosaic that characterizes the English-speaking world with its distinct though usually intelligible dialects. Excepting one remarkable incident involving the numbers four and ten (they are segmentally homophonous in Southern Mandarin) that I would rather forget, I have never suffered any consequences that can be attributed to Mandarin speech differences, although there have been lots of laughs. This situation contrasts with the inability of speakers to communicate anything between the major varieties.

The same situation is characteristic of other, non-Mandarin forms of Chinese. On my bookshelf are textbooks of "Amoy Hokkien" (Xiamen Min) spoken in southern Fujian province and parts of Southeast Asia. It seems to have much in common with Taiwanese Min, and I understand parts of it despite my poor background in the latter. Next to that are two series of textbooks compiled by the Defense Language Institute titled Chinese Cantonese and Chinese Cantonese (Toishan). The two varieties are sufficiently distinct to warrant separate treatment, but not so far apart that one cannot be understood by a native speaker of the other. I discovered with some embarrassment that the same applies to Wu. After studying for three years what I thought to be Shanghainese with a tutor from Ningbo, I tried it out one day on a woman from Shanghai. Peals of laughter ensued, after which she informed me, tears still in her eyes, that I was speaking "like a hayseed from Ningbo." But, again, I was being understood, in contrast to a Mandarin-speaking Chinese along for the show who had no idea why the Wu speaker was laughing. So what do we call these differences? Dialectal? If so, what does that make the larger groups that cannot be mutually understood and within which these dialects are subsumed?

The linguistic factors that account for unintelligibility between the major varieties of Chinese are sometimes dismissed by proponents of the one-language view as "mere" differences in sound. In fact, the differences encompass much more than phonology, but let's explore this aspect of the claim anyway using as an example the Shanghainese dialect of Wu, which impressionistically and in terms of linguistic features differs less from Mandarin than either Min or Yue does. Consonant phonemes for Mandarin (Kratochvil 1968:25-28) and Wu (Jin 1985:4) are shown in Table 8.

Table 8. Shanghainese and Mandarin Consonants
	Bilabial	Labio-dental	Dental/alveolar	Alveo-palatal	Palatal	Velar	Glottal
Stop
voiceless unaspirated	p		t			k	[ˀ]
voiceless aspirated	p'		t'			k'
voiced	[b]		[d]			[g]
Affricate
voiceless unaspirated			ts	tš	(tɕ)
voiceless aspirated			ts'	tš'	(tɕ')
voiced				[dž]
Fricative
voiceless		f	s	š	(ɕ)	h
voiced		[v]	[z]	[ž]
Nasal	m		n	[ny]		[ng]
Liquid			l
Semivowel	w				y
Note: Unique Wu phonemes are in brackets [ ]; phonemes unique to Mandarin are in parentheses ( ).

Shanghainese stops (t, t', d) are dental and Mandarin stops (t, t') are alveolar; conversely, Shanghainese affricates and fricatives (ts, ts', s, z) are analyzed as alveolar by Jin, while their Mandarin counterparts (ts, ts', s) are dental. Jin's alveopalatal consonants are treated as palatals by Ramsey (1987:92), but none of this is particularly significant. The important distinction is not where these sounds are articulated, but rather that there are three sets of affricates and fricatives in Mandarin and only two sets in Shanghainese. More important, Shanghainese has eight voiced consonants that are entirely absent in Mandarin (ng is used only as a final in Mandarin) and uses a glottal stop for Ancient Chinese -p, -t, -k endings, which were lost in Mandarin.
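The inventory comparison can be made concrete with a few set operations. The sketch below transcribes the consonant table as three sets, following its bracket and parenthesis notation; the set names are illustrative. The final step rests on an assumption about how the count of eight is reached: the glottal stop is excluded as voiceless, and ng is set aside because it also occurs in Mandarin as a final.

```python
# Consonants transcribed from the table: unmarked entries are shared,
# bracketed entries are unique to Wu, parenthesized ones unique to Mandarin.
SHARED = {"p", "t", "k", "p'", "t'", "k'", "ts", "ts'", "tš", "tš'",
          "f", "s", "š", "h", "m", "n", "l", "w", "y"}
WU_ONLY = {"ˀ", "b", "d", "g", "dž", "v", "z", "ž", "ny", "ng"}
MANDARIN_ONLY = {"tɕ", "tɕ'", "ɕ"}

shanghainese = SHARED | WU_ONLY
mandarin = SHARED | MANDARIN_ONLY

# Assumption: the "eight voiced consonants absent in Mandarin" are the
# bracketed phonemes minus the (voiceless) glottal stop and minus ng,
# which Mandarin does use as a syllable final.
voiced_absent_in_mandarin = WU_ONLY - {"ˀ", "ng"}

print(len(shanghainese), "Shanghainese consonants;", len(mandarin), "Mandarin")
print("only in Shanghainese:", sorted(shanghainese - mandarin))
print("only in Mandarin:", sorted(mandarin - shanghainese))
print("voiced, absent in Mandarin:", len(voiced_absent_in_mandarin))
```

Under that reading the table and the prose agree: ten bracketed phonemes, of which eight are voiced consonants with no Mandarin counterpart in any position.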

Vowel differences are also considerable, as depicted in Table 9 (which includes individual vowel phonemes and those that appear in diphthongs, triphthongs, and before consonant finals). The Shanghainese retroflex (apical) vowel ï is treated by Jin as an upper high back unrounded vowel, different from the apical vowel ɩ, which is pronounced with the tip of the tongue instead of the blade. Perceptually the two sound very similar, although Norman locates it farther back (1988:201). The two Mandarin vowels ɩ and ʅ in fact are one phoneme, with the former value realized after ts, ts', s and the latter after tš, tš', š. Since Shanghainese ï appears only after ts, ts', s, z, the difference is one of distribution. Other distinctions are more important, such as a front high-mid/low-mid contrast in Shanghainese not made in Mandarin and the presence of two rounded mid vowels in Shanghainese that sound strange to a Mandarin speaker. Several of the Mandarin vowels appear only in combinations with other vowels and consonant finals. Shanghainese entirely lacks these descending diphthongs and triphthongs, but the number of its vowel phonemes is much higher.

Table 9. Shanghainese and Mandarin Vowels
	Front		Central		Back
	Unrounded	Rounded	Unrounded	Rounded	Unrounded	Rounded
Plain
High	i	ü				u
High-mid	e	[ö]			ɤ	o
Low-mid	[ɛ]		ə	ə̈		ɔ
Low			a		(a)
Retroflex
High	(ɩ)		[ ï ]		( ʅ )
Mid			ɚ
Note: Unique Wu phonemes are in brackets [ ]; phonemes unique to Mandarin are in parentheses ( ).

What really distinguishes the two systems are tones. Beijing Mandarin has four, including (on a scale of 1 to 5) high level (55), mid rising (35), a tone that begins mid, drops, then rises (214), and high falling (51). Tone sandhi (changed values that result from contact with other tones) is fairly simple, the most important instance being the change of the dipping tone to a rising tone before another dipping tone. Shanghainese has five tones, but nothing equivalent in contour to the dipping tone in Mandarin. Four of its five tones are spread over two registers, that is, two rising tones (24) and (35), and two essentially level tones (23) and (55). The remaining tone (42) is similar to the falling tone in Mandarin but less abrupt. Although a few of the tonal contours approximate each other, the similarities are mostly fortuitous, and no useful connections can be made between elements of the two systems. In Shanghainese, basic tones are largely determined by the syllable's segmental phonology, according to the presence or absence of voiced initials and the glottal stop ending. In Mandarin, tones are distributed across syllable types much more evenly. Finally, tone sandhi in Shanghainese applies universally, not just to restricted combinations, and operates through complex rules across word boundaries.
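The Mandarin sandhi rule mentioned above is simple enough to state as a short procedure. The following is a minimal sketch, not a full account of Mandarin tonology: it numbers the four tones 1 to 4 in the order given in the text, uses the contour values quoted there, and applies the rule pairwise to the underlying tones. How runs of three or more dipping tones are actually grouped depends on phrasing, which this sketch ignores. The function names are hypothetical.

```python
# Beijing Mandarin citation contours on the 1-to-5 pitch scale from the text:
# 1 = high level, 2 = mid rising, 3 = dipping, 4 = high falling.
CONTOURS = {1: "55", 2: "35", 3: "214", 4: "51"}


def apply_third_tone_sandhi(tones):
    """Return surface tone numbers after dipping-tone sandhi.

    A dipping tone (3) becomes a rising tone (2) when the next syllable
    is underlyingly a dipping tone. Simplification: the rule is checked
    against underlying tones pair by pair, ignoring prosodic grouping.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


def surface_contours(tones):
    """Map a sequence of underlying tones to surface pitch contours."""
    return [CONTOURS[t] for t in apply_third_tone_sandhi(tones)]


print(surface_contours([3, 3]))  # ['35', '214']
print(surface_contours([3, 4]))  # ['214', '51']
```

That the whole Mandarin rule fits in a four-line loop is the point of the contrast: Shanghainese sandhi, which applies to every combination and operates across word boundaries, could not be captured by a single local substitution of this kind.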

I submit that these "mere" differences in phonology are as marked as what obtains between different European languages. They would be even more striking if we had compared Mandarin with a more southern variety like Min or Cantonese, with seven or eight tones, a full range of final consonants, nasalized vowels (in Min), and other features that make them distinct. Of greater concern in the present context, however, are vocabulary differences, the magnitude of which is often obscured by cross-variety linguistic studies of phonological differences, which focus on cognate terms, by casual students of non-Mandarin Chinese who want to know the pronunciation of a word they know in Mandarin and by the fact that these nonstandard varieties, being out of the country's cultural mainstream, tend to adopt Mandarin terms for their higher-level vocabulary. Anyone who knows a non-Mandarin variety or who is familiar with the psychology of its speakers will admit that these "high-level" terms -- for the most part -- are simply grafted onto the body of indigenous words and given new pronunciations. Although an educated, bilingual native speaker of a non-Mandarin variety can usually come up with a plausible pronunciation in the target speech for a Mandarin word, everyone involved knows that the exercise is bogus, either because another word or way of saying the same thing exists already or because the concept itself is not central to the community of speakers.

What is central is the day-to-day vocabulary that, by virtue of its uniqueness, is stigmatized as "colloquial" when in fact it constitutes the language's very core. This fact became apparent to me immediately in my studies of Wu, as my tutor and I searched in vain for characters to transcribe recorded specimens. Often the character was one that had dropped out or had never been part of Mandarin, or that appeared only in literary texts. Other times we ended up inventing characters or borrowing them from Mandarin on the basis of similar sounds or meanings. In retrospect, the activity was not unlike what scholars believe happened when characters were first being formed and applied to the archaic language. When I complained to a colleague who was working with a Hakka dialect, he just laughed and showed me a long list of his own homemade characters. Both Wu and Hakka include so many indigenous words, particularly in their core vocabularies, that the Mandarin-based character writing system was not very applicable no matter how we tried to bend it.

How much do they diverge? According to R. L. Cheng, about 5 percent of the morphemes in Taiwanese "have no appropriate, established Chinese characters to represent them. Since many of these morphemes are high frequency function words, in a written Taiwanese text they account for as much as 15% of the total number of characters" (1978:306). Cheng's statistics, while no doubt valid, understate the problem since many of the "established" characters that can be applied to Taiwanese are peripheral or nonexistent in modern standard Mandarin. Moreover, these morphemes -- shared or not -- often do not combine in the same way to form words. One cannot simply take morphemes or a combination of them from one Sinitic variety (or the characters used to write them, if there are any) and expect to produce anything intelligible to a user of another. All of which is to say, the words themselves are different. The extent of these differences can be appreciated by examining Ruan's (1979) Táiwānhuà rùmén (Introduction to Taiwanese), especially pages 62 to 108, where some two-thirds of the words listed have separate Mandarin glosses. Similarly, Qian Nairong's (1989) Shànghǎi fāngyán lǐyǔ (Colloquial Shanghainese) lists 282 pages of unique Shanghainese terms that are not in Mandarin or have different meanings!

Citing estimates by Chinese linguists, DeFrancis reports "the differences among the regionalects taken as a whole amount, very roughly, to 20 percent in grammar, 40 percent in vocabulary, and 80 percent in pronunciation" (1984a:63). The last two figures are reasonable, but I suspect the grammatical differences are understated because of the difficulty in Chinese of distinguishing lexical features from syntax. Cheng, for example, states that 50 percent of the so-called function "words" in Taiwanese differ from those in Mandarin, a statement that seems to tell us more about the two varieties' respective grammars than about differences in vocabulary alone (1981). How these function words function can be described by rules analogous to what is called "grammar" in Western languages. What seems to play an even greater role in Chinese is a phenomenon loosely defined as "patterning." Ramsey puts his finger on this in the following passage:

Some differences between Cantonese and Mandarin grammar are very subtle. Almost any Mandarin grammatical pattern can be used in Cantonese and be understood, but such locutions are often not idiomatic. Typically, a sensitive and forthright native speaker will say of such Mandarinisms: "You could say it that way -- that sentence pattern exists in Cantonese -- but actually that's not the way we say it, we say it this way: ...." A colloquial Cantonese discourse always has a number of patterns that would sound peculiar in Mandarin. (1987:105)

Assuming rough equivalency in the amount of structure needed in any language to show relationships between concepts, the challenge becomes one of finding this order in languages where it is expressed less overtly. Function words provide part of this structure in Chinese, as does patterning, which can be thought of as a larger body of grammatical rules whose domains are individually narrower. Both devices exhibit marked differences across major varieties of Chinese, especially between standard Mandarin11 and the nonstandard southern languages. A third grammatical device -- word order -- also differs from one variety to the next, such as the reverse order of direct and indirect objects in Mandarin and Cantonese, and the placement of certain adverbs in Cantonese. It is hard to imagine a word order difference more striking than use of the ba-construction in Mandarin, which changes a sentence's structure from subject-verb-object to subject-object-verb but is not used in Cantonese.

There are profound linguistic reasons for the mutual unintelligibility that exists between major varieties of Chinese, reasons that go well beyond what is commonly thought of as different ways of pronouncing the same morphemes. How does this situation compare with that of other major speech communities and with the taxonomies used to describe them? Most linguists familiar with the classification problem acknowledge that the major Chinese varieties differ from each other at least on the order of the different languages of the Romance family. History confirms this observation: most of the Chinese varieties separated from their common proto-forms by the eighth or ninth century A.D., which corresponds to or predates the emergence of the Romance languages from Latin. It would seem, therefore, a simple matter to project the taxonomy used to describe concrete linguistic differences in one part of the world to another, that is, to apply the two words "language" and "dialect" consistently and either start calling Spanish and Italian two "dialects" of the Romance "language" or, if that seems inappropriate, stop calling Min and Mandarin two "dialects" of the Chinese "language."

One way out of the dilemma is to call into question the legitimacy of the terms in general by noting, for example, the smooth transition in degrees of intelligibility between Italian and French through border areas (in technical terms, the nonconvergence of linguistic isoglosses). But one need not pretend that one language stops where another starts to recognize -- as do the speakers of languages themselves -- distinct cores of Parisian French versus the Italian spoken in Rome, or Beijing Mandarin versus Shanghai Wu, across which there is no appreciable communication. By shedding the fiction that the major varieties of Chinese are "dialects" instead of languages, other inconsistencies are rectified and the whole taxonomy falls neatly into place. On one end of the scale, what look for all the world like dialectal differences within Mandarin, Wu and, for that matter, each of the major Chinese varieties really do become dialects instead of -- what? On the other end, Chinese (Hànyǔ) can take its proper place as a language group within the Sino-Tibetan family, along with what the government of the People's Republic of China officially recognizes as that family's other groups -- Tibeto-Burman, Miao-Yao, and Zhuang-Dong -- eliminating a badly skewed (and very suspicious) distribution that accords no subdivisions whatsoever to Hànyǔ, which is used by the overwhelming majority of the family's speakers, but defines the other three groups in detail down to the branch and language level (Mair 1991).

Another way to avoid acknowledging that "A" is "A" is to reject linguistics, symmetry, and objective criteria altogether and rely instead on political boundaries or the subjective notions of the speech community (however that may be defined). The first of these latter two "criteria" can be dismissed, since it would require Han Chinese either to call Tibetan and Chinese one and the same "language," because they are genetically related and fall at present within the same geopolitical boundary, or to agree to Tibetan demands for political independence -- a choice no Han Chinese would enjoy making. This is not sophistry; it only looks stupid because the idea of using national boundaries to determine linguistic categories is inherently unsound. The fallback argument would be, "Well, we really mean the Chinese spoken inside China." But this does not work either, since it forces us -- if consistency still matters -- to rename Miao-Yao and Zhuang-Dong "languages" instead of "language groups" because they are also spoken primarily inside China, which is a bit hard to swallow. The whole rationale for calling Chinese a "language" comes down, it would seem, to simple wish-fulfillment. Chinese is a language because certain of its speakers want it to be, and if objective criteria get in the way, who cares?

It is tempting to explore why this last "factor," as it were, is taken seriously by some Western linguists who would oppose such muddleheadedness in their own technical specialties but are willing to allow it here on a grand scale for China. Part of the reason, I believe, is sympathy with the Beijing government's efforts to unify China on its own (or any) terms, abetted by the same sort of cultural relativism that has found its way nowadays even into the hard sciences. Add to this sympathy China's never-ending insistence on being viewed as a "special case" where universal criteria do not apply, along with the pressure it can put on its own scholars to support this perverse view, and one comes up with a fair picture of how the single-language myth is maintained. But do the Chinese really accept the myth themselves? We do know for certain (1) that Chinese are highly aware of the linguistic differences between them; (2) that this is especially true of non-Mandarin Chinese speakers, who are taking new pride in and a fresh look at their own native vernaculars, embellishing them with such refinements as dictionaries, textbooks, formal instruction, writing, media exposure, and legal status, which in the lay view are associated with different languages (Hannas and Edelstein 1994); and (3) that most Chinese are not equipped to deal with this terminological problem anyway, since their word fāngyán, as we have seen, glosses over the language versus dialect distinction.

Returning to the purpose of our inquiry, if the major varieties of Chinese are not "dialects" at all but different languages, then Chinese characters should not be any more able to transcend the differences between them than they can those in the different East Asian languages, which in fact is the case. We have seen that the Chinese languages differ not just in pronunciation but also in vocabulary and grammar, and that these differences are realized through unique morphemes (or unique uses of shared morphemes) for which characters do not exist at all, do not exist in Mandarin, or are used with different meanings and functions. Consequently, character texts in Cantonese and (where available) in Taiwanese are largely unintelligible to Mandarin readers. Many characters are completely unfamiliar; others are recognizable but make no sense in context. This occurs where conventions exist for writing the non-Mandarin variety in characters. Actually, most of these languages have no established writing system and hence lack even the possibility of being understood by readers of other varieties.

The failure of the character writing system to provide Chinese speakers trained in one variety with the means to read other, non-Mandarin varieties exposes the transitivity thesis as a sham. Oddly enough, this view is not disputed. When people claim Chinese characters "transcend" the "dialects," they are usually not even thinking about how literate Mandarin speakers use their knowledge of characters to read non-Mandarin Chinese. What they really mean is that characters allegedly help non-Mandarin speakers read Mandarin. But if the feature does not work in one direction, how can it work in the other? The only explanation for the ability of some non-Mandarin speakers to read Mandarin-based character texts is bilingualism, pure and simple, that is, they have taken the trouble to learn Mandarin (the language, if not its spoken form) and the character writing system that goes along with it. Thus, in a very twisted sense, the characters do "unify" Chinese by denying some 275 million non-Mandarin Chinese speakers literacy in their own native languages and forcing them, by virtue of its being the only sanctioned orthography in China, to learn the language of the politically dominant group.

What applies to the character writing system across languages also applies across time. Character-literate Chinese are no better equipped to read ancient Chinese texts than they are texts written in other East Asian or Chinese languages, for the same reasons: major differences in vocabulary, grammar, and style that make older states of the language mostly incomprehensible to anyone who has not had special training. I recall my first trip through Taiwan's National Palace Museum and the exasperation I felt when, after years of intensive study of the modern written language, I was unable to decipher inscriptions in the classical style written no more than a few hundred years ago. My companion, a well-educated native speaker, could not provide much help. Every year American students with native Chinese skills enroll in a classical Chinese course and end up doing no better (often worse) than classmates without their modern Chinese background. Not only are the underlying languages (or language states) different, the inventories of shared symbols used to write them often have different meanings, erasing what little "transitivity" even this knowledge provides.

Synchronically or diachronically, the notion that Chinese characters offer literate Chinese a bridge across linguistic boundaries is pure fiction. One could even argue that their effect is the opposite. By allowing non-native speakers to read Mandarin-based texts with nonstandard pronunciations, the characters are reinforcing the differences that they are supposed to eliminate.

Chinese Characters and the Lexicon

The goal of this chapter has been to assess the appropriateness of Chinese characters to East Asian languages by examining claims to the effect that the characters accommodate idiosyncratic features of these languages better than other types of writing and hence are worth using despite their many shortcomings. Our analysis has shown that these claims either are vacuous (the "transitivity" of characters across space and time) or confuse the cause of a problem with its solution (monosyllabic morphology and too many homonyms). In addition, we have seen that the acclaimed "word-building power" of character-based morphemes, while offering East Asians a means to cope with the expansion of new concepts, has had serious side effects, namely, words that cannot be distinguished phonetically and the use of "words" that are not words at all. This "power" of Chinese characters to create new terms, seen in another light, is simply a system run amok, unchecked by the ordinary requirements of phonetic intelligibility and popular sanction.

The ability of character-morphemes to combine freely as single-syllable units into new terms and of the system to assert itself (until very recently) as the dominant paradigm in word formation has had other consequences germane to the present inquiry. An analysis of these consequences will further support the thesis that the "appropriateness" of Chinese characters to the languages is merely an ex post facto rationalization of effects produced on the languages by the characters. In other words, Chinese characters "fit" East Asian languages by virtue of having molded them over the centuries in all aspects -- phonology, lexicon, and even syntax -- according to the writing system's own peculiarities, in particular, its requirement that morphemes be one syllable long and that all syllables have meaning. Not surprisingly, this one-syllable-one-morpheme alignment is largely what one does find in a written passage of modern standard Mandarin and in the Sinitic lexicons of other East Asian languages. But it is not characteristic of the way these languages were and almost certainly is not how they will be in the future.

Research into early states of Chinese and into certain types of pre-modern colloquial literature shows a language made up not only of polysyllabic words, but also of polysyllabic morphemes. Although many of the latter were borrowed into Chinese from non-East Asian sources, some portion of them either were indigenous or were adopted so early in the language's history as to make the distinction between borrowed and native vocabulary meaningless. Just how much the spoken language was characterized by polysyllabic morphemes we will never know, since expressing the language in writing meant reducing these units to a form compatible with the medium, so that each written syllable-sized unit had a meaning of its own that could potentially stand by itself. When the language failed to correspond to the requirements of the writing system, Chinese simply reanalyzed the term so that it would consist of as many morphemes as it had syllables and characters representing it, and used one of the new single-syllable morphemes for the whole, either as a "word" by itself or in new polysyllabic combinations with other single-syllable morphemes.

Evidence of this process is found not only in the disposition of foreign polysyllabic loanwords, but also in the lexicons of non-Mandarin Chinese languages, which are characterized to a remarkable degree by polysyllabic morphemes, especially in their colloquial vocabulary. Because most of these languages never had much (or anything) to do with Chinese characters, they were never exposed to their "monosyllabification" effect. Since these languages are based almost entirely in speech, even when they are written or glossed with characters for textbooks or linguistic studies, their polysyllabic morphologies are maintained. This morphology is seen, for example, in the co-occurrence of two or more characters that are not used individually in other compounds and in the use of dummy characters (often with the "mouth" radical) that do not show up elsewhere and were clearly contrived to represent a single-morpheme polysyllabic word. Unlike in modern Mandarin, where polysyllabic words are often the result of recombining single-syllable morphemes (in some cases just to make the words intelligible in speech), many polysyllabic words in non-Mandarin Chinese were so from the start. Their relative immunity from the monosyllabification process plus the fact that they tend as a whole to reflect earlier states of the language better than Mandarin suggest rather strongly that Mandarin is the anomaly -- not the other way around. Chinese characters over time imposed their own order on the standard language that used the system for its "representation," generating by their own logic the conditions that make written Mandarin, as it is now constituted, amenable to morphosyllabic writing.

The deceptive ease with which one-syllable meaningful elements, each supported by its own unique written symbol,12 could be thrown together without regard to the phonetic result to form new concepts or represent borrowed ones also had an enormous impact on the structure of the Korean and Japanese lexicons, although here the molding mechanism was different. In Chinese, the characters became "appropriate" to the language by fostering a monosyllabic morphology that matched the system's unique requirements. In the other East Asian languages, they accomplished the same thing by enabling Sinitic roots to outcompete indigenous morphemes and morphological processes and to emerge as the predominant word-building units. As the Sinitic morphemes took hold, the character writing on which the morphemes depended became necessary not only for social reasons but absolutely to ensure that texts would be intelligible. The languages in effect became Sinicized, having lost a good deal of what was their own, in fact and in principle, through displacement and then through neglect.

If this competition had been fair, one could hardly quibble with the characters' success. But two factors skewed the field so badly that the indigenous morphologies had no chance to develop as viable alternatives. On the one hand, there was the enormous prestige China and the Chinese language had enjoyed since the Tang dynasty in countries on China's periphery, which would have been enough to establish Sinitic loans and the writing system in these languages whatever their actual utility. On the other hand, with a head start of a millennium or more, Chinese characters were already available to serve the needs of these developing languages and hence became a quick fix both as direct loans and as morphemes that could be assembled on the basis of meaning alone, without having to stand the test of phonetic intelligibility. The indigenous morphemes, which were intelligible phonetically, were longer, less malleable, and could not compete in the written medium, which was where most of the innovation was taking place. When efforts began during this century by linguists in Japan and especially Korea to reestablish the indigenous morphologies for the sake of national pride and to make the written languages phonetically viable, their creations were spurned by the public either for being too long or -- a far worse sin -- for looking like fakes.

There is nothing in the indigenous structure of Japanese or Korean that lends itself to representation by Chinese characters. What compatibility does exist between these languages and character-based writing is a function of changes brought about directly or indirectly by the writing itself. There are clear signs, however, that the incestuous process of using and reusing the same phonetically depleted Sinitic morphemes to form new words has broken down. Although Sinitic morphology still plays a role, it must now compete with Western loanwords written in katakana and hangul as direct, phonetic borrowings. Even in Chinese, the incidence of sound-based, polysyllabic borrowing seems to be rising and is forcing itself into the written language through a subset of characters used for their phonetic values alone. As sound-based media develop technologically and their use becomes more widespread, the pressure for these languages to adjust will intensify, rendering Chinese characters and traditional Sinitic morphology anachronistic and eliminating what vestiges of "appropriateness" still remain.

Notes

1. See, for example, Coulmas 1989:44.
2. Interestingly, many of these three- and four-syllable words came into service in conscious imitation of European-language morphology. There was little, if anything, in the indigenous Sinitic tradition that encouraged multisyllable words.
3. Li Xingjie mentions this in his criticism of the fallacy (1987:29).
4. In Zhōngguó yǔwén, February 1953. Cited by Ohara 1989:159.
5. See Mair 1992:5-13 for examples.
6. Excepted are the Ancient Chinese -p, -t, -k endings, analyzed in the Chinese linguistic tradition as an "entering tone" and adapted by the borrowing languages more or less as is (Korean, Vietnamese) or as the initial consonant of a second syllable (Japanese). Vietnamese, also a tonal language, was able to accommodate this Chinese feature.
7. I.e., the character as a whole. Most characters have components that were based etymologically on sound and that play an important role in helping users identify and process the unit.
8. The official figure for China's non-Han population was 67 million in 1982, compared with a Han population of 950 million (Ramsey 1987:164-165). Assuming a present population of 1.2 billion, the non-Han figure rises to 79 million and is probably much higher.
9. Yuan Jiahua (1960), Zhan Bohui (1981), DeFrancis (1984a), Ramsey (1987), and Norman (1988).
10. Figures are from Ramsey (1987:87) and are based on a Han population of 950 million.
11. Guóyǔ in Taiwan, and pǔtōnghuà ("common speech") in the People's Republic of China. The two are essentially identical, although in practice Taiwan speakers model their speech on the southern standard. Both terms are translated into English as "Mandarin."
12. The support need not be direct. The identification of a character with a unique meaning and a Sinitic sound in any of the languages is enough to establish its viability in the others where characters are not used, that is, in Vietnam and North Korea.
