10. Interloping Scripts

 

The cultural importance of Greek has meant that several other scripts and notational devices encroach on its territory, as already seen with respect to Script Mixing and the Astral Planes: scripts either borrow characters from Greek, or have a Greek heritage and continue to use characters from it. Usually this means that the Unicode encoding of those scripts and notations simply uses the Greek codepoints. On occasion this does not happen, either because the characters have undergone some change in their form; or because they are regarded by the user community as integrated into their new script; or because the typographical tradition of the script has evolved differently, and makes the conflation of Greek and the script look awkward. So there are both conflations and disjunctions of Greek from the other scripts.

1. IPA

The International Phonetic Alphabet, first devised by the International Phonetic Association in 1888, has an eclectic repertoire of symbols, mostly derived from Latin and Greek. The Greek symbols used do not have the same appearance they would in a normal Greek font: their shapes have been modified to harmonise in a Latin context (since letters from both scripts appear in the same word). The IPA versions of gamma, upsilon, and phi, for instance, have true serifs, which are alien to the Greek typographical tradition. (The upsilon is in fact so changed as to be unrecognisable: the glyph is more commonly termed a "bucket".) Using the Greek version of the glyphs would be regarded as a typographical error, frequent though such moves have been in the past for the sake of convenience. So the Principles of the IPA (International Phonetic Association 1949:1):

The non-roman letters of the International Phonetic Alphabet have been designed as far as possible to harmonise well with the roman letters. The Association does not recognise makeshift letters; it recognises only letters which have been carefully cut so as to be in harmony with the other letters. For instance, the Greek letters included in the International Alphabet are cut in roman adaptation.

As a result, several IPA symbols have been disunified from their Greek counterparts: even though they bear the same names, they are not shaped exactly the same. Some others are still shared with the Greek block, rather than represented independently in the IPA Extensions block (U+0250 - U+02AF).

The characters shared between Greek and the IPA are:

Phonetic Value | IPA Symbol | Greek Letter
Open back unrounded vowel | U+0251 Latin Small Letter Alpha ɑ | U+03B1 Greek Small Letter Alpha α
Bilabial voiced fricative | U+03B2 Greek Small Letter Beta β | U+03B2 Greek Small Letter Beta β
Velar voiced fricative | U+0263 Latin Small Letter Gamma ɣ | U+03B3 Greek Small Letter Gamma γ
Mid-High back unrounded vowel | U+0264 Latin Small Letter Rams Horn ɤ | [U+03B3 Greek Small Letter Gamma] γ
Mid-Low front unrounded vowel | U+025B Latin Small Letter Open E ɛ | U+03B5 Greek Small Letter Epsilon ε
Dental voiceless fricative | U+03B8 Greek Small Letter Theta θ | U+03B8 Greek Small Letter Theta θ
Mid-High central rounded vowel | U+0275 Latin Small Letter Barred O ɵ | [U+03B8 Greek Small Letter Theta] θ
High front unrounded lax vowel (obsolete) | U+0269 Latin Small Letter Iota ɩ | U+03B9 Greek Small Letter Iota ι
High back rounded lax vowel | U+028A Latin Small Letter Upsilon ʊ | U+03C5 Greek Small Letter Upsilon υ
Labiodental voiced approximant | U+028B Latin Small Letter V With Hook ʋ | [U+03C5 Greek Small Letter Upsilon] υ
Bilabial voiceless fricative | U+0278 Latin Small Letter Phi ɸ | U+03C6 Greek Small Letter Phi φ
Uvular voiceless fricative | U+03C7 Greek Small Letter Chi χ | U+03C7 Greek Small Letter Chi χ
High back rounded lax vowel (obsolete) | U+0277 Latin Small Letter Closed Omega ɷ | [U+03C9 Greek Small Letter Omega] ω

To these one may add the diacritic uses of U+02E0 Modifier Letter Small Gamma, ˠ, used to indicate velarisation.
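The split between shared and disunified symbols can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the pair list is copied from the table above (base letters only), and the names `PAIRS`, `script_prefix` and `conflated` are illustrative, not part of any standard.

```python
import unicodedata

# (IPA code point, Greek code point) pairs, copied from the table above.
PAIRS = [
    (0x0251, 0x03B1),  # alpha
    (0x03B2, 0x03B2),  # beta
    (0x0263, 0x03B3),  # gamma
    (0x025B, 0x03B5),  # epsilon
    (0x03B8, 0x03B8),  # theta
    (0x0269, 0x03B9),  # iota
    (0x028A, 0x03C5),  # upsilon
    (0x0278, 0x03C6),  # phi
    (0x03C7, 0x03C7),  # chi
]


def script_prefix(cp):
    """Return the first word of the Unicode name, e.g. 'LATIN' or 'GREEK'."""
    return unicodedata.name(chr(cp)).split()[0]


def conflated(pairs):
    """IPA symbols whose code point still carries a GREEK name."""
    return [chr(ipa) for ipa, _greek in pairs if script_prefix(ipa) == "GREEK"]


for ipa, greek in PAIRS:
    status = "shared with Greek" if ipa == greek else "disunified"
    print(f"U+{ipa:04X} {unicodedata.name(chr(ipa))}: {status}")
```

Calling `conflated(PAIRS)` picks out exactly the symbols for which the IPA column and the Greek column of the table name the same code point.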

Phonetic notations outside the IPA standard or peripheral to it have at times employed other Greek letters: Pullum & Ladusaw (1996) record usage of alpha, delta, eta, lambda (in mainstream Americanist use as a voiced alveolar lateral affricate, IPA [dɮ]), pi, rho, sigma (more popular in phonology as a symbol for syllable), and omega.

1.1. Particularities of IPA symbols

1.2. African Orthography

1.3. IPA Remnants

We are left with three IPA conflations with Greek: beta, theta, and chi. None of these characters made it to the African orthographies, so they never developed uppercase versions which would justify disunification as for epsilon and iota: instead of the phi, the African orthography uses for its bilabial voiceless fricative U+0191 Latin Capital Letter F With Hook, Ƒ, U+0192 Latin Small Letter F With Hook, ƒ. Instead of beta for the voiced fricative, it uses the approximant symbol U+028B Latin Small Letter V With Hook, ʋ, and its capital counterpart U+01B2 Latin Capital Letter V With Hook, Ʋ. And no dental fricatives in Africa, apparently: it is a fairly rare sound in the world's languages.
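The uppercase argument can be seen in the case mappings themselves. This is a small sketch relying only on Python's built-in `str.upper()`, which follows the Unicode case mappings; the names `WITH_LATIN_CAPITAL` and `STILL_GREEK` are illustrative.

```python
# Letters that gained Latin capitals of their own (African orthographies).
WITH_LATIN_CAPITAL = {
    "\u025B": "\u0190",  # open e      -> LATIN CAPITAL LETTER OPEN E
    "\u0269": "\u0196",  # Latin iota  -> LATIN CAPITAL LETTER IOTA
    "\u028B": "\u01B2",  # v with hook -> LATIN CAPITAL LETTER V WITH HOOK
    "\u0192": "\u0191",  # f with hook -> LATIN CAPITAL LETTER F WITH HOOK
}

# The three IPA symbols still shared with the Greek block.
STILL_GREEK = "\u03B2\u03B8\u03C7"  # beta, theta, chi

for small, capital in WITH_LATIN_CAPITAL.items():
    print(f"U+{ord(small):04X} uppercases to U+{ord(small.upper()):04X}")

for ch in STILL_GREEK:
    # Uppercasing lands in the Greek capitals, not in any Latin capital.
    print(f"U+{ord(ch):04X} uppercases to U+{ord(ch.upper()):04X}")

# Latin phi was disunified but has no capital: upper() leaves it unchanged.
print("\u0278".upper() == "\u0278")
```

Uppercasing a transcription that uses the shared beta, theta or chi therefore yields ordinary Greek capitals, which is harmless for the IPA (it has no capitals) but would not do for an orthography.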

Now, the chi is normally drawn the same way by everyone, although recent font designers have been cosying up to the Latin x (Haralambous: §1.7.4); so the conflation is not worrisome. But the conflation of the theta and beta, as with the treatment of the mathematical glyphs, requires that the font use only the reference glyphs for the Greek characters, and not the cursive designs. (As noted, this is why a general use font like Gentium or Palatino Linotype, covering mathematics or the IPA as well as Greek, cannot afford to use the cursive glyphs relegated to U+03D0 Greek Beta Symbol, U+03D1 Greek Theta Symbol.) Moreover, the 1949 Principles explicitly discuss theta and beta as glyphs disunified from Greek:

Thus, since the ordinary shape of the Greek letter β does not harmonise with roman type, in the International Phonetic Alphabet it is given the form β. And of the two forms of Greek theta, θ and ϑ, it has been necessary to choose the first (in vertical form), since the second cannot be made to harmonise with roman letters.

Can you tell the difference between the betas? Well, the former is italic, as is the default styling for Greek in a Latin context in mathematics and elsewhere, and in English printing of classics—but certainly not in Greece. The latter, as printed in 1949, has the expected serif at the bottom; so the SIL IPA Doulos glyph for beta is .

Of the two distinctions, the italics/plain difference is immaterial, and the IPA themselves recognise that their plain theta is the same character as the italic Greek closed theta. The serif is a greater difference, and is not used in Greek typography; this was reason enough to disunify Latin phi. The serifed beta has not been showing up in Unicode—the only font I know that has it is Caslon. This hasn't been noticed by Unicode, and even if it has, it has been deemed a minor glyph difference. The current version of Doulos IPA, the successor to SIL IPA Doulos, omits beta altogether; but one would expect this to be handled as a Serbian italics style distinction—so a font like Doulos IPA which will not be used for Greek will be free to have the serif on its beta. (This is an inversion of the IPA/mathematics requirement blocking the cursive forms of beta and theta.) That phi passed the disjunction criterion and beta did not is probably an accident; but it's done now.

Everson had proposed in May 1998 that IPA beta, theta and chi be disunified from Greek. Everson did not appeal to any typographical differentiation, as might have been argued for with beta. Instead Everson claims that

The International Phonetic Alphabet is the Latin alphabet with extensions. When the UCS [Universal Character Set] was designed, all of the IPA characters derived from Greek letters should have been included, not just some of them. It makes no sense whatsoever for the UCS to contain, as it does for the IPA, α, γ, ε, υ, and φ, but not β, θ, and χ. The cost of this disunification is that it might affect some UCS coded IPA data. But UCS implementations are not that widespread, and the unification was a mistake in the first place. The benefit of the disunification is in the management and presentation of data. Unless LATIN LETTER THETA and GREEK LETTER THETA are distinguished, it would be impossible to correctly sort, for instance, a word list which contained Greek and Avestan words, or any other combination of IPA text and Greek text. Users expect (for instance) Avestan words to sort together; the unification would force Avestan words beginning with the Greek-derived IPA letters to sort within the Greek list, not in the Latin-script where they are required to sort.
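The sorting argument in the passage above is easy to reproduce. The sketch below uses plain code point order (Python's default string comparison), which is only a stand-in for a real collation algorithm; the word lists are illustrative samples chosen for this example, and the names `LATIN_SCRIPT_WORDS`, `THETA_WORD` and `GREEK_WORDS` are hypothetical.

```python
# Illustrative Latin-script transcriptions; one begins with the shared theta.
LATIN_SCRIPT_WORDS = ["zaota", "ahura"]
THETA_WORD = "\u03B8rita"  # starts with U+03B8, the code point Greek also uses

# Illustrative Greek words.
GREEK_WORDS = ["ψυχή", "θεός", "αγορά"]

# Default string ordering in Python compares code points one by one.
ordered = sorted(LATIN_SCRIPT_WORDS + [THETA_WORD] + GREEK_WORDS)

for word in ordered:
    print(word)

# The theta-initial transcription is stranded among the Greek entries
# instead of sorting with the other Latin-script words.
print("position of theta word:", ordered.index(THETA_WORD))
```

With a shared code point there is nothing in the data itself to tell a sorter which script a leading theta belongs to; any fix has to come from markup or from knowledge of the word list.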

The proposal failed, and that's fine with me. To correct some representations made, though:

2. Uralic Phonetic Alphabet

The Uralic Phonetic Alphabet is used in the linguistic analysis of Uralic languages alone; it is begotten of the unfortunate tendency I've bemoaned elsewhere, for each linguistic subdiscipline in the 19th century to come up with its own transcription scheme. The UPA, with its rotations, smallcaps, and its frankly gargantuan repertoire, is more oddball than most. As you can see, I don't like the UPA; then again, I'm not a Uralicist. The alphabet is described in Everson's proposal of March 2002 (defended and reorganised in May 2002); it has been incorporated in Unicode 4.0 as the block Phonetic Extensions U+1D00 - U+1D7F.

On top of the usual borrowing of Greek characters (e.g. UPA [ρ] = IPA [ʀ], UPA [ψ] = IPA [β̝]), the UPA uses smallcap, superscript, and subscript Greek letters to make phonological distinctions. The March proposal would have taken up much of the remaining free space in the Greek and Coptic block—but these thankfully were banished by May to the UPA block. (The Spacing Modifier block did not escape the UPA expansion, and U+02EF - U+02FF from the UPA have filled it out completely.) The May 2002 codepoints are what have been incorporated into Unicode 4.0.

March Codepoint | May Codepoint | Name | IPA near-equivalent | Description
U+0370 | U+1D26 | Greek Letter Small Capital Gamma | ɣ̥̝ | Semi-voiced velar fortis fricative
U+0371 | U+1D27 | Greek Letter Small Capital Lambda | ɬ̬ | Semi-voiced lateral fricative
U+0372 | U+1D28 | Greek Letter Small Capital Pi | ɬ̝ | Lenis voiceless lateral fricative
U+0373 | U+1D29 | Greek Letter Small Capital Rho | ʀ̥ | Voiceless uvular trill
U+0376 | U+1D2A | Greek Letter Small Capital Psi | ɸ̝ | Voiceless bilabial approximant
U+0377 | U+1D66 | Greek Subscript Small Letter Beta | |
U+0378 | U+1D67 | Greek Subscript Small Letter Greek Gamma | |
U+0379 | U+1D68 | Greek Subscript Small Letter Delta | |
U+037B | U+1D69 | Greek Subscript Small Letter Greek Phi | |
U+037C | U+1D6A | Greek Subscript Small Letter Chi | |
U+037D | U+1D5D | Modifier Letter Small Beta | |
U+037F | U+1D5E | Modifier Letter Small Greek Gamma | |
U+0380 | U+1D5F | Modifier Letter Small Delta | |
U+0381 | U+1D60 | Modifier Letter Small Greek Phi | |
U+0382 | U+1D61 | Modifier Letter Small Chi | |
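The move recorded in the table can be expressed as a block-membership check. This is a rough sketch: the code point pairs are copied from the table above, the two block ranges are those named in the text, and `BLOCKS`, `MOVES` and `block_of` are illustrative names.

```python
# Block ranges as named in the text (end points inclusive).
BLOCKS = {
    "Greek and Coptic": range(0x0370, 0x03FF + 1),
    "Phonetic Extensions": range(0x1D00, 0x1D7F + 1),
}

# (March code point, May code point) pairs from the table above.
MOVES = [
    (0x0370, 0x1D26), (0x0371, 0x1D27), (0x0372, 0x1D28),
    (0x0373, 0x1D29), (0x0376, 0x1D2A), (0x0377, 0x1D66),
    (0x0378, 0x1D67), (0x0379, 0x1D68), (0x037B, 0x1D69),
    (0x037C, 0x1D6A), (0x037D, 0x1D5D), (0x037F, 0x1D5E),
    (0x0380, 0x1D5F), (0x0381, 0x1D60), (0x0382, 0x1D61),
]


def block_of(cp):
    """Name of the block containing cp, or None if it is in neither block."""
    for name, cps in BLOCKS.items():
        if cp in cps:
            return name
    return None


for march, may in MOVES:
    print(f"U+{march:04X} ({block_of(march)}) -> U+{may:04X} ({block_of(may)})")
```

Every March code point falls in Greek and Coptic and every May code point in Phonetic Extensions, which is the whole point of the reorganisation.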

The May proposal vigorously defends the inclusion of the UPA in Plane 0, as opposed to the Astral Planes that the Consortium was considering sending them to. I'm not convinced by the rationales given, especially where they invoke technical problems (Astral won't stay "Astral" much longer); and as for the inconvenience to Uralicists of having their characters in Plane 1—what, Greek musicologists and epigraphers are chopped liver? At any rate, the inclusion in Plane 0 is done; and Plane 0 is still not exhausted for minority script use, so this is not a real problem. And with the exclusion of the Greek-derived characters from the Greek block, there is no danger of confusing these with Greek characters.
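For readers unsure what is at stake in the Plane 0 versus Plane 1 argument, here is a minimal sketch of the practical difference, assuming UTF-16 as the encoding in question. U+1D26 is one of the UPA characters from the table; U+1D6FC (Mathematical Italic Small Alpha) is used here purely as a sample Plane 1 character, and `plane` and `utf16_units` are illustrative helper names.

```python
def plane(cp):
    """Plane number of a code point: each plane holds 0x10000 code points."""
    return cp >> 16


def utf16_units(cp):
    """Number of 16-bit code units needed to encode the code point in UTF-16."""
    return len(chr(cp).encode("utf-16-be")) // 2


for cp in (0x1D26, 0x1D6FC):
    print(f"U+{cp:04X}: plane {plane(cp)}, {utf16_units(cp)} UTF-16 code unit(s)")
```

A Plane 0 character fits in a single 16-bit unit, while anything in Plane 1 needs a surrogate pair, and software that assumes one unit per character mishandles it.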

One of the arguments for banishing to Plane 1, which Everson argues against, is the parallel between these styled Latin and Greek characters and the mathematical use of styled characters. There isn't the risk of misuse that there is for the mathematical symbols, because there are only five uppercase and five different lowercase characters involved, and they aren't as immediately usable as bold, italic and monospace. Nonetheless, it's just as well the UPA characters are still nowhere near U+0370 now.

3. Archaic Cyrillic

The archaic version of the Cyrillic script was simply 9th century Greek plus extra characters. Since it was Greek, it was easy to import Greek words with their Greek spelling, even though there were Greek characters that were redundant or useless to the writing of Slavonic. Most of these characters dropped out of use after a couple of centuries, though a few (notably Fita) hung on until Modern times, and the Soviet orthographic reform. The Milesian system was also imported into Slavonic, bringing with it both the numeric Greek characters and added rationale for hanging on to character distinctions irrelevant to Slavonic. This is why common names for U+0438 Cyrillic Small Letter I, и, and U+0456 Cyrillic Small Letter Byelorussian-Ukrainian I, і (originally izhei and izhe), are octal and decimal i respectively: the distinction between the two makes sense only as the distinction between the numerals 8 and 10 (η and ι), since eta and iota had long been pronounced identically in Greek.

As the Unicode names of the characters show, the distinction between the characters is now regional: Byelorussian and Ukrainian use the decimal character (< iota), the other Cyrillic languages use the octal (< eta).

Where two characters had the same phonological value, sometimes the differentiation reflected Greek etymology; sometimes it was "for decorative reasons only"; Berdnikov discusses archaic Cyrillic usage further.

The archaic Greek characters no longer in use in Cyrillic are still treated as Cyrillic letters, rather than unified with their Greek progenitors; at least some of them were used in normal Slavonic words, and typographically they developed differently from Standard Greek (e.g. ksi). In most fonts, they look rather more old fashioned than normal Cyrillic; that is because most of these characters were restricted to Old Church Slavonic, which is traditionally printed in a typeface close to what was in the manuscripts, rather than modernising it (i.e. using the "civil" form of the alphabet, as revised under Peter the Great).

The archaic characters are:

U+0460 Cyrillic Capital Letter Omega Ѡ U+0461 Cyrillic Small Letter Omega ѡ
U+046E Cyrillic Capital Letter Ksi Ѯ U+046F Cyrillic Small Letter Ksi ѯ
U+0470 Cyrillic Capital Letter Psi Ѱ U+0471 Cyrillic Small Letter Psi ѱ
U+0472 Cyrillic Capital Letter Fita Ѳ U+0473 Cyrillic Small Letter Fita ѳ
U+0474 Cyrillic Capital Letter Izhitsa (< upsilon; also U+0423 Cyrillic Capital Letter U, У) Ѵ U+0475 Cyrillic Small Letter Izhitsa (< upsilon; also U+0443 Cyrillic Small Letter U, у) ѵ
U+0478 Cyrillic Capital Letter Uk Ѹ U+0479 Cyrillic Small Letter Uk ѹ
U+047A Cyrillic Capital Letter Round Omega Ѻ U+047B Cyrillic Small Letter Round Omega ѻ
U+0480 Cyrillic Capital Letter Koppa Ҁ U+0481 Cyrillic Small Letter Koppa ҁ

You may be wondering what became of stigma and sampi. Russian added plenty of letters to Greek, so there was no real need to look to Phoenician remnants to get 27 numeric characters: in fact even koppa was quickly supplanted in its numerical use by cherv (now U+0447 Cyrillic Small Letter Che, ч). Stigma was filled in by zelo (now U+0455 Cyrillic Small Letter Dze, ѕ, confined to Macedonian Slavonic). Sampi was covered by the first of the trailing Cyrillic additions, cy (U+0446 Cyrillic Small Letter Tse, ц).
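The numeric correspondences can be laid out slot by slot. The sketch below is a reconstruction, not something taken from the sources cited here: the text only attests octal and decimal i (8 and 10) and the substitutions of zelo, cherv and tse for stigma, koppa and sampi, and the remaining slots are filled in on the assumption that the Cyrillic letters simply follow the order of their Greek originals. `GREEK`, `CYRILLIC`, `slot_value` and `numeral_value` are illustrative names.

```python
# 27 numeral letters in value order: units 1-9, tens 10-90, hundreds 100-900.
# Greek: alpha .. theta (stigma = 6), iota .. koppa, rho .. sampi.
GREEK = [0x3B1, 0x3B2, 0x3B3, 0x3B4, 0x3B5, 0x3DB, 0x3B6, 0x3B7, 0x3B8,
         0x3B9, 0x3BA, 0x3BB, 0x3BC, 0x3BD, 0x3BE, 0x3BF, 0x3C0, 0x3DF,
         0x3C1, 0x3C3, 0x3C4, 0x3C5, 0x3C6, 0x3C7, 0x3C8, 0x3C9, 0x3E1]

# Cyrillic (assumed alignment): zelo (U+0455) replaces stigma, cherv (U+0447)
# replaces koppa, tse (U+0446) replaces sampi; fita, ksi, psi, omega survive.
CYRILLIC = [0x430, 0x432, 0x433, 0x434, 0x435, 0x455, 0x437, 0x438, 0x473,
            0x456, 0x43A, 0x43B, 0x43C, 0x43D, 0x46F, 0x43E, 0x43F, 0x447,
            0x440, 0x441, 0x442, 0x443, 0x444, 0x445, 0x471, 0x461, 0x446]


def slot_value(index):
    """Value of the letter in the given slot (0-26) of the 27-letter series."""
    return (index % 9 + 1) * 10 ** (index // 9)


def numeral_value(ch):
    """Numeric value of a Greek or Cyrillic numeral letter."""
    cp = ord(ch)
    for table in (GREEK, CYRILLIC):
        if cp in table:
            return slot_value(table.index(cp))
    raise ValueError(f"U+{cp:04X} is not in either numeral series")


# Octal and decimal i line up with eta and iota.
print(numeral_value("\u0438"), numeral_value("\u03B7"))
print(numeral_value("\u0456"), numeral_value("\u03B9"))
```

Under that assumed alignment, `numeral_value("\u0447")` gives cherv the value that koppa has in Greek, and `numeral_value("\u0446")` gives tse the value of sampi.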

4. Coptic

Just because Old Church Slavonic is almost always printed in an old fashioned font and Russian isn't, that does not mean anyone wants the scripts of Old Church Slavonic and Russian to be disunified. Despite the reforms of Peter the Great and Lenin, with characters both added and subtracted, this is identifiably the same script. And to disunify the scripts would not only complicate information processing, but would also send an unpalatable political message—that Russian, or Bulgarian, or Serbian are not part of the cultural patrimony of St Cyril. (That's a meta-message the Abkhaz probably wouldn't object to as strongly.)

It is that line of thinking that originally led the Coptic and Greek scripts to be unified: Coptic is uncial Greek plus extra characters, so there is no need to duplicate the Greek characters already there; one need only make sure that a Coptic font displays the Greek characters in their uncial form.

It should come as no surprise to readers of this site that Michael Everson has been the force behind urging a disunification of Coptic and Greek (proposal #1; proposal #2; proposal #3). The disunification looks like happening, and what has been crucial to making it happen is that it's also what the Copticists want—there's no real point in arguing with a unanimous resolution of the International Association for Coptic Studies. Unlike Slavonicists, Copticists (and Copts) have no great desire to regard Coptic as part of the Greek patrimony; and as Everson points out, typographical differentiation between Coptic and Greek has been routine for decades.

Moreover, integrating the Coptic codepoints into Greek has meant that if you have no intent for your Greek characters to be uncial in your font, you have had to de-uncialise the Coptic-specific characters as well; the results are silly, and Everson does demonstrate rather well that Greek, Coptic, Gothic and Cyrillic have drifted further apart typographically than have Gaelic and Fraktur. (Though if anything, he understates how deranged Irish looks in Fraktur and German in Gaelic.)

Everson argues that it is perverse to unify Coptic with Greek but not Gothic (and, one might point out, Old Italic); I agree, and my conclusion would be to unify the lot (but leave Cyrillic alone). But as I've discussed, it's not like those scripts are actually being used by Gothicists and Italicists anyway; the patrimony they're hooking up their objects of study to are Germanic and Italian—which are Latin script territory. So a unification of the scripts with Greek wouldn't be what tugs at the specialists' heartstrings. Furthermore, the scripts themselves are still too close to Greek to convey the appropriate meta-message; if you want to slot Gothic into the Germanic tradition, you do it by using thorns, not thetas or even thiuths. And that does help explain why there is no scholarly tradition of publishing Gothic in its Greek-like script: not because it was technically difficult—no more so than publishing Coptic in uncial Greek, or Old Church Slavonic in Olde Style Cyrillic; but because Gothic had to have the same script as German, since German scholars claimed it as their own.

Of course, that's also a matter of practicality as much as anything else; Germans already know how to read German script. It also explains why Old Church Slavonic is not printed in Glagolitic, though much of it was originally in that script: Russians can read Cyrillic, even if Olde Style, but not Glagolitic. But Coptic kept itself typographically distinct, and here the difference was not the Copticists, but the Copts: the script remained in use by a non-academic community that regarded itself as distinct from the culturally interloping script communities, and preserved its script accordingly. It helped that the Greeks were no longer using uncial themselves.

The Slavonicists, incidentally, would have their Old Church Slavonic in Cyrillic even if they had to beat Cyrillic into submission to do it. Berdnikov (p. 9) reports that Cyrillic transliterations actually had to invent an extra letter, gherv—to represent a Glagolitic letter whose value [ɟ] had died out in Old Church Slavonic by the introduction of Cyrillic. (The Glagolitic original character, djervi, is currently proposed as U+2C3C Glagolitic Small Letter Dervi, ⰼ.) The Cyrillic gherv has not been included in Unicode, and the Old Church Slavonic Online course makes do with the later Serbian letter with the same phonetic value, U+0452 Cyrillic Small Letter Dje, ђ. One might argue that this is a valid conflation; we'd need to hear from Serbian Slavonicists to know for sure. And Macedonian Slavonicists, for that matter, given that Macedonian Slavonic uses U+0453 Cyrillic Small Letter Gje, ѓ, with the same value.

It's petty to reduce script unification to politics; but that does determine what the Copts and Copticists (and Gothicists and Etruscanists and Aramaicists) want, and what they would rather see conflated and disjoined—not the a priori dispassionate judgement of the Unicoder splitters and lumpers. (Eh, sorta dispassionate, in any case.) And Unicode exists primarily to serve the user communities in what they need—or what they feel they need. Only secondarily does it exist to gratify Unicoders' ideological urges. :-)

5. Mathematics

I've already addressed the havoc mathematics has wrought on the Greek script in Unicode in discussing mathematical symbol letters, as well as the mathematical alphanumeric symbols. The mathematical typesetting of Greek letters in an overall Latin context has developed its own tradition, which Haralambous (§2) spends a little time discussing; it includes the obligatory italicising of Greek, and the preference for closed phi and theta, and distinctive glyphs for alpha and gamma.
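One way to see that these mathematical forms are treated as styled variants of ordinary Greek letters is compatibility normalisation. The following is a small sketch using `unicodedata.normalize` from the Python standard library; the sample code points and the names `SAMPLES` and `fold` are chosen for illustration only.

```python
import unicodedata

# Sample symbol-style and mathematical code points (illustrative selection).
SAMPLES = [
    0x03D0,   # GREEK BETA SYMBOL
    0x03D1,   # GREEK THETA SYMBOL
    0x03D5,   # GREEK PHI SYMBOL
    0x1D6FC,  # MATHEMATICAL ITALIC SMALL ALPHA
]


def fold(cp):
    """Apply NFKC, which replaces compatibility variants by their base letters."""
    return unicodedata.normalize("NFKC", chr(cp))


for cp in SAMPLES:
    folded = fold(cp)
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> U+{ord(folded):04X}")
```

NFKC discards the glyph distinction entirely, so it suits searching and comparison but not text in which the mathematical styling carries meaning.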

Mathematics in Greece, of course, involves Greek mathematical letters in a Greek context, which should lead to ambiguity. This is resolved for Latin mathematical letters in a Latin context by the use of italics, or of plenty of indentation. I don't have the impression that Greek mathematicians are that fussed about the ambiguity; italics have not been in widespread use in traditional Greek typography (which used for emphasis the device of Sperrdruck, extended spacing, instead: Haralambous: §1.6.3), and what little Greek mathematics I have access to seems to have no compunction about having everything in plain format.

Nick Nicholas, opoudjis [AT] optusnet . com . au
Created: 2003-09-07; Last revision: 2003-10-10
URL: http://www.opoudjis.net/unicode/unicode_interloping.html