Letters


The normal 24 letters of the Greek script are self-explanatory. The one letter that presents complications is lowercase sigma, which has a medial and a final variant (and the lunate sigma, which is used by papyrologists to obviate that distinction). Unicode has also allocated codepoints for glyph variants of Greek letters, which are deemed to be distinct characters in mathematical use; and there are some puzzling titlecase characters with grave that I try to make sense of here—and fail.

1. Final Sigma

U+03C2 Greek Small Letter Final Sigma [ς]

The final sigma is a positional variant of sigma (U+03C3 Greek Small Letter Sigma, σ), such as also occurs in Hebrew and Arabic. It might legitimately be questioned whether Unicode needed a separate codepoint for the two lowercase sigmas; and indeed, Beta Code has done without the differentiation. The use of distinct codepoints in the legacy scheme of ISO 8859-7 has decided the matter, however.

I have written an extensive document giving all possible cases where final and medial sigmas may not show up in their expected places, which can be used to argue for their distinct status (and indeed has been). I should emphasise, though, that the exceptions are all typographically marginal; the rules for the occurrence of medial and final sigma are followed in normal Greek typography, if one allows for the ambiguity of period as punctuation and abbreviation marker (which, however, is itself reason enough to retain the distinction.) The one exception to a straightforward lookahead algorithm to determine which sigma to use is that sigma used in isolation must be medial, not final; so for completeness any such algorithm (such as a case converter) would also need to be look-behind.
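The lookahead-plus-look-behind rule just described can be sketched in Python. This is only a minimal illustration under stated assumptions: the names place_sigmas and is_letter are hypothetical, the input is assumed to be precomposed (NFC) so that accented vowels are single letter codepoints, and every non-letter is treated as a word boundary, so the ambiguity of the period as abbreviation marker is not handled.

```python
import unicodedata

MEDIAL = "\u03c3"  # σ GREEK SMALL LETTER SIGMA
FINAL = "\u03c2"   # ς GREEK SMALL LETTER FINAL SIGMA


def is_letter(ch: str) -> bool:
    """True for any codepoint whose general category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")


def place_sigmas(text: str) -> str:
    """Choose medial or final sigma from the neighbouring characters.

    Hypothetical helper: assumes NFC input and treats every non-letter
    (space, punctuation, end of string) as a word boundary.
    """
    out = []
    for i, ch in enumerate(text):
        if ch in (MEDIAL, FINAL):
            # Look-behind: is there a letter immediately before the sigma?
            letter_before = i > 0 and is_letter(text[i - 1])
            # Lookahead: is there a letter immediately after the sigma?
            letter_after = i + 1 < len(text) and is_letter(text[i + 1])
            # Final only at the end of a word; an isolated sigma stays medial.
            ch = FINAL if letter_before and not letter_after else MEDIAL
        out.append(ch)
    return "".join(out)
```

The look-behind test is what keeps a sigma standing on its own medial; a lookahead-only version would turn it into a final sigma. Python's built-in str.lower() applies a comparable contextual rule when lowercasing capital sigma.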

The final sigma is a relatively late innovation. The modern form of the lowercase medial sigma dates from the 8th century (although it is anticipated in cursive form as early as the 1st) (Thompson 1912:189); but initially the medial form was used in all positions of the word. In Thompson's (1912) collection, manuscripts from the 11th and 12th centuries use the lunate intermittently as a final form (p. 239, 244, 248). The lunate was still familiar to the scribes: it remained in use even after the invention of lowercase as the capital form. So it was available for them to use as a lowercase final variant.

Capitals were merely the older, pre-minuscule forms of the letters, which is part of the reason why they look more like the original inscriptional forms. The other reason is that, since they already looked like the classical inscriptional forms, Enlightenment classicists made them look even more like them: the modern capital sigma is an epigraphical revival, no earlier than the 18th century.

The lunate appears as a final form more frequently in the 13th through 15th centuries (p. 252, 263, 267, 268). In some hands, it became the exclusive final form—although other scribes kept using the medial in all contexts even in the 15th century (p. 266: 1437). The modern final form is anticipated in a "half-cursive" manuscript of 1280 (p. 72), and is very clear in a 1416 manuscript (p. 255): it is merely a lunate sigma with a tail.

2. Lunate Sigma

U+03F2 Greek Lunate Sigma Symbol [ϲ]; U+03F9 Greek Capital Lunate Sigma Symbol [Ϲ]

2.1. History

While in the 4th century BC literary papyri still used the ancient angular sigma (Thompson 1912:107), the cursive lunate form had taken over completely by the next century, where it stayed as the unique form of the sigma until the invention of lowercase in the 8th century, and as the capital sigma for a millennium longer.

The form is called lunate, of course, because it looks like a crescent. Lest my defence of monotonic make me look completely unlettered, let me take this opportunity to reject Haralambous' gloss (§4.2.4) of lunate into Modern Greek as σεληνιακό. That means "lunar", and you can be sure that the lunate wasn't the form of sigma used in the Apollo program. The classical term is μηνοειδής; but since that will make Modern Greek speakers scratch their heads and ask "looks like a month?" (or worse, "looks like Menna, patron saint of Crete?"), one can certainly use σεληνοειδής. This is the post-classical word with which μηνοειδής is glossed in Byzantine dictionaries (Hesychius, Lexica Segueriana, Photius, Suda), for the sake of the ancestors of Modern Greek speakers who likewise scratched their heads at μηνοειδής.

The word selenoid shows up in English, but only as a misspelling of solenoid (Google: 2350 vs. 681,000), and as the lunar equivalent of the geoid (Google: 22).

While the lunate was banished as a capital letter in the 18th century, it remains familiar to Modern Greeks through ecclesiastical use: it figures in church icons, and in decorative fonts intended to evoke Byzantium (just as epigraphical alpha and sigma are kept around in decorative fonts to evoke antiquity). It is inevitable, for instance, that the headers of the webpages of the Church of Greece website all have lunate sigmas; what is interesting is that the homepage itself has the Latin word Ecclesia instead. (The point being, of course, that Ecclesia is a Greek word.)

Unlike the OU ligature, which has taken on a counterculture air, the lunate sigma has not spilled onto the graffiti'd walls and the car mechanics' shopfronts of Athens; its associations have remained squarely ecclesiastical and mediaeval. (This makes it vaguely similar to Gothic lettering in English, but without the Heavy Metal crossover.)

This is where I take issue with Jannis Androutsopoulos' analysis of the sociolinguistics of glyph choice. Pettily, I admit. He compares his childhood friend's script mixing in using Latin s in her Greek to the nationalist magazine Nemesis using lunate sigmas in its masthead. He associates his friend's s with the fact that

she was brought up in America, she spoke English well, she liked travelling etc., in a few words she was 'extroverted' in her entire attitude, and the Latin s symbolised that extroversion, in my eyes at least. I give a similar interpretation to the 'byzantinisation' of σ into ϲ, as I see it e.g. in the title ΝΕΜΕϹΙϹ or in the romanised Greek of email. Only ϲ does not represent extroversion to me, but ancestor worship (παρελθοντολαγνεία).

Them's fighting words, and I'll bite: s in Greek doesn't represent to me extroversion, but embarrassment at being Greek—and pretty unambiguous pretentiousness. More to the point, though, such script mixing is clearly transgressive, and unsanctioned by the norms of Greek society. Using lunate sigma, however, is not, since the church and the Byzantine past are both very much part of how mainstream Greek identity is constructed. Kanelli's journal is making a political move when it uses the lunate (a complicated move, given that she is both a journalist and a communist member of parliament); but the glyph choice is not overtly transgressive in the way Latin s or even the ou-ligature would be.

So the lunate sigma is used within Greek as a decorative glyph variant, comparable perhaps to English long-s, though with more political loading (like everything else linguistic in Greece). This in itself wouldn't guarantee it a codepoint in Unicode.

2.2. The Classicists' Lunate

What does guarantee it is how the lunate is used by classicists. One use is as the default sigma used in general by certain classics publishers—most influential among them Oxford Classical Editions (see e.g. Fig. 14 in Haralambous' 1999 paper). The motivation for this, as with Porson's circumflex, is to revert to an earlier stage of the Greek written tradition—in this case, the uncial tradition, before the 8th century. Even in this role, however, this might merely be regarded as a glyph variant.

The critical use is that by papyrologists: they use the lunate sigma by default, because it is agnostic as to whether it is located at the end of the word or not—and in reconstructing the text of papyri with no spacing and frequent gaps, papyrologists are uncomfortable making that choice explicit by using medial vs. final forms in print.

It is ironic that the final sigma glyph which papyrologists want to avoid committing to itself originates in the lunate sigma. As does the medial sigma, for that matter, at a separation of four centuries.

Papyrologists often publish their texts in two forms: one exactly as the text appears in the papyrus (where you would clearly want the lunate, because the papyrus itself does not indicate the ends of words); and one normalised version, with conventional accentuation and spacing, which is where the editor tries to make sense of the text. In the latter version, the editor is deciding which sigmas are medial and which final, and it is somewhat disingenuous to stick with lunates at that stage—especially since most editions skip right to that stage anyway. But using lunates has become the default among papyrologists, and if it is going to be included as a codepoint for textually uncertain passages, then no further damage is done to the Unicode Standard if it is extended to textually certain (or less uncertain) passages.
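The asymmetry between the two forms of publication can be made concrete with a small Python sketch. The function name to_lunate and the translation table are hypothetical, introduced only for illustration. Going from a normalised text to lunates is a plain character substitution, and it is many-to-one: both lowercase sigmas collapse into U+03F2, so the editor's decision about word ends is discarded.

```python
# Hypothetical mapping from the ordinary sigmas to their lunate counterparts.
TO_LUNATE = str.maketrans({
    "\u03c3": "\u03f2",  # σ medial  -> ϲ
    "\u03c2": "\u03f2",  # ς final   -> ϲ
    "\u03a3": "\u03f9",  # Σ capital -> Ϲ
})


def to_lunate(text: str) -> str:
    """Replace every sigma with a lunate sigma, leaving all else untouched."""
    return text.translate(TO_LUNATE)
```

No comparable table exists for the reverse direction: turning a lunate back into a medial or final sigma requires knowing where the words end, which is exactly the editorial commitment at issue.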

2.3. Implementation Specifics

Despite the lunate sigma being called a "symbol" in its name, both lunates have always been classed as alphabetic letters (Ll, Lu), rather than symbols; so they will be treated as parts of words. (The same holds for the symbol variants of Greek letters.) Moreover, the lunates have compatibility decompositions: the capital to U+03A3 Greek Capital Letter Sigma, and the lowercase to U+03C2 Greek Small Letter Final Sigma; this means that a text search can treat them correctly as instances of sigma. However, you would need to make sure beforehand that the text search is insensitive to the distinction between medial and final sigma, which will not be the case unless the programmer knows about Greek (final sigma does not have a compatibility decomposition to medial sigma).
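A search key along these lines might be sketched in Python as follows. The function name search_key is hypothetical; the sketch assumes that compatibility normalisation (NFKC) followed by Unicode case folding is an acceptable notion of "the same word" for the search at hand, which will not be true of every application.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold a string so that all sigma variants compare equal.

    Hypothetical helper for illustration only.
    """
    # NFKC applies the compatibility decompositions, so the lunate sigmas
    # become ordinary sigmas (and other compatibility variants are folded).
    folded = unicodedata.normalize("NFKC", text)
    # Case folding lowercases capitals and also maps final sigma to
    # medial sigma, which normalisation alone does not do.
    return folded.casefold()
```

Both steps are needed: normalisation on its own leaves final and medial sigma distinct, and case folding on its own leaves the lunates distinct from the ordinary sigmas.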

These complications mean, of course, that the lunate sigma is only intended to be used in explicitly palaeographical instances; Patrick Rourke discussed this at some length on the Greek Unicode mailing list (2003–05–16).

The capital counterpart to the lunate sigma has long been advocated (Haralambous argues for it in 1999—§1.2.3.6—and 2001—§3.1.) The formal proposal for the capital was made by the TLG in September 2002, and it was adopted in Unicode 4.0. Presumably the capital lunate had been left off because case was alien to the papyri, and if you're working in titlecase, you're already deciding whether the sigma is medial or final by the time you do assign it case. But this is unnecessarily parsimonious: once the lunate sigma was admitted into Unicode, the uppercase version would have to follow, since the lunate is often enough used with modern casing in textually certain instances—normalised papyrological texts, Classical texts, and 'Byzantinised' modern texts.

Where a proper lunate sigma is not available, there is plenty of precedent for people making do with Latin c (as Androutsopoulos does in his article). While the two characters are similar, they are not identical; as Haralambous points out (§1.2.3.6), the lunate sigma lacks the terminating bulb of the Latin character. This is part of a more general problem with mixing Greek and Latin script, which Haralambous touches on further on (§1.7.4): Greek lowercase historically has not had a notion of the serif equivalent to that of Latin, so near equivalents in the two scripts (a α n η ı ι v ν o ο p ρ s ς c ϲ u υ x χ w ω) are not near enough to be intersubstitutable.
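Because a Latin c standing in for a lunate sigma is a different codepoint, such makeshift substitutions can at least be detected mechanically. The following Python sketch flags words that mix the two scripts. The names script_of and mixed_script_words are hypothetical, and the script test is a deliberately crude assumption: it reads the first word of the Unicode character name rather than consulting the Script property, which the standard library does not expose.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Crude script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "").split(" ")[0]


def mixed_script_words(text: str) -> list:
    """Return the whitespace-separated words containing both Greek and Latin letters."""
    hits = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if {"GREEK", "LATIN"} <= scripts:
            hits.append(word)
    return hits
```

A word spelled with real lunate sigmas passes unflagged, since U+03F2 and U+03F9 carry names beginning with GREEK; the same word spelled with Latin c is reported.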

2.4. Small Lunate Symbols

U+037B Greek Small Reversed Lunate Sigma Symbol [ͻ], U+037C Greek Small Dotted Lunate Sigma Symbol [ͼ], U+037D Greek Small Reversed Dotted Lunate Sigma Symbol [ͽ]

Having been bitten by the need to provide casing for lunate sigma, Unicode added lowercase variants of the lunate sigma editorial symbols as well, as of Unicode 5.0, to prevent anyone asking for them later. Unlike the lunate sigma, the editorial symbols are symbols, so there is no intrinsic reason for them to be cased; this is an overreaction in my book. But... whatever.

3. Symbol Variants

U+03D0 Greek Beta Symbol [ϐ]; U+03D1 Greek Theta Symbol [ϑ]; U+03D2 Greek Upsilon With Hook Symbol [ϒ]; U+03D3 Greek Upsilon With Acute And Hook Symbol [ϓ]; U+03D4 Greek Upsilon With Diaeresis And Hook Symbol [ϔ]; U+03D5 Greek Phi Symbol [ϕ]; U+03D6 Greek Pi Symbol [ϖ]; U+03F0 Greek Kappa Symbol [ϰ]; U+03F1 Greek Rho Symbol [ϱ]; U+03F4 Greek Capital Theta Symbol [ϴ]; U+03F5 Greek Lunate Epsilon Symbol [ϵ]; U+03F6 Greek Reversed Lunate Epsilon Symbol [϶]

3.1. Maths-Only

As with any script, there is a degree of glyph variation in Greek characters. The difference with the Greek script as opposed to others is that Greek has interloping scripts—scripts which use Greek as a quarry for glyphs for their own scripts. With mathematics in particular, certain glyphs are differentiated semantically from each other, something which does not occur in Greek usage itself. To cope with this differentiation, Unicode assumes that a mathematical font will have codepoints for the two distinct glyphs: the normal glyph in amongst the normal Greek characters, and the differentiated glyph off as a separate codepoint.

The distinct symbols used to have somewhat more descriptive names in Unicode 1.1: U+03D0 curled beta, U+03D1 script theta, U+03D5 script phi, U+03D6 omega pi, U+03F0 script kappa, U+03F1 tailed rho. U+03F4 was originally proposed with the qualifier "with straight bar", and the lunate epsilons as "straight epsilons". U+03B8 is also termed by Haralambous (§1.4) closed theta, and U+03D1 open theta; U+03C6 open phi, and U+03D5 closed phi: the open glyphs are those involving a loop, the closed a circle/oval. The Unicode Standard calls closed and open phi "straight" and "loopy" respectively.

Until Unicode 3.0, the normal character for phi was the closed form, and the mathematical variant was the open. The reference glyphs were swapped in Unicode 3.0, as it was realised that the normal mathematical phi is the closed form, and Greek text uses the open form exclusively, at least in Greece. (The Loeb Classical Library amongst others uses the closed phi, so it is fair to say that Classicists have a greater tolerance for the closed form.) Fonts created before the release of Unicode 3.0 (September 1999) are likely to have the old default form for phi.

Haralambous (§1.2.3, §2; §1.4) has detailed discussion of these symbols; and since he has been a mathematician and I haven't, I defer to him on their mathematical use. I must admit, I do not get the impression from his discussion in §1.2.3 that mathematicians ever truly contrast the two glyphs—e.g. have a single paper contain two variables θ and ϑ that are to be considered distinct. In fact, as we see below, Haralambous concludes that most of these glyphs are not mathematical in provenance at all. But legacy wins the day on this issue: mathematical fonts have always contained both glyphs in a set as separate codepoints, so Unicode has had to do so as well.

Linguistics has no tradition of using the variant letters with different values, although it was a close call: astonishingly, Boas (Pullum & Ladusaw 1996:135) had proposed in 1916 that open theta be used for the voiced interdental fricative (IPA ð), as distinct from closed theta for the unvoiced. Even more astonishingly, a few Americanists have used this, though thankfully most have refrained.

As Haralambous has said repeatedly, the Unicode Standard repeats, and I cannot but dutifully nod along, these variant codepoints are to be used for mathematics only; they are not under any circumstances to be used for Greek text. The symbols with clear normal Greek equivalents (all but reverse epsilon) have compatibility decompositions to those equivalents, and the accented hooked upsilons (U+03D3, U+03D4) have canonical decompositions to U+03D2 plus a combining diacritic; so a search engine can be made to cope with them. But it shouldn't have to: if you want to write a theta in Greek text, you should use U+03B8 Greek Small Letter Theta, and U+03B8 Greek Small Letter Theta alone. If you want your theta glyph to look like U+03D1 Greek Theta Symbol, then choose a font that makes its U+03B8 look like that: the look of the theta is a font issue, not a codepoint issue.
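The decompositions of the symbol variants can be inspected directly with Python's unicodedata module. The sketch below is illustrative only: the helper name describe and the constant SYMBOLS are hypothetical, and the output reflects whichever version of the Unicode Character Database ships with the local Python.

```python
import unicodedata

# The twelve symbol variants listed at the head of this section.
SYMBOLS = "\u03d0\u03d1\u03d2\u03d3\u03d4\u03d5\u03d6\u03f0\u03f1\u03f4\u03f5\u03f6"


def describe(ch: str) -> tuple:
    """Return (codepoint label, raw decomposition field, NFKC form)."""
    return (
        f"U+{ord(ch):04X}",
        # A leading "<compat>" tag marks a compatibility decomposition;
        # an untagged value is canonical; an empty string means none.
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


for ch in SYMBOLS:
    print(*describe(ch))
```

A search engine that indexes NFKC-normalised text folds every one of these symbols except the reversed lunate epsilon onto the ordinary Greek letter, which is the sense in which it "can be made to cope".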

The corollary to that is, if you are using the Greek script for mathematics (or the IPA), you will require that the normal letter shapes do not look the same as the variant symbol shapes: a font that makes its U+03B8 look like U+03D1 is precisely what you don't want. A font can only satisfy both Greek and Maths/IPA by sticking to the Unicode reference glyphs; but that is a constraint typographers of Greek shouldn't have to put up with: not all fonts should need to cater to both constituencies. In fact, if you go through the glyph behaviour of fonts, you will find that more fonts than you might expect do refuse to cater to both constituencies by choosing different glyphs. To understand why, we need to go through the pedigree of each glyph variant.

3.2. Glyph Pedigrees

Beta: The "symbol" form is actually probably not a mathematical symbol at all, as Haralambous has found; its main provenance is as a medial beta in the French typographical tradition, with the normal beta with a descender the initial beta. This distributional rule is unknown outside France, and as the Unicode Standard notes (following Haralambous' suggestion), shaping beta with a medial vs. initial form is the responsibility of the rendering engine, not Unicode. Outside France, classicists by default use the beta with descender. In Greece, the beta without a descender is the handwritten form, and it has some use in typography as the glyph for beta, particularly in sans-serif typefaces.
Theta: Open theta is the normal handwritten form in Greece, and is an obvious cursive form to come up with; it already appears in the 1st century BC, and remained in use throughout the manuscript and early print periods. However it is less frequent in modern typography as a regular character than beta without descender (of which it is the virtual mirror image). In italics in Greece, it is the usual character, consistent with its handwritten origin. It is less usual in Classics italics.
Upsilon: Again, this is not inherently a mathematical symbol; it is the glyph variant of capital upsilon used in apla, the default typeface (§5.1) in Greece, and also frequently used in Classics. As Øistein Anderson has pointed out to me, mathematicians prefer this glyph because it is distinct from Latin Y; so the mathematics-only treatment of this glyph as a codepoint arguably makes sense. But when the same glyph is used in Greek text, it is not a mathematical symbol; it is an upsilon, pure and simple. So (as happens a little too often in Unicode) one glyph can belong to two codepoints, depending on context, and those codepoints can have rather different textual properties. The use of this glyph in text as opposed to mathematics explains the regrettable inclusion in Unicode of U+03D3 Greek Upsilon With Acute And Hook Symbol and U+03D4 Greek Upsilon With Diaeresis And Hook Symbol. If these characters bear monotonic diacritics, they are no longer mathematical symbols, but letters: someone somewhere included them in a legacy character set explicitly as alphabetic letters, emulating the apla typeface. Yet the point of these symbol codepoints in Unicode is that they not be letters; and there is only one upsilon in Unicode as far as any search engine should be concerned. Unicode's decompositions go some way towards preventing any such confusion: U+03D3 and U+03D4 decompose canonically to U+03D2 plus a combining acute or diaeresis, and U+03D2 in turn has a compatibility decomposition to U+03A5 Greek Capital Letter Upsilon. Under compatibility normalisation, then, all three hooked upsilons fold to the ordinary upsilon, and any search engine is expected to comply with this.
Phi: As noted, closed phi is unknown in Greece, but does occur in the typography of the Classics. For most of its Byzantine career the lowercase phi was not open as such, but looked somewhat like a reversed G-clef. By the 13th century the looped phi with just one loop (typically closed off) is usual, though not to the exclusion of either straight phi, which turns up again in the 15th century, or G-clef phi, which remained in use throughout the manuscript period. The looped phi is the glyph used in early printing.
Pi: Curly pi, which dates from the 8th century, formerly enjoyed widespread use in typography, both alongside and instead of the normal and older lowercase pi; and it persisted in Greek cursive handwriting until the mid 20th century. However it has vanished from modern typography (mid 19th century onwards), and from handwriting with the abandonment of cursive.
Kappa: As with upsilon, the kappa symbol form is merely the apla glyph for kappa, and is quite widely used in Greece. My impression is that it is less frequent among Western Classicists. It originates in the cursive form of kappa, and already appears in the 2nd century AD.
Rho: The curly rho is a normal italic variant of rho.
Capital Theta: The distinction between the two capital theta glyphs, whether the central bar extends all the way across the theta and (optionally) has serifs, is quite slight. The normal typographical tradition is represented by the normal glyph; the variant with straight bar is seldom seen in Greek typography even in sans-serif fonts.
Lunate Epsilon: The lunate epsilon is an uncial character, just as the lunate sigma is; the normal lowercase epsilon became usual only in the 12th century. Just like the lunate sigma, the lunate epsilon is a glyph associated in Greece with Byzantium and Orthodox Christianity. Outside that context, the lunate epsilons are best known as set operators in mathematics; they are already included under that guise in Unicode as U+2208 Element of, ∈, and U+220B Contains As Member, ∋. The set operators are in a glyph tradition incommensurate with letter use, and are coded as mathematical symbols (Sm); if the lunate epsilon is to be included as a letter, it would need to have a lowercase letter category (Ll) and a distinct codepoint—which the lunate epsilon does. (The reverse lunate epsilon, on the other hand, has no use in Greek as a character, and its category is accordingly still Sm.)
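The category distinction drawn for the lunate epsilons can be checked with a few lines of Python. The helper name category_report is hypothetical; the categories themselves come from the Unicode Character Database bundled with Python.

```python
import unicodedata


def category_report(chars: str) -> dict:
    """Map each character to its Unicode general category."""
    return {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in chars}


# Lunate epsilon, reversed lunate epsilon, Element Of, Contains As Member.
report = category_report("\u03f5\u03f6\u2208\u220b")
for codepoint, category in report.items():
    print(codepoint, category)
```

The practical consequence is that U+03F5 counts as part of a word (str.isalpha() is true for it), whereas the reversed epsilon and the two set operators do not, so tokenisers and word-boundary rules will treat them differently.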

The Capital Theta Symbol and Lunate Epsilons were proposed in March 2000 to the ISO by the US National Standards Body, as part of the STIX initiative to standardise usage of mathematical fonts. They were added to Unicode 3.1; Ken Whistler has a behind-the-scenes report on the deliberations that went on in accepting them. The lunate epsilons were originally proposed as U+213B Greek Symbol Straight Epsilon and U+213C Greek Symbol Reversed Straight Epsilon. All three symbols are described in the proposal annex as "N: normal or ordinary; e.g., symbol used as a variable." This means that some mathematician out there has used ϵ as a variable—despite the fact that it looks just like a set operator—and ϴ as a variable, but not as a glyph variant of Θ (or U+2205 Empty Set, ∅). ϵ∈ϴ? Well, I guess there's no accounting for taste.

The reason why the epsilons ended up in the Greek block is clear from Whistler's report: ELOT deemed these mathematical characters to be Greek characters—which in the case of lunate epsilon, they more or less are. But just as with the other mathematical symbols: if you're writing a Greek text and want to give it that Olde Bizantine Charme, don't use U+03F5. That's not what it's there for. Agitate for a Unicode uncial font instead.

3.3. Distribution of glyphs in fonts

The fonts available to me behave as follows with regard to the glyph variants:

(Key: —: Mathematical symbol variant absent; G: Mathematical symbol made identical to Greek letter; M: Greek letter made identical to Mathematical symbol; 0: Adheres to reference glyphs; *: See comments.)

Font	Normal φ:	β	θ	Υ	φ	π	κ	ρ	Θ	ε	Comments
Aisa Unicode	Closed	—	—	—	—	G	—	—	—	—
Alphabetum Unicode	Open	0	0	0	0	0	0	0	0	0
Arial Unicode MS	Closed	0	0	0	0	0	0	0	—	—
Aristarcoj	Open	0	M*	0	0	0	0	0	0	0	Closed theta is not quite closed
Cardo	Open	0	0	0	0	0	0	0	0	0
Everson Mono Unicode	Open	0	0	0	G*	0	0	0	0	0	Both phis open, with the symbol slightly more open
FreeSerif	Open	G	G	M	0	G	0	0	—	—
Galatia SIL	Open	0	0	0	0	0	0	0	0	0
Gentium	Open	0	0	M*	0	—	—	—	—	—	Both upsilons hooked, symbol slightly more so
Hiragino Kaku Gothic Pro	Closed	0	0	M—	0	—	—	—	—	—
Lucida Grande	Open	—	—	M—	0	—	—	—	—	—
Lucida Sans Unicode	Closed	0	0	M*	0	0	0	0	—	—	Symbol phi slightly thinner
New Athena Unicode	Closed	0	0	—	M	0	G	—	—	—
Palatino Linotype	Closed	0	0	M*	0	0	0	M—	—	—	Symbol phi slightly shorter; rho is curled in regular and bold, but not italic and bold italic
Symbol	Closed	—	0	0	0	0	—	—	—	—
TITUS Cyberbit Basic	Open	0	0	0	G*	0	M	M	0	0	Symbol phi somewhat thicker
jGaramond	Open	—	M—	M—	—	—	M—	M—	—	—

This is pretty patchwork, but with the pedigrees of the glyphs in hand, we can make sense of what has gone on with the various fonts.

A few fonts are not that interested in maths, and impose on the mathematical symbols the normal textual glyphs. The font that does this most is FreeSerif, which is an odds-and-ends font in origin; other fonts veer off for only one character.
Pre-Unicode-3.0 fonts have stuck with the closed phi as the default: Aisa, Arial, Lucida Sans, Palatino. The other fonts that have done so are either mathematical, where the closed phi is the default (Hiragino, Symbol), or are Classics-specific, where the closed phi is acceptable (Athena, which is an epigraphy-friendly font).
The descenderless beta is little known outside France and Greece, and not much used even within those countries (the French rule is apparently no longer dominant); so it does not appear in current fonts.
The open theta is an acceptable glyph variant for more script-like fonts, but only one font so far, Aristarcoj, has even begun to take that form up. This is partly to be explained by the dearth until now of Unicode Greek italic fonts. Neither Gentium nor Palatino have taken the open form up in their italics, although as I discuss with regard to the IPA, neither font was really in a position to.
The hooked upsilon, as opposed to the other variants, is in mainstream use; as a result, many fonts have taken it up, which in theory makes them ineligible for mathematical use (although it is not clear whether the two capital upsilons are differentiated at all in mathematics).
Most fonts have gone with the Unicode default of the day, and have kept their two phis distinct. Everson has made both his phis open, which I find puzzling; this may be an oversight. Athena has gone with the closed phi despite being relatively new, as mentioned; it is in any case founded on a model that predates Unicode (the 1991 Athenian font).
The curly pi is unknown in modern typography, and in fact little known in general; no font has taken it up for its alphabetic pi.
Like the hooked upsilon, the script kappa is in mainstream use in Greece, but is largely absent from the fonts; I suspect this is because the form is not as popular outside Greece.
The script rho is likewise not in evidence; strangely Palatino uses it in its regular and bold versions (which, alongside its alpha, contributes to it looking indecisive between italic and regular—see Jeffery Rusten's similar comments in his review of the font). Perplexingly, its italic version (which was done by a different designer) avoids the script rho.
The capital theta and lunate epsilon are too new to have been included in many fonts. The barred theta is a marginal glyph at best in Greek, so it has left no trace on fonts. The lunate epsilon is likewise not mainstream, and will only appear when someone chooses to devise an uncial Unicode font.

There are other glyph variants of Greek characters which have not been considered here; for now I will pass them over, pausing only to dangle the following teasers:

Uppercase: delta, xi, psi, omega
Lowercase: alpha, gamma, eta, kappa (cursive), nu, xi, tau, chi, psi

4. Pseudo-Monotonic Capitals

U+1FBA Greek Capital Letter Alpha With Varia [Ὰ]; U+1FC8 Greek Capital Letter Epsilon With Varia [Ὲ]; U+1FCA Greek Capital Letter Eta With Varia [Ὴ]; U+1FDA Greek Capital Letter Iota With Varia [Ὶ]; U+1FEA Greek Capital Letter Upsilon With Varia [Ὺ]; U+1FF8 Greek Capital Letter Omicron With Varia [Ὸ]; U+1FFA Greek Capital Letter Omega With Varia [Ὼ]

These titlecase characters make no sense.

Well they don't! The story is as follows. A titlecase vowel in polytonic Greek could never bear just an accent. It was either the first, stressed vowel of a word, in which case both the accent and the breathing had to go on the vowel; or it was the first vowel of a stressed diphthong at the start of a word, in which case both the breathing and the accent had to go on the second vowel, and the initial capital had no diacritics. It only became possible in monotonic Greek for a capital letter in titlecase to have an accent but no breathing; that is the provenance of U+0386 Greek Capital Letter Alpha With Tonos (and its redundant companion in Greek Extended, U+1FBB Greek Capital Letter Alpha With Oxia—now that the tonos has been ruled to be equivalent to the acute).

If there is to be any point to a character like U+1FBA Greek Capital Letter Alpha With Varia, it requires a monotonic system which has dropped the breathings, but has retained the distinction between grave and acute. The grave in Greek is a positional variant of the acute; so it makes sense to posit that, where one appears, the other must follow. But Greek abandoned the grave in mainstream usage two decades before it abandoned breathings: there has never been a Greek orthography which has graves but no breathings, so there has never been a Greek orthography where U+1FBA Greek Capital Letter Alpha With Varia could appear as a titlecase letter. Haralambous likewise finds these characters "illogical", and makes avoiding them Rule 2 of his Guidelines.

Now, the evidence for these being titlecase letters is circumstantial: the fact that all the other accented capitals in Greek Extended are legitimate titlecase characters, and that the reference glyphs of the characters have the grave in titlecase position. Whoever came up with these characters clearly intended them for a titlecase use that was not to be. (My suspicion is that they date from early in the monotonic reform, when someone had the impression the tonos was going to turn into the acute–grave combination, rather than just the acute. Why they would form such an impression, I have no idea, and I was there at the time. As a primary school student, admittedly.)

There is a usage the characters could be salvaged for, although I doubt it is worth the effort. In the Renaissance through to early 19th century practice of accenting all-capitals words, the accents sat on top of the capital letters, as if they were lowercase, or to their right. (If there was a breathing involved, the diacritics would often go in titlecase position for expediency, particularly later on.) The precise location of the diacritic is not Unicode's concern; all that matters to Unicode is that a capital letter is postmodified by a diacritic. So the all caps version of καλὸς (which decomposes to U+03BA U+03B1 U+03BB U+03BF U+0300 U+03C2: κ α λ ο ̀ ς) is ΚΑΛῸΣ (which decomposes to U+039A U+0391 U+039B U+039F U+0300 U+03A3: Κ Α Λ Ο ̀ Σ).

Now U+039F U+0300 is canonically equivalent to U+1FF8 Greek Capital Letter Omicron With Varia. So there is nothing preventing us from reprecomposing the string as U+039A U+0391 U+039B U+1FF8 U+03A3: ΚΑΛῸΣ, and deciding that the glyph for U+1FF8 should have the grave on top of it, Renaissance-style. We already have almost all the other capital + diacritic combinations in place—breathings, acutes, breathings and accents, adscripts, adscripts with breathing and accents. For all of these the glyphs have the diacritics in titlecase position; all we would need to do is have variant glyphs, selected according to markup or context (markup would probably be safer), where the diacritics ride on top of the capital letter.
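The round trip through decomposition, capitalisation and recomposition can be reproduced with Python's unicodedata module. This is a sketch for illustration: the helper names codepoints and all_caps_forms are hypothetical, and the example word is given with escapes so that the precomposed U+1F78 is unambiguous.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX labels."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def all_caps_forms(word: str) -> tuple:
    """Return the decomposed, capitalised and recomposed forms of a word."""
    decomposed = unicodedata.normalize("NFD", word)       # letter + combining grave
    capitals = decomposed.upper()                         # combining mark is untouched
    recomposed = unicodedata.normalize("NFC", capitals)   # capital + grave recomposed
    return decomposed, capitals, recomposed


# kalos with a grave on the omicron, precomposed as U+1F78.
word = "\u03ba\u03b1\u03bb\u1f78\u03c2"
for form in all_caps_forms(word):
    print(codepoints(form))
```

The last line printed contains U+1FF8, which shows that normalisation to NFC will produce the capital-with-varia codepoints from all-capitals accented text whether or not anyone ever intended them for that purpose.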

And yes, all this is feasible. We are of course missing precomposed capitals plus circumflex. (After all, this is not the function anyone at ELOT had in mind for the precomposed capital + grave, so they would have had no reason to propose an equivalent capital + circumflex.) There is no chance that Unicode would adopt extra codepoints for those precompositions, since the door has been shut on new precomposed characters (the more so for characters in such marginal use). So we would need to come up with a solution for circumflexes as well, which would involve a precomposed capital + circumflex glyph, but no codepoint: it would end up as a 'ligature' of the two codepoints.

And this is where the plan to make lemonade out of the capital graves comes unstuck. If we can get a solution to work for Renaissance-style Greek involving the separate codepoints for capital vowels and combining circumflex, the same solution will work for capital vowels and combining acute, and capital vowels and combining grave. But in that case, there is no need to involve the precomposed codepoint at all; the solution doesn't rely on the existence of the precomposed codepoint to work, and so doesn't provide it with a reason to exist.

So these characters still make no sense.

Nick Nicholas, opoudjis [AT] optusnet . com . au
Created: 2003-09-16; Last revision: 2008-05-14
URL: http://www.opoudjis.net/unicode/letters.html