1. Greek & Greek Extended


Up until Unicode version 4.0, the Greek script took up two blocks of Unicode: Greek (U+0370 - U+03FF) and Greek Extended (U+1F00 - U+1FFF). The Greek block is adequate for writing Greek in the contemporary monotonic accentuation system; Greek Extended is used for writing Greek in the traditional, polytonic system. (Greek Extended is not essential for polytonic, since its characters can be decomposed into base letters and combining diacritics, but its use is usual online.)
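
To illustrate why Greek Extended is a convenience rather than a necessity, the following sketch shows that a precomposed polytonic character is canonically equivalent to a base letter followed by combining diacritics. It uses only Python's standard unicodedata module; the choice of U+1F85 as the sample character is an assumption made for illustration.

```python
import unicodedata

# U+1F85: alpha with rough breathing, acute, and iota subscript (Greek Extended).
precomposed = "\u1f85"

# NFD splits it into the base letter followed by its combining diacritics.
decomposed = unicodedata.normalize("NFD", precomposed)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC reassembles the sequence into the single Greek Extended codepoint.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)
```

A renderer that handles combining sequences well could therefore display polytonic text without any Greek Extended codepoints, though precomposed characters remain the more common form in practice.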

Implementors will understandably want to know how crucial it is to support Polytonic Greek, and who needs to use it. Yannis Haralambous gives a tendentious summary in his Guidelines and Suggested Amendments paper, and is even more tendentious elsewhere ("Needless to say that the author considers the «monotonic» spelling reform as a crime against Greek language, a crime commited by populistic politicians and negationist pseudo-linguists": §1). This is my no less tendentious summary.

Being a Negationist isn't all that bad; it doesn't necessarily connote you're a Holocaust denier, after all.

1.1. The Polytonic System

Ancient Greek was originally written with no spaces, accentuation, or lower case. Fine if you already know Ancient Greek; but once the language was starting to change into Hellenistic Greek, and students still had to recite Homer correctly, the grammarians of Alexandria started inserting diacritics, to indicate the proper pronunciation of the words. This was especially crucial, since the distinctions the diacritics indicated were already fading away in the spoken language; the breathing marks have had historical value only for the past 2200 years, the distinction between accents for the past 1900, the subscripts were already dropping out in Classical times. These diacritics have established English (Latinate) names, but Unicode used a Modern Greek standard, so it uses the Modern Greek names of the diacritics. The diacritics are:

| Category | Symbol | Unicode | English Name | Modern Greek Name | Function |
|---|---|---|---|---|---|
| Accents |  ̀ | U+0300 | Grave | Varia | Neutral/Low pitch accent |
| Accents |  ́ | U+0301 | Acute | Oxia | High pitch accent |
| Accents |  ͂ | U+0342 | Circumflex | Perispomeni | Rising-Falling pitch accent |
| Breathings |  ̔ | U+0314 | Rough (Asper) | Daseia | Presence of [h] before vowel; voiceless r [r̥] |
| Breathings |  ̓ | U+0313 | Smooth (Lenis) | Psili | Absence of [h] before vowel; voiced r |
| |  ͅ | U+0345 | Iota Subscript/Adscript | Ypogegrammeni/Prosgegrammeni | [i] after long vowel |
| |  ̈ | U+0308 | Diaeresis | Dialytika | Two vowels forming distinct syllables rather than diphthong |
| Length |  ̄ | U+0304 | Macron | Makron | Long vowel (where vowel is ambiguous) |
| Length |  ̆ | U+0306 | Breve | Vrachy | Short vowel (where vowel is ambiguous) |

To clarify:

Accents


Ancient Greek was a pitch accent language: not tonal like Chinese, but with rising and falling of the voice on accented syllables like Serbian, Croat, or Swedish. Greek switched in late antiquity to a stress-based system such as it has today (where accented vowels are merely louder, with no inherent variation in pitch). When this started happening, people learning Greek literature and poetry in particular had to be reminded of where the pitches (used to) be; the grammarians of Alexandria undertook this task. The acute indicated high pitch, the circumflex rising then falling pitch. (As a result, the circumflex is only allowed in Classical Greek on long vowels: there wouldn't be the time to have a pitch rise then fall on a short vowel.)

Initially, the Alexandrians wrote grave, indicating low or neutral pitch, on every unaccented syllable. Eventually, grave was restricted to accented syllables, as a variant of the acute. The grave is the default form on the final syllable of a word, rather than the acute, indicating that the pitch normally dropped off there. However, an acute was used instead of a grave before punctuation (pitch would remain high before a pause), and before enclitics (little words which would glue on to their preceding words, and form a single phonological unit with them -- so that the accent was no longer truly final).

The acute is also used when the word is presented in isolation, in linguistic discussion. In that case, the grave appears occasionally on the final syllable to disambiguate an unaccented word from an accented word; e.g. τίς "who?"; τις or τὶς "someone".

Further detail is given in Smyth's grammar.

Breathings


Ancient Greek as we know it only had h at the start of words, before vowels. In some parts of Greece, there was a separate letter for h (it was, in fact, Η, which is how it came into Latin). Other parts of Greece did not have a separate letter for h; and once Greeks started dropping their aitches (earlier in some places, later elsewhere, but pretty much everywhere by 200 BC), they needed to indicate where the h's used to be. The rough breathing indicated a preceding h, the smooth indicated its absence. The breathing is placed on the initial vowel, or on the second letter of an initial diphthong. (In fact, whether the breathing goes on the first or second of two vowels in a row is diagnostic of whether the two vowels count as a single syllable or not.) In Classical Greek, initial r was obligatorily voiceless: it sounded like hr (or, as the Romans transcribed it, rh); this was indicated by placing a rough breathing over the rho as well.

If a word starting with h was prefixed, it appears the h was still pronounced: thus, 'council' was syn + hedra > synhedrion, hence Hebrew sanhedrin. But conventional Greek spelling did not insert breathing marks on vowels in the middle of words; thus, συνέδριον, not συνἕδριον. (Very occasionally, one will see such "internal breathings" in transliterations of foreign names; e.g. Ἀβραἁμ for Abraham.) The same held for initial rho, and in some typographical traditions the resulting -rrh- cluster is written with a smooth breathing over the first rho, and a rough over the second: thus, Calirrhoe: Καλιρρόη or Καλιῤῥόη. This is the only context in which rho with a smooth breathing is seen, and not all traditions use it: it is somewhat old fashioned even for Classical Greek.

Although Greek does not conventionally allow internal breathings, both Ancient and Modern Greek (though monotonic Greek only vestigially) do allow the coronis, a sign marking that two words have been merged into one, and thus akin to an apostrophe; e.g. ἐγὼ οἶδα > ἐγᾦδα, τοῦ ἔδωσε > τοὔδωσε. The coronis is semantically distinct from the breathing, and has distinct codepoints (U+0343 Combining Greek Koronis, (spacing) U+1FBD Greek Koronis). However, it looks identical to the breathing; it is in complementary distribution with it; and the codepoints decompose canonically to the same diacritic used for smooth breathing, U+0313 Combining Comma Above. So Unicode, correctly, treats the coronis and the breathing as the same thing; this is consistent with Unicode policy not to differentiate between different semantic uses of visually identical diacritics. For more information on the coronis, see the character Story.
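
The equivalence of the combining coronis and the smooth breathing can be checked directly. This is a minimal sketch using Python's standard unicodedata module; the variable names are illustrative only, and it covers just the combining codepoint U+0343, not the spacing U+1FBD.

```python
import unicodedata

KORONIS = "\u0343"  # COMBINING GREEK KORONIS
PSILI = "\u0313"    # COMBINING COMMA ABOVE, used for the smooth breathing

# Normalisation replaces the coronis with the smooth breathing diacritic.
koronis_nfd = unicodedata.normalize("NFD", KORONIS)
print(koronis_nfd == PSILI)

# After NFC, alpha + coronis and alpha + smooth breathing are the same string.
alpha_with_koronis = unicodedata.normalize("NFC", "\u03b1" + KORONIS)
alpha_with_psili = unicodedata.normalize("NFC", "\u03b1" + PSILI)
print(alpha_with_koronis == alpha_with_psili)
```

Any text that passes through normalisation therefore loses whatever semantic distinction an author tried to encode with U+0343.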

Iota Subscript

Diphthongs in Greek notoriously monophthongised, with the result that Greek now has a dozen ways of spelling [i]. If the diphthongs involving i were short, they usually ended up pronounced, at least initially, with a vowel halfway between the two vowels in the diphthong (ai > e, oi > y > i, ei > e > i, yi > y). If the diphthongs involving i were long, however, the i simply dropped out; and this was already happening very early on, by 400 BC. So the scribes who were tidying up Greek spelling were faced with a lot of seemingly "silent i"'s. Rather than leave the silent i out completely, though, they simply shrank it and tucked it underneath the letter, as an iota subscript. (On the case distinction between adscript and subscript, see below.)

Thus, Classical ηι (eːi) became written as ῃ, and classical ωι (oːi) as ῳ. (If an edition is reproducing an Ancient inscription or papyrus, which predates the subscript, it will preserve the "adscript" as ηι and ωι.) Alpha was ambiguous between a short and a long vowel. If αι represented a short diphthong (which ended up pronounced e), it stayed written αι; if it represented a long diphthong (aːi), it became written as ᾳ.

Diaeresis


In Classical Greek, a diphthong by definition constituted a single syllable. There were occasional exceptions to this. One arose because an unwritten h intervened between the two vowels (προϋπάρχω = προὑπάρχω prohyparchô rather than προυπάρχω prouparchô). The other arose because the text was in a dialect other than Classical Greek, which did not pronounce the two vowels in a single syllable, particularly where meter was involved (i.e. Homer): γένεϊ ge-ne-i rather than γένει ge-nei. To indicate this syllable break, the diaeresis was introduced.

Once diphthongs were monophthongised in Middle Greek, which retained Classical orthography, the diaeresis became useful to indicate that the word in question (often foreign) did not pronounce the digraph as a single vowel, but as the erstwhile diphthong. Thus in Modern Greek, παιδάκι [peðaki] 'child' < Hellenistic παιδίον [paidion], but παϊδάκι [paiðaki] 'cutlet' from Hellenistic παγίδιον [pagidion].

Length


Although the length marks are just as old as the other Greek diacritics, they never became part of normal Greek orthography: they are instead the province of Modern philology. They are used (almost always in teaching or dictionaries) to disambiguate the three vowels which might be long or short: alpha, iota, and upsilon. They are not used in running Greek text, and will not normally be required to print an Ancient text. A long vowel is indicated by a macron, and a short vowel by a breve.

1.2. The Monotonic System

The diacritics were only introduced when they were no longer being pronounced distinctly in the language; Greek spelling since Alexandrine times has been historical, and thus a matter of rote memorisation. Adapting the traditional system to the vernacular language proved quite a challenge (requiring the development of postclassical Greek philology), although a standard had been worked out by the early twentieth century. Modern Greek diglossia, under which the State language was based on archaic forms of Greek, served to complicate matters further.

There have been tentative attempts at spelling reform since the start of the nineteenth century (Yannis Vilaras' Η Ρομεηκη Γλοσα "The Romaic Tongue", from 1814, has near-phonetic spelling and no accents). Given the political ramifications of language issues in Greece, reform has always been controversial; notoriously, for example, the classicist Ioannis Kakridis was dismissed from Athens University for using a reformist accentuation system, in the "Trial of the Accents"—which took place in the middle of the German Occupation. The issue remains contentious, and Haralambous' at times hyperbolic comments are indicative of how the traditionalist camp feels.

In this site, whenever I refer to Greece, I imply "and Cyprus". I know this is terribly patronising of me, especially when I'm half-Cypriot; but Cyprus hasn't really had the opportunity to develop a typography or language policy substantially independent of Greece. Diglossia was nowhere near as contentious an issue in Cyprus as in Greece, since both Puristic and Standard Vernacular Greek were foreign languages to Cyprus (which thereby ended up with a tidy triglossia with the local dialect); and it has been possible to write scholarly work in Cyprus in Puristic long after this marked one as ideologically suspect in Greece. This means that orthographic reform would not have been as pressing a political imperative in Cyprus as it was in Greece; but Cyprus isn't in a position to retain polytonic where Greece is abandoning it (and thus makes the infrastructure for polytonic harder to access).

The first reforms became entrenched in the '60s: the grave was abandoned, as were breathing marks on rho, and the use of the subscript was curtailed in vernacular forms. This simplified polytonic is still often seen, even amongst the traditionalists who have rejected subsequent reforms; and while many traditionalists write Modern Greek with graves, very few write it with breathing marks on rhos. (Even Haralambous refers (§1.3.1) to its use in Modern Greek as "overzealous".)

The current system is the monotonic system, so called because it replaces the multiple accents of the traditional system (polytonic) with a single accent. This became the official writing system of Greek in 1982; it had been long anticipated in the typography of the press, and among individual reformers, becoming more visible in the '70s. The monotonic system also abolishes notation of breathings. Iota subscript is marginal in the modern language, being restricted to ossified dative expressions; the subscript is usually left out in the monotonic as well.

Formerly the subscript was used productively in the subjunctive ending -ῇ; however this ending lost its subscript in most usage, becoming -ῆ, and in the Standard it has been conflated with the homophonous indicative -εῖ; even the -ῆ spelling has not been seen in the contemporary language for the past three decades.

There clearly remain individuals and organisations in the Greek-speaking world opposed to the monotonic system or unwilling to change the orthography they were brought up with; and some publishing continues in the polytonic. The polytonic has been abandoned, however, in the mass media, and is absent from the education system (outside of instruction in Ancient Greek). The use of monotonic on the Internet has cemented this: there was no standard computer encoding for polytonic before Unicode. This situation has proven aggravating to classicists, but debilitating to traditionalists: until quite recently, the only way to disseminate polytonic texts online was by PDF or GIF. Polytonic was thus effectively locked out of the Web in its formative stages, and the constituency using the Web was mostly unconcerned with polytonic anyway.

The traditionalist camp has reasserted itself since the reforms of the '80s, so there has been talk of a resurgence of polytonic. But with a generation of Greeks now not exposed to polytonic before high school, it is quite implausible that polytonic will ever stage a comeback and beat monotonic back, even though it will retain a niche presence for Modern Greek, and remain indispensable for Ancient and Byzantine Greek.

Mediaeval vernacular Greek texts are starting to appear in monotonic—notably in the editions by George Kechagioglou, and Emmanuel Kriaras' dictionary of the vernacular; but this is still the exception rather than the rule, and is largely unwelcome in the field. Adapting the monotonic system to the macaronic language of mediaeval vernacular texts itself forces adaptations to the system.

Further orthographic reform seems unlikely, though the bogeyman of romanisation is trotted out occasionally by traditionalists. (Romanisation has had a comfortable niche in Catholic-ruled Renaissance Greece and on the Internet since the '80s, but not beyond those two domains.) Television routinely uses unaccented text, in news subtitles (though not foreign film subtitles), and credits; this is as much an affectation as anything else, which also surfaces in display type, but does not appear to have been generalised. The mass media, including the press, routinely drop the accents on capital letters, and have been doing so for years. (Note that the combination of acute and capital in titlecase is peculiar to monotonic, and may have aroused aesthetic objections.) This deviation too does not seem to have spread to the education system or to other typography.

1.3. Oxia vs. Tonos

In the early years of the monotonic system, particularly when reformers wished to differentiate their system from the polytonic, the tonos (accent) on letters was a novel sign: typically a dot or wedge. The tonos which Unicode 1.0 used to recommend for the Greek accent (U+030D Combining Vertical Line Above), and which one can still see on many fonts that followed that description, reflects those times—although the glyph Unicode promulgated has an excessively long stroke by the standards of what was current. However, the Greek government decreed in 1986 that the tonos shall be the acute. Accordingly, Unicode 3.0 onwards decomposes the monotonic accented characters of the Greek range into combinations of the letter and the Western acute, U+0301 Combining Acute Accent, even though the names of the characters still refer to tonos (e.g. U+03AC: Greek Small Letter Alpha With Tonos, decomposed as U+03B1 U+0301.)
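
The decomposition mentioned above can be inspected with Python's standard unicodedata module. This is only a quick check against whatever Unicode version ships with the interpreter, not a statement about any particular font's glyphs.

```python
import unicodedata

alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS

# The character name still says "tonos"...
tonos_name = unicodedata.name(alpha_tonos)
print(tonos_name)

# ...but the canonical decomposition is plain alpha plus the Western acute.
tonos_parts = unicodedata.normalize("NFD", alpha_tonos)
print([f"U+{ord(ch):04X}" for ch in tonos_parts])
```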

It would be an exaggeration to say that the erstwhile dots and wedges have completely died out—especially as they have been given a new lease of life by font developers' sluggishness. However, the non-acute tonos seems to have become restricted to display type or otherwise marked circumstances; quality typography uses the acute.

While a dot, wedge, or stroke is tolerable as a monotonic accent (though it is no longer officially sanctioned), it is unacceptable as a polytonic accent. So if a user is using a font which has a tonos distinct from the acute, and types polytonic Greek, conflating the oxia and tonos will lead to unacceptable results. If a font has a non-acute for U+03AC Greek Small Letter Alpha With Tonos, it should definitely have an acute for U+1F71 Greek Small Letter Alpha With Oxia --- and the user should keep them distinct. This is fighting a losing battle, however, since U+1F71 itself canonically decomposes to U+03AC, and vendors are within their rights to conflate them. (The GreekKeys Unicode keyboard, for instance, made no distinction between acute and tonos in its earlier versions.) The real solution is simply to avoid any font for polytonic Greek where the tonos is not an acute: the user has no guarantee that the tonos and the acute will not in fact be conflated, as Unicode officially requires.
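
The conflation is easy to reproduce. In the sketch below (Python's standard unicodedata module; the helper name changed_by_nfc is hypothetical), NFC normalisation silently replaces the Greek Extended oxia character with the tonos character of the Greek block, so any software that normalises text will erase the distinction regardless of what the user typed.

```python
import unicodedata

ALPHA_TONOS = "\u03ac"  # Greek block: alpha with tonos
ALPHA_OXIA = "\u1f71"   # Greek Extended: alpha with oxia

def changed_by_nfc(text: str) -> list:
    """List the codepoints in text that NFC normalisation would replace."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.normalize("NFC", ch) != ch
    ]

# The oxia character normalises to the tonos character.
oxia_nfc = unicodedata.normalize("NFC", ALPHA_OXIA)
print(oxia_nfc == ALPHA_TONOS)

# Of these three characters, only the oxia one is altered by NFC.
altered = changed_by_nfc("\u1f00" + ALPHA_OXIA + ALPHA_TONOS)
print(altered)
```

A check of this kind could be used to see whether a polytonic text would survive normalisation unchanged, though it does nothing about how a given font draws the tonos.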

The polytonic fonts with a tonos other than the acute are:

1.4. Titlecase

Case is an infrequent phenomenon in the world's scripts: with Modern Georgian having long ago abandoned its mediaeval use of case, case is now limited to Greek, Latin, Armenian, and Cyrillic. Greek differs from Latin in that it capitalises letters with diacritics differently, depending on whether the entire word is in capitals (whereupon diacritics are eliminated), or the initial is capitalised only, as in the first word in a sentence or in a title (whereupon the diacritics are retained, although they appear to the left of the letter rather than above it.) Thus, polytonic ἄνθρωπος capitalises to ΑΝΘΡΩΠΟΣ, but in titlecase to Ἄνθρωπος; monotonic άνθρωπος capitalises to ΑΝΘΡΩΠΟΣ and Άνθρωπος.
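
The two capitalisation behaviours can be sketched in Python as follows. The function names greek_all_caps and greek_titlecase are hypothetical, and the approach (decompose, drop nonspacing marks, uppercase) is a simplification: it removes every combining mark, including the diaeresis and iota subscript, which real typography may treat differently.

```python
import unicodedata

def greek_all_caps(word: str) -> str:
    """Uppercase a word and drop its diacritics (all-capitals convention)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Category "Mn" covers the combining accents and breathings.
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", bare.upper())

def greek_titlecase(word: str) -> str:
    """Capitalise only the initial letter, keeping its diacritics."""
    composed = unicodedata.normalize("NFC", word)
    return composed[:1].upper() + composed[1:]

polytonic = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"  # ἄνθρωπος
monotonic = "\u03ac\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"  # άνθρωπος

for word in (polytonic, monotonic):
    print(greek_all_caps(word), greek_titlecase(word))
```

Python's built-in str.upper() on its own keeps the diacritics on the capital, which is why the all-capitals helper strips them explicitly.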

This kind of distinction arises elsewhere in Unicode, particularly when it is stuck with digraphs for backward compatibility purposes. So Unicode distinguishes between Uppercase and Titlecase equivalents of lowercase letters. For example, we will see that Serbian programmers decided to make lj a single character, U+01C9 Latin Small Letter LJ, lj. The uppercase version of this character, which you would use in All Caps, is U+01C7 Latin Capital Letter LJ, LJ: LJUBLJANA. But the version of the letter you would use in Titlecase, at the beginning of a capitalised word, is U+01C8 Latin Capital Letter L With Small Letter J, Lj: Ljubljana.
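
The three-way case distinction for the lj digraph is visible in Python's built-in string methods, which apply the Unicode uppercase and titlecase mappings. The sample word is spelled here with the single-character digraph purely for illustration.

```python
LJ_SMALL = "\u01c9"  # LATIN SMALL LETTER LJ

# "ljubljana" written with the single-character digraph in both positions.
city = LJ_SMALL + "ub" + LJ_SMALL + "ana"

city_upper = city.upper()  # every digraph becomes U+01C7 (all caps)
city_title = city.title()  # only the initial becomes U+01C8 (titlecase)

print([f"U+{ord(ch):04X}" for ch in city_upper])
print([f"U+{ord(ch):04X}" for ch in city_title])
```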

The behaviour of All Capitals in Greek has not always been the same: in the Renaissance, and in the French typographical tradition up to the 19th century, all capital words retained their diacritics, with accents over the letter rather than to its left. Very occasionally a disambiguating accent will appear in the all caps text of a comic strip or newspaper headline. (The same occurs for the ad hoc romanisations of Greek online; the surrogate accent used in the latter is usually an apostrophe.) The most frequent case is Ή 'or' as distinct from the feminine article Η; I have only seen the accent to the left, per titlecase, and given that the word is only one letter long this is not surprising. Comic strip writers may disambiguate other words with an accent (typically a dot rather than an acute!) above the letter, making them analogous to lowercase accentuation (and Renaissance practice); fixed modern fonts prevent much typography from exploiting the same luxury. Contemporary examples are solicited.

Nick Nicholas, opoudjis [AT] optusnet . com . au
Created: 2003-05-26; Last revision: 2004-10-09
URL: http://www.opoudjis.net/unicode/unicode_gkbkgd.html