Sigma: final vs. non-final

Nick Nicholas, Ph.D.
Thesaurus Linguae Graecae
University of California, Irvine
opoudjis [AT] optusnet . com . au
Draft: 2004-10-09

The overall rule on final versus non-final sigma is simply this: where the sigma terminates what may be understood to be a distinct word of Greek, it is final; otherwise, it is non-final. In the following I outline what I consider best practice; as an appendix, I give cases where this is likely to go awry.

1. Existing Practice: TLG

The Thesaurus Linguae Graecae project currently employs an ASCII encoding of Greek known as Beta Code for its texts. This encoding employs a single character to represent both medial and final sigma (S). The algorithm used by the TLG to resolve final vs. non-final status was not codified until fairly recently, and software using Beta Code seems not to be aware of it (witness Pandora for the Macintosh.) Nonetheless, it is adequate for most of the texts at the TLG, and may form the basis for any algorithm to resolve sigma status. It is simply this:

2. Single character

When the single character sigma appears with no preceding or following characters or symbols, it is to be treated as non-final. This is because the non-final is the unmarked sign, and hence the sign by which the letter is known. There are two major contexts in which this can occur: sigma as the numeral 200, and sigma as a phoneme or letter. In the case of the numeral, sigma can only occur as a non-final; the final would be readily mistaken for stigma '6' (Ϛ ϛ). This also applies when the numeral is pre- or post-modified by a numerical signifier: σʹ 200, ͵σ 200,000, σ̈ 2,000,000, Μσ [meant to be capital Mu with small-case sigma on top of it] 2,000,000.

Instances of non-final sigma on its own as a phoneme or letter are also abundant: the following are characteristic examples:

προφέρεσθαι δὲ δίκαιόν ἐστιν ὑμᾶς σὺν τῷ σ σῦς ἐτυμώτερον· παρὰ τὸ σεύεσθαι γὰρ καὶ ὁρμητικῶς ἔχειν τὸ ζῷον εἴρηται. τέτριπται δὲ καὶ τὸ λέγειν χωρὶς τοῦ κατ’ ἀρχὰς σ ὗς.
And it is proper for you to pronounce σῦς 'pig' with the initial σ as more authentic; for the animal is so called for being aggressive and σεύεσθαι 'quick'. It is also common to name it without the initial σ, as ὗς. (Athenaeus, Deipnosophists 9.64)
οὐ γὰρ προτάττεται τὸ σ τοῦ ξ κατὰ συνεκφορὰν τὴν ἐν μιᾷ συλλαβῇ γινομένην· δεῖ δὲ τοῦ σ σιωπῇ καταληφθέντος τότε ἀκουστὸν γενέσθαι τὸ ξ.
For σ does not go before ξ when they are pronounced together in one syllable. The ξ can only be heard when the σ falls silent. (Dionysius of Halicarnassus, De compositione verborum 22.245)
This behaviour would presumably also be that required in mathematics.
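The single-character rule can be sketched in code. The following is only an illustration in Python: the function name normalise_isolated_sigma and the set NUMERIC_SIGNS are my own assumptions, and the set of numerical signifiers is certainly incomplete (the Μσ notation, for instance, is not covered).

```python
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς

# Numerical signifiers that may accompany the numeral (an assumed, incomplete set):
# upper keraia in either encoding, lower keraia, combining diaeresis.
NUMERIC_SIGNS = {"\u0374", "\u02b9", "\u0375", "\u0308"}


def normalise_isolated_sigma(token: str) -> str:
    """Force sigma to its non-final form when it is the only letter in the token."""
    # Set aside numerical signifiers, then see whether a lone sigma remains.
    core = [ch for ch in token if ch not in NUMERIC_SIGNS]
    if len(core) == 1 and core[0] in (MEDIAL_SIGMA, FINAL_SIGMA):
        return token.replace(FINAL_SIGMA, MEDIAL_SIGMA)
    # Anything else is left to the other rules.
    return token
```

A token consisting of a lone final sigma, with or without a keraia, comes back with the non-final form; a full word such as ὗς is returned unchanged.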

3. Followed by visible dash (em-dash, en-dash)

When the visible dash delimits entire orthographic words, the sigma is final. This will be the case for the em-dash (which is punctuation), and the en-dash (inasmuch as it connects words rather than morphemes.) For example, contemporary Greek has borrowed from French the strategy of forming compounds from two independently declinable words: the Greek for 'keyword' is λέξη–κλειδί, of which the plural is λέξεις–κλειδιά. The only instance I could find in the TLG corpus was from the 15th century historian Ducas, in Turkish names: Τζαλίς-πεγι Çalış-Bey (cf. the common Greek fort name Ιτς-Καλε < İç-Kale.) The en-dash is occasionally used in Byzantine texts in the TLG corpus; it is not used to my knowledge with Classical texts.

4. Followed by hard hyphen (U+002d)

On the other hand, when the hyphen is used to indicate syllable or morpheme breaks, the word is not complete, so the sigma is not final. For example, introducing its listing of words beginning with the stem χερσ, the Liddell-Scott-Jones dictionary begins with χερσ-άβροχος. Similarly in Modern Greek, one would 'syllabise' (συλλαβίζω, give the syllables of) προσλάβω 'hire' as προσ-λά-βω.

The problem here, of course, is that hard hyphen will be used both in these cases and when a connection between independent words is meant. My inclination would be to make final sigma the default: en-dash compounds are commonplace words of Modern Greek, whereas 'syllabising' properly belongs to a technical domain (though it is also used to render emphasis in informal use: α-προσ-δό-κη-το! that's *so* unexpected.)
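The treatment of dashes and hyphens described above might be sketched as follows. This is a Python illustration under my own naming (sigma_is_final_before and the hyphen_joins_words flag are hypothetical); the flag defaults to the final-sigma reading suggested above for the ambiguous hard hyphen, and a caller handling dictionary entries or syllabification would switch it off.

```python
EM_DASH = "\u2014"
EN_DASH = "\u2013"
HYPHEN_MINUS = "\u002d"


def sigma_is_final_before(dash: str, hyphen_joins_words: bool = True) -> bool:
    """Decide whether a sigma directly before the given dash or hyphen is final."""
    if dash in (EM_DASH, EN_DASH):
        # Visible dashes delimit entire orthographic words.
        return True
    if dash == HYPHEN_MINUS:
        # Ambiguous: word connector (final) or syllable/morpheme break (non-final).
        return hyphen_joins_words
    raise ValueError("not a dash or hyphen handled by this sketch")
```

With the default, λέξεις-κλειδιά keeps its final sigma; calling the function with hyphen_joins_words=False gives the reading needed for χερσ-άβροχος or προσ-λά-βω.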

5. Followed by punctuation

The behaviour of sigma depends on whether the period acts as punctuation or as an abbreviation marker. In the former case, the word is complete, so the sigma is final. In the latter, the word is incomplete, so the sigma is non-final. Thus:

ἦτο φιλόσοφος.
Ήταν φιλόσοφος.
He was a philosopher.
Φιλόσ. τίς ἦτο;
Φιλόσ. Ποιος ήταν;
Philos. Who was he?
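Telling a sentence-final period from an abbreviation marker cannot be done from the characters alone; some list of abbreviations is needed. The following Python sketch assumes such a list (KNOWN_ABBREVIATIONS is hypothetical and holds only the example above) and a hypothetical function name, resolve_sigma_before_period. It also assumes that normalising the text to NFC is acceptable.

```python
import re
import unicodedata

# Hypothetical abbreviation list; a real one would have to be compiled by hand.
KNOWN_ABBREVIATIONS = {unicodedata.normalize("NFC", "φιλόσ")}

# A run of word characters, then a sigma of either form, then a period.
_SIGMA_PERIOD = re.compile(r"(\w*)[σς]\.")


def resolve_sigma_before_period(text: str) -> str:
    """Use non-final sigma before a period only in listed abbreviations."""
    text = unicodedata.normalize("NFC", text)

    def fix(match):
        stem = match.group(1)
        if stem == "":
            # A lone sigma is covered by the single-character rule; leave it.
            return match.group(0)
        is_abbreviation = (stem + "σ").lower() in KNOWN_ABBREVIATIONS
        return stem + ("σ" if is_abbreviation else "ς") + "."

    return _SIGMA_PERIOD.sub(fix, text)
```

Any word not in the list is taken to be complete, so the default before a period is final sigma.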

6. Followed by whitespace

Since whitespace is universally a word delimiter in Greek, sigma before whitespace is always final. The only field where the status of whitespace is ambiguous is papyrology and epigraphy, in which the ends of words are frequently conjectural due to degradation of the document (as well as the failure of Ancient writing to use whitespace to delimit words.) As a consequence of this, papyrologists in particular tend to shy away from non-final and final sigma in general, and use lunate sigma (which has no final variant) instead.

7. Followed by letter

Since sigma followed by a letter necessarily implies it is not the final letter of the word, such instances are always medial.
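The two most common cases, sigma before whitespace and sigma before a letter, can be sketched together. This Python illustration uses a hypothetical function name, resolve_sigma_basic, and assumes precomposed (NFC) text, so that the character next to a sigma is a letter rather than a combining accent. A sigma with no letter on either side is left alone, in line with the single-character rule.

```python
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς


def resolve_sigma_basic(text: str) -> str:
    """Apply only the whitespace and letter rules to every sigma in the text."""
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch not in (MEDIAL_SIGMA, FINAL_SIGMA):
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        prev = text[i - 1] if i > 0 else ""
        if nxt.isalpha():
            # Followed by a letter: always medial.
            chars[i] = MEDIAL_SIGMA
        elif (nxt == "" or nxt.isspace()) and prev.isalpha():
            # Ends a word before whitespace or end of text: final.
            chars[i] = FINAL_SIGMA
        # Otherwise (isolated sigma, punctuation, etc.) leave as written.
    return "".join(chars)
```

The sketch deliberately ignores punctuation, hyphens and diacritics, which the other sections treat separately.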

8. Followed by non-spacing diacritic

There are two contexts in which sigma may bear a diacritic. The first is in epigraphy and papyrology, where an underdot (U+0323) denotes a dubiously transmitted letter. (To these one might add Ancient editorial signs, equivalent to the modern strikethrough.) The second is in Modern Greek dialectology, where diacritics indicate a change to the pronunciation of σ: most typically σ̌ (U+030c) [ʃ], less frequently σ̔ (U+0314) [ʃ], σ̓ (U+0313), σ͂ (U+0342) [ɕ] (Tsakonian).

In the case of Modern Greek dialectology, the processes giving rise to variant pronunciations of /s/ are usually not word-final, so it is rare for such diacritics to appear on a final sigma. There are only two real exceptions:

  1. The dialect borrows a word from another language (Turkish, Russian), without adjusting it to Greek morphology. As a result, the word necessarily ends in [ʃ]. This is the case for Cappadocian and Mariupolitan, and in those dialects final sigmas routinely appear with diacritics. The following are examples:

    1. Mariupolitan: ζαντάνιις· ιέσλι βίπολνις̌, ζνάτσ̌ιτ "requests; if you manage them, then..." (Russian выполнишь) (Ashla 1999:123)
    2. Cappadocian (Ghurzono): σ̌ίς̌ 'skewer' (Turkish şiş) (Dawkins 1916:677)
    3. Cappadocian (Ulagaç): bόρτς̌ 'obligation' (Turkish borç) (Kesisoglou 1951:101)

  2. The dialect drops the final vowel conditioning the change to the penultimate sigma. In that case, normally, an apostrophe is used; so in Pontic, for example, Papadopoulos' (1961) dictionary (which represents mainstream, historical orthography) spells 'chicken coop' as πινέσ̌' and πορέσ̌' (s.v. πονέσ̌ιν.)

    Some authors however drop the apostrophes; and in that case the newly final sigma can be medial rather than final — the rationale being, presumably, that the sigma is not 'really' final. For example, the Pontic edition of Asterix and the Roman Agent (Goscinny 2000) spells 'chicken coop' as πινέσ̌. And Mouratidis (1991) vacillates in his transcription: the same page contains the spellings Πάς̌-ἰστινέ and πάσ̌' ἰστινέ (p. 50) (Turkish baş-istine). As a counterexample, Dawkins (1916:641) spells the Cappadocian reflexes of ράχις [raxis] > [raxi] > [raʃi] > [raʃ] 'back' with a final sigma: ράς̌, τρές̌.

    All the same, the data from Greek transcriptions of Albanian (see Appendix) seems to indicate there is some reluctance to place diacritics on a final sigma. This is presumably an aesthetic judgement; but there is also precedent for final sigma with diacritic, and since the diacritic is not an alphabetic character (not 'part of the word'), it would probably be least surprising if Unicode also ignored diacritics in determining finality.
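Ignoring diacritics when determining finality amounts to skipping combining marks when looking for the character that follows the sigma. A minimal Python sketch, with hypothetical function names (following_base_char, is_final_ignoring_diacritics), is given below; it addresses diacritics only and does not implement the treatment of the apostrophe or of punctuation.

```python
import unicodedata


def following_base_char(text: str, index: int) -> str:
    """Return the first character after index that is not a combining mark, or ''."""
    for ch in text[index + 1:]:
        # General categories Mn, Mc and Me all begin with "M".
        if not unicodedata.category(ch).startswith("M"):
            return ch
    return ""


def is_final_ignoring_diacritics(text: str, index: int) -> bool:
    """Treat the sigma at index as final unless a letter follows its diacritics."""
    return not following_base_char(text, index).isalpha()
```

Applied to Dawkins' σ̌ίς̌, the first sigma is followed (after its caron) by a letter and so is medial, while the second is followed only by its caron and so is final.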

9. Followed by soft hyphen

Since a soft hyphen indicates that the word is not yet complete, sigma before it is necessarily medial. Furthermore, a line break within a word that is not occasioned by a soft hyphen should still not force a change to final sigma at line end.

10. Followed by numeral

A sigma followed by an Arabic numeral would be outside the range of Classical and Byzantine Greek, which did not use Arabic numerals. In Modern Greek, I know of no attested such usage. Common sense dictates that the sigma would behave as if the numeral were absent. In program variable names, for instance (which will be possible to write in Unicode shortly, though I am not aware of such usage to date), a variable s1 would end up as σ1, since sigma on its own is medial; a variable client1 would end up as πελάτης1, since it incorporates a complete word; and subst1 as ουσ1, since without the numeral the stem would be considered an abbreviation. In the absence of any established procedure that I am currently aware of, it is safer to treat these sigmas as medial.

11. Followed by apostrophe

An apostrophe is treated as a letter for the purposes of determining finality, since it stands in for an elided letter. It follows that σ' and 'σ' are medial, and 'ς is final. All three may be seen frequently in Early Modern Greek texts: 'σ' as a truncation of εἴσαι 'you are', σ' as a truncation of σέ 'to', and 'ς as the same word, but analysed instead as a truncation of the Ancient εἰς 'to'.
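The rules for the soft hyphen, the numeral and the apostrophe can be sketched as a decision on the characters either side of the sigma. In the Python illustration below, the function names (counts_as_letter, sigma_form) and the set APOSTROPHES are my own assumptions; an empty string stands for "no neighbouring character". Following the safer course suggested for numerals, a sigma before a digit is always medial, so the word-aware reading πελάτης1 is not attempted.

```python
MEDIAL_SIGMA = "\u03c3"  # σ
FINAL_SIGMA = "\u03c2"   # ς
SOFT_HYPHEN = "\u00ad"

# Characters assumed to serve as apostrophes (an illustrative, incomplete set).
APOSTROPHES = {"'", "\u2019", "\u02bc"}


def counts_as_letter(ch: str) -> bool:
    """An apostrophe stands in for an elided letter, so it counts as one."""
    return ch.isalpha() or ch in APOSTROPHES


def sigma_form(prev_char: str, next_char: str) -> str:
    """Choose the sigma form from its neighbours ('' means no neighbour)."""
    if next_char == SOFT_HYPHEN or next_char.isdigit() or counts_as_letter(next_char):
        # The word is not complete, or a numeral follows: medial.
        return MEDIAL_SIGMA
    if counts_as_letter(prev_char):
        # Something letter-like precedes and the word ends here: final.
        return FINAL_SIGMA
    # Isolated sigma: medial, as for the single character.
    return MEDIAL_SIGMA
```

On this sketch σ' and 'σ' both come out medial and 'ς comes out final, as described above.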

Appendix: Exceptions

The following exceptions are deviations from the orthographic norms of Greek, and are included here for completeness.

2. Single character

Instances of final sigma on its own are rare. One such case, of course, is when the grapheme itself is discussed.

In the TLG corpus, final sigma on its own occurs 60 times in 10 works. As it turns out, the only instances not erroneous for sigma or stigma (and since corrected!) are three word fragments in the Scholia to Thucydides. (In the Geoponica, Aëtius' Iatrica vol. 12, and Theon of Smyrna, it is used in the nineteenth-century edition as a typographical rendering for the S-like Ancient symbol for half; but this symbol is usually printed as a Roman S.)

6. Followed by whitespace

There are two cases when a Greek word may end in a medial sigma: when it's not really Greek, and when it's not really ending in a sigma.

Not really Greek

When Greek is used to transcribe other languages (Albanian, Arumanian, Turkish, Romani), the rules on final sigma are often discarded. Looking at transcriptions of Arvanitika, the variant of Albanian spoken in central Greece,

What seems to emerge is that people transcribing non-Greek into Greek do not feel obligated to obey the final sigma rules for single sigmas, and are outright disinclined to do so for combinations of sigma alien to Greek (sigma with diacritic, double sigma; tau-sigma is allowed word-finally in onomatopoeias, on which see below.) The latter is consistent with the tendency to use medial sigma with diacritic noted for Greek dialectology.

Not really a word

The cluster τσ is not normal word-finally in Greek; it occurs only in indeclinable loan-words (e.g. ιτς 'not at all' < Turkish hiç), and onomatopoeias (e.g. πριτς 'raspberry; no way!'). These are normally spelt as regular Greek words; thus, a search in yielded 36 instances of ιτς (mostly referring to Serbian football coaches: "-ić's"), and 4 of πριτς, but none with the medial sign (although the acronym ΙΤΣ was frequent.) alltheweb yields 19 instances of χρατς 'scratch' and 3 of χρατσ—as it turns out, all three are in all-caps (ΧΡΑΤΣ), so none were truly medial.

Very occasionally, such a word is not felt to be Greek, and thus the rules are disobeyed. The instances of this are so rare though (one instance of ματσ-μουτσ 'kiss', at and two of ματσ 'football match' at and, against no less than 6328 instances of the string ματς — and 9 of μουτς) that it can be safely ignored.

Not really ending in a sigma

In South-Eastern Greek dialects, gemination occurs across word boundaries, triggered by an etymological word-final nu. For example, τον πήρε 'he took him' /tonpire/ > [toppire]. This is indicated orthographically by substituting the nu with the geminating consonant: τοπ πήρε. When this occurs with sigma, one will occasionally see the phonetic sigma in final form: e.g. πκοιὸς τὸς σώννει (standard ποιος τον σώνει) 'who will manage... him' (Hatzioannou 1934-37:631). However, the usage exemplified in the following is almost universal: Ἀδικουχτισμένουσ σπίτιν π̔έφτει κὶ gριμμίν̣ν̣ιτι (standard αδικοχτισμένο[ν] σπίτι[ν]) "an unjustly built house falls and collapses" (Mousaiou-Bougioukou 1961:315). The rationale here, presumably, is that sigma is final only at the phonetic level, and not at the phonemic level.

Similar reasoning applies in those dialects where an /e/ is appended to a word which, in standard Greek, ends in /s/. This is the case in Cretan and Maniot with pronouns like /mas/ (Cretan /mase/) 'us'; it is more frequent in Chiot, and regular in Calabrian Italiot, although the latter dialect is typically written in Roman script. Since the word is 'underlyingly' considered to end in /s/, orthographically that /s/ is treated as final, with the added /e/ optionally hyphenated. Thus Standard Greek μας corresponds to Cretan μας-ε or μαςε.

7. Followed by letter

There was a somewhat widespread practice in the nineteenth century of using final sigma word-medially to indicate a morphemic break. Most of the TLG's use of forced final sigma is due to this. This practice seems to have been limited to scholarly writing (information solicited), and has not been continued. For example, the TLG text of Plutarch's Παροιμίαι αἷς Ἀλεξανδρεῖς ἐχρῶντο (dated 1839) has the spellings δυςκατανοήτων, δυςκληρούντων, δυςχείρωτοι, ἐπειςήγαγον, προςαγορεύεται, προςεδέξαντο, προςεδόκησαν, προςήκει, προςοφείλων, ὥςπερ. In each case, the final sigma marks a morpheme boundary.
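Software that needs to recognise this nineteenth-century practice could look for a final sigma immediately followed by a letter. The following Python sketch is illustrative only; the function names (split_at_morphemic_sigma, modernise_morphemic_sigma) are hypothetical, and it would equally fire on texts that use final sigma in all positions for other reasons.

```python
import re

# A final sigma directly followed by a letter ([^\W\d_] matches letters only).
_FORCED_FINAL = re.compile("\u03c2(?=[^\\W\\d_])")
_BOUNDARY = re.compile("(?<=\u03c2)(?=[^\\W\\d_])")


def split_at_morphemic_sigma(word: str) -> list:
    """Split a word at each word-medial final sigma (the marked morpheme break)."""
    return _BOUNDARY.split(word)


def modernise_morphemic_sigma(text: str) -> str:
    """Replace word-medial final sigma with the ordinary medial form."""
    return _FORCED_FINAL.sub("\u03c3", text)
```

For the spellings listed above, the split separates the prefix from the rest of the word (προς and ήκει for προςήκει), and the second function restores the conventional spelling. Splitting on an empty match requires Python 3.7 or later.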

The phonetic Greek orthography implemented by the Soviet Union in the 1930s used final sigma in all positions, and in digraphs (ςς = /ʃ/.) To my knowledge, this practice has been abandoned: Mariupolitan Greek is now written in Cyrillic, though I do not have information on current writing in Pontic Greek in the former Soviet Union. In academia, such texts are typically transcribed into conventional Greek orthography. By way of illustration, the following is the first line of Fotiadis' translation of the Iliad, in the original and in normal monotonic:

Τραγοδ' θεα, τον φοβερον θιμον τυ Αχιλὲα,
ςι Αχείον το κιφαλ πολα κακα, πυ ένκεν,
Παλικαρίον π' έςτιλεν ςον Αδιν πολα πςςία
κε με τα λέςςια χόρταςεν όρνεα κε θερία

Τραγώδ' θεά, τον φοβερόν θυμόν του Αχιλέα,
σι Αχαιίων το κιφάλ' πολλά κακά που ένκεν,
Παλλικαρίων π' έστειλεν σον Άδιν πολλά ψ̌ήα
και με τα λέσ̌α χόρτασεν όρνεα και θερία

9. Followed by soft hyphen

Texts with the morphemic use of final sigma mentioned under (7) will likewise use that final sigma when it falls before a soft hyphen. The work by Plutarch mentioned there contains the following passage:

τραγικαῖς σκηναῖς ἐξαρτῶνται, θεοῦ μιμούμενοι ἐπιφά-
νειαν, ζωστῆρσι καὶ ταινίαις κατειλημμένοι. ἐπὶ τῶν προς-
φανέντων αἰφνιδίως καὶ ἀσχημονούντων. (Plutarch, Παροιμίαι αἷς ᾿Αλεξανδρεῖς ἐχρῶντο 2.16)


For Classical works, see TLG bibliography: