SDÁ 1-2010: Lene Antonsen and Trond Trosterud

Why should the computer know grammar?

Lene Antonsen (University of Tromsø)
Trond Trosterud (University of Tromsø)


Why the computer should know its Sami grammar

Language technology is the foundation of the infrastructure any language needs in order to function in a modern literate society. The Sami languages differ in two important ways from the languages for which most such technology has been developed. First, the body of available text, whether monolingual Sami or bilingual (Sami and a majority language), is only a fraction of what is available for the Western European state languages. Second, the Sami languages have far more complex morphological structures than most of those languages.

The article argues that the answer to this challenge is to build grammar-based language technology for the Sami languages, and it presents ongoing work towards this goal. It shows how morphophonological processes and inflectional and derivational morphology can be modelled as finite-state transducers and combined with a syntactic component of context-sensitive constraint grammar rules. Together these form a robust grammatical analyser capable both of analysing running text and of generating any word form. The Sami speech communities are not large enough to sustain a language technology industry, but the grammar-based language model is of interest to theoretical linguists as well.
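The two-way use of one grammatical model described above (analysis and generation), followed by context-sensitive disambiguation, can be illustrated with a deliberately tiny Python sketch. It is not the authors' implementation. A real system would compile a lexicon and morphophonological rules into finite-state transducers rather than listing pairs by hand. The tag notation, the function names and the single disambiguation rule are assumptions made for illustration; the word forms are North Sami forms of guolli 'fish', and the two-word input is an artificial test string rather than a natural sentence.

```python
from typing import List

# Toy lexicon as a finite relation between surface forms and readings.
# The tag notation (lemma+N+Sg+Nom) is an assumption chosen for illustration.
PAIRS = [
    ("guolli", "guolli+N+Sg+Nom"),
    ("guoli", "guolli+N+Sg+Gen"),   # consonant gradation: ll -> l
    ("guoli", "guolli+N+Sg+Acc"),   # same surface form, different reading
    ("guolit", "guolli+N+Pl+Nom"),
]


def analyse(surface: str) -> List[str]:
    """Map a surface form to all of its readings (analysis direction)."""
    return sorted(r for s, r in PAIRS if s == surface)


def generate(reading: str) -> List[str]:
    """Map a reading back to its surface form(s) (generation direction)."""
    return sorted(s for s, r in PAIRS if r == reading)


def remove_acc_before_noun(cohorts: List[List[str]]) -> List[List[str]]:
    """Hypothetical constraint-style rule, invented for this sketch.

    Drop +Acc readings when the next word has a noun reading,
    but never remove the last remaining reading of a word.
    """
    result = []
    for i, readings in enumerate(cohorts):
        next_is_noun = i + 1 < len(cohorts) and any(
            "+N+" in r for r in cohorts[i + 1]
        )
        kept = [r for r in readings if not (next_is_noun and r.endswith("+Acc"))]
        result.append(kept if kept else list(readings))
    return result


# Artificial two-word input: an ambiguous form followed by a noun.
sentence = ["guoli", "guolit"]
cohorts = [analyse(word) for word in sentence]
disambiguated = remove_acc_before_noun(cohorts)
print(cohorts)
print(disambiguated)
```

The same table serves both directions, which is the property that lets one grammatical description drive both text analysis and word-form generation. The rule function only removes readings; it never adds any, so it can narrow an ambiguous analysis but cannot invent one.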

Practical applications derived from the basic grammatical analysers include spell-checkers, interactive computer-assisted language learning programs, and machine translation.