
Language Technology

This course is part of the programme
Language and literature in the digital world

Objectives and competences

The course objectives are to give:
an overview of language technology, related topics in information theory, and text corpora for the Slovenian language with the corresponding tools, as well as a basic understanding of the structure of web pages and of the relevant markup languages, such as HTML and XML.
Students gain competence in evaluating electronic language resources and in preparing language-related reports for the web environment.
They also learn a new approach to solving language problems, one made possible by the contemporary web.

Prerequisites

The course does not require any special skills or knowledge beyond what a future linguist's previous education already covers. All that is needed is basic computer literacy, some experience with web resources and, last but not least, a reasonable command of English.

Content

• Overview of the field of language technology
• Basic web skills
• Overview of markup languages such as HTML and XML
• Text corpora and related tools, especially for the Slovenian language
• Term paper in the form of a web page with statistical analysis of a chosen Slovenian or English fiction text, including its lemmatization and preparation of a dictionary of open-class words.
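The core steps of the term paper (tokenizing a text, counting word forms, keeping only open-class words and publishing the result as a web page) can be sketched in Python. This is a minimal sketch under explicit assumptions: open-class words are approximated by discarding a small, hypothetical list of English closed-class words, and no lemmatization is performed, so forms such as "sea" and "seas" are counted separately. A real analysis, especially of Slovenian, would rely on a lemmatizer and a part-of-speech tagger instead; the function names are illustrative only.

```python
import re
from collections import Counter
from html import escape

# Hypothetical, deliberately tiny list of English closed-class words
# (articles, pronouns, prepositions, conjunctions, auxiliaries).
# It stands in for a real part-of-speech tagger.
CLOSED_CLASS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "he", "she", "it", "they", "we", "i",
    "you", "his", "her", "its", "their", "is", "was", "were", "be", "been",
    "that", "this", "not",
}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens made of Unicode letters
    (so č, š, ž are kept), allowing inner apostrophes as in "don't"."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def frequency_dictionary(text: str) -> list[tuple[str, int]]:
    """Count word forms not in CLOSED_CLASS; most frequent first, ties alphabetical."""
    counts = Counter(t for t in tokenize(text) if t not in CLOSED_CLASS)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_html_table(entries: list[tuple[str, int]]) -> str:
    """Render (word, frequency) pairs as a simple HTML table, escaping the words."""
    rows = "\n".join(
        f"  <tr><td>{escape(word)}</td><td>{freq}</td></tr>" for word, freq in entries
    )
    return f"<table>\n  <tr><th>Word</th><th>Frequency</th></tr>\n{rows}\n</table>"


if __name__ == "__main__":
    sample = "The old man saw the sea. The sea was calm, and the old man slept."
    tokens = tokenize(sample)
    print("tokens:", len(tokens), "types:", len(set(tokens)))
    print(to_html_table(frequency_dictionary(sample)))
```

Because inflection is not collapsed, this counts word forms rather than lemmas. For a highly inflected language such as Slovenian the difference is substantial, which is why the term paper includes lemmatization.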

Intended learning outcomes

Students learn how to use a modern text-analysis tool and how it can be applied to testing linguistic hypotheses. They understand the internal structure of both simple and machine-generated web pages, and they get an overview of Slovenian language corpora and their use. Students learn how to produce a statistical description of a given text, including a frequency dictionary of open-class words.
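A statistical description of a text can also include simple information-theoretic measures, in line with the course's coverage of information theory. As a minimal sketch, assuming a zero-order model in which each letter is treated independently of its context, the Shannon entropy of a text's letter distribution is H = −Σ p_i log2 p_i, where p_i is the relative frequency of letter i. The type/token ratio is added as a second common descriptive statistic. The function names are hypothetical, and the code is not necessarily the method used in the course or in the readings.

```python
import math
import re
from collections import Counter


def letter_entropy(text: str) -> float:
    """Zero-order Shannon entropy, in bits per letter, of the letters in text.

    Case is ignored; spaces, digits and punctuation are skipped.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    n = len(letters)
    counts = Counter(letters)
    # Sum of p * log2(1/p), which equals -sum(p * log2(p)).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by the number of words (tokens)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "Kdor ne dela, naj ne je."  # short Slovenian sample sentence
    print(f"letter entropy: {letter_entropy(sample):.3f} bits")
    print(f"type/token ratio: {type_token_ratio(sample):.3f}")
```

A zero-order estimate ignores dependencies between neighbouring letters; estimates that condition on the preceding letters are typically lower for natural-language text.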

Readings

  • D. Jurafsky, J. H. Martin, 2009. Speech and Language Processing, 2nd edition. Prentice Hall, 1024 pp.
  • C. D. Manning and H. Schütze, 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 620 pp.
  • A. Witt and D. Metzing (Eds.), 2010. Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology series, Vol. 40. Springer, 266 pp.
  • G. Leech, P. Rayson, A. Wilson, 2001. Word Frequencies in Written and Spoken English: Based on the British National Corpus. Longman, London, 320 pp.
  • Proceedings of the Association for Computational Linguistics (ACL) conferences
  • ACL Wiki
  • V. Gorjanc, 2005. Uvod v korpusno jezikoslovje [Introduction to Corpus Linguistics]. Izolit, Domžale, 163 pp.
  • P. Jakopin, 2002. Entropija v slovenskih leposlovnih besedilih [Entropy in Slovenian Literary Texts]. Založba ZRC, Ljubljana, 208 pp.

Assessment

Term paper in the form of a web page and its presentation (60 %); oral exam (40 %).

Lecturer's references

Assistant Professor Boris Kern is a research associate at the Fran Ramovš Institute of the Slovenian Language, ZRC SAZU. He lectures at the School of Humanities of the University of Nova Gorica as an external associate. He is a co-author and the editor-in-chief of the main dictionaries of modern Slovenian: eSSKJ: Slovar slovenskega knjižnega jezika and ePravopis (until 2019). As a co-author, he also contributed to Slovar novejšega besedja slovenskega jezika and to the second, revised edition of Slovar slovenskega knjižnega jezika. In the preparation of eSSKJ, he heads the group for derivational morphology and for normative studies. His research areas are word formation, lexicology, lexicography, orthography, the phonetics of Slovenian, and Slovenian as a second and foreign language.

Selected publications:

• KERN, Boris. Kombinatorika priponskih obrazil v besedotvornih sestavih glagolov čutnega zaznavanja. In: KRAJNC IVIČ, Mira (ed.), ŽELE, Andreja (ed.). Pogled v jezik in iz jezika: Adi Vidovič Muha ob jubileju (Mednarodna knjižna zbirka Zora, 133). Maribor: Univerzitetna založba / University Press, 2020, pp. 67–79. [COBISS.SI-ID 46130477]
• KERN, Boris, VIČAR, Branislava. Jezik in transspolne identitete. Slavistična revija, vol. 67, no. 2, 2019, pp. 413–422. [COBISS.SI-ID 44910637]
• KERN, Boris. Stopenjsko besedotvorje (Na primeru glagolov čutnega zaznavanja). Ljubljana: Založba ZRC, ZRC SAZU, 2017. [COBISS.SI-ID 291202048]
• KERN, Boris. Wpływy obcojęzyczne na współczesny język słoweński. Rozprawy Komisji Językowej, vol. 62, 2016, pp. 39–48. [COBISS.SI-ID 64288610]
• KERN, Boris. Zagadnienia słowotwórcze w wybranych podręcznikach do nauczania języków słoweńskiego i polskiego jako obcych. In: PAŁUSZYŃSKA, Edyta (ed.), GARNCAREK, Piotr (ed.), MAŁYSKA, Agata (ed.). Glottodydaktyka – media – komunikacja. Negocjowanie znaczeń, 2014, pp. 211–220. [COBISS.SI-ID 57085282]
• KERN, Boris. Politična korektnost v slovaropisju. In: ZULJAN KUMAR, Danila (ed.), DOBROVOLJC, Helena (ed.). Zbornik prispevkov s simpozija 2013 – Škrabčevi dnevi 8; 2013; Nova Gorica, 2015, pp. 144–154. [COBISS.SI-ID 41919789]