Language Technology

Objectives and competences

The course aims to give students an overview of language technology and related topics in information theory, of text corpora for the Slovenian language and the corresponding tools, and a basic understanding of the structure of web pages and the relevant markup languages, such as HTML and XML.
Students gain competence in evaluating electronic language resources and in preparing language-related reports for the web environment.
They also learn a new approach to solving language problems, one made possible by today's web-based environment.

Prerequisites

The course requires no special skills or knowledge beyond those covered by the previous education of a future linguist. All that is needed is basic computer literacy, some experience with web resources and, last but not least, a reasonable command of English.

Content

• Overview of the field of language technology
• Basic web skills
• Overview of markup languages such as HTML and XML
• Text corpora and related tools, especially for the Slovenian language
• Term paper in the form of a web page with statistical analysis of a chosen Slovenian or English fiction text, including its lemmatization and preparation of a dictionary of open-class words.

Intended learning outcomes

Students learn how to use a modern text-analysis tool and understand its potential for testing linguistic hypotheses. They understand the inner structure of simple and machine-generated web pages and gain an overview of Slovenian language corpora and their use. Students also learn how to produce a statistical description of a given text, including a frequency dictionary of open-class words.
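
As an illustration of what a frequency dictionary of open-class words involves, here is a minimal Python sketch. It is not the tool used in the course. The function name `frequency_dictionary`, the tiny `CLOSED_CLASS` list and the sample sentence are all assumptions made for this example. The sketch counts word forms only; lemmatization, which the term paper also requires, would need a separate language-specific tool.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of closed-class (function) words.
# A real analysis would use a fuller list or part-of-speech tags.
CLOSED_CLASS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "was"}

def frequency_dictionary(text, closed_class=CLOSED_CLASS):
    """Count lowercased word forms in text, skipping closed-class words."""
    # [^\W\d_]+ matches runs of Unicode letters, so č, š and ž are kept.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in closed_class)

sample = "The cat sat on the mat and the cat slept."
freq = frequency_dictionary(sample)

# Print the entries from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

The same function can be applied to a whole text file by passing its contents as `text`. Replacing the stop list with part-of-speech information would make the open-class filter more reliable.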

Readings

  • D. Jurafsky, J. H. Martin, 2009. Speech and Language Processing, 2nd edition, Prentice Hall, 1024 pp. Catalogue
  • C. D. Manning and H. Schütze, 1999. Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 620 pp. Catalogue
  • A. Witt and D. Metzing (Eds.), 2010. Linguistic Modeling of Information and Markup Languages, Text, Speech and Language Technology series, Vol. 40, Springer, 266 pp. E-version
  • G. Leech, P. Rayson, A. Wilson, 2001. Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London, 320 pp. E-version
  • Papers from the conferences of the Association for Computational Linguistics (ACL) E-version
  • ACL wiki E-version
  • V. Gorjanc, 2005. Uvod v korpusno jezikoslovje. Izolit, Domžale, 163 pp. Catalogue
  • P. Jakopin, 2002. Entropija v slovenskih leposlovnih besedilih. Založba ZRC, Ljubljana, 208 pp. Catalogue

Assessment

Term paper in the form of a web page, its presentation (60%), oral exam (40%).

Lecturer's references

Asst. Prof. Dr. Boris Kern is a research fellow at the Fran Ramovš Institute of the Slovenian Language, ZRC SAZU, and lectures as an external associate at the School of Humanities of the University of Nova Gorica. He is a co-author and editor-in-chief of the central dictionaries of contemporary Slovenian: eSSKJ: Slovar slovenskega knjižnega jezika and ePravopis (until 2019). As a co-author he also contributed to Slovar novejšega besedja slovenskega jezika and to the second, revised edition of Slovar slovenskega knjižnega jezika. He heads the groups for word formation and for normative issues in the preparation of eSSKJ. His research areas are word formation, lexicology, lexicography, orthography, the phonetics and phonology of Slovenian, and Slovenian as a second and foreign language.

Selected articles:
• KERN, Boris. Kombinatorika priponskih obrazil v besedotvornih sestavih glagolov čutnega zaznavanja. In: KRAJNC IVIČ, Mira (ed.), ŽELE, Andreja (ed.). Pogled v jezik in iz jezika : Adi Vidovič Muha ob jubileju, (Mednarodna knjižna zbirka Zora, 133). Maribor: Univerzitetna založba / University Press. 2020, pp. 67–79. [COBISS.SI-ID 46130477]
• KERN, Boris, VIČAR, Branislava. Jezik in transspolne identitete. Slavistična revija, vol. 67, no. 2. 2019, pp. 413–422. [COBISS.SI-ID 44910637]
• KERN, Boris. Stopenjsko besedotvorje (Na primeru glagolov čutnega zaznavanja). Ljubljana: Založba ZRC, ZRC SAZU, 2017. [COBISS.SI-ID 291202048]
• KERN, Boris. Wpływy obcojęzyczne na współczesny język słoweński. Rozprawy Komisji Językowej, T. 62. 2016, pp. 39–48. [COBISS.SI-ID 64288610]
• KERN, Boris. Zagadnienia słowotwórcze w wybranych podręcznikach do nauczania języków słoweńskiego i polskiego jako obcych. In: PAŁUSZYŃSKA, Edyta (ed.), GARNCAREK, Piotr (ed.), MAŁYSKA, Agata (ed.). Glottodydaktyka – media – komunikacja. Negocjowanie znaczeń. 2014, pp. 211–220. [COBISS.SI-ID 57085282]
• KERN, Boris. Politična korektnost v slovaropisju. In: ZULJAN KUMAR, Danila (ed.), DOBROVOLJC, Helena (ed.). Zbornik prispevkov s simpozija 2013 – Škrabčevi dnevi 8; 2013; Nova Gorica. 2015, pp. 144–154. [COBISS.SI-ID 41919789]