Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).
Cardamom Seminar Series #16 – Prof Liviu P. Dinu (University of Bucharest)
October 24, 2022 @ 5:00 pm – 6:00 pm IST
Computational Tools in Historical Linguistics for Cognate detection, Borrowing Discrimination and Protoword Reconstruction
The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, University of Galway is delighted to welcome Prof. Liviu P. Dinu of the University of Bucharest as the next speaker in our seminar series. He will talk about computational tools in historical linguistics.
Natural languages are living ecosystems; they are constantly in contact and, as a consequence, they change continuously. Traditionally, the main problems of historical linguistics (how are languages related, and how do they change across space and time?) have been investigated with the instruments of comparative linguistics. The main idea of the comparative method is to perform a property-based comparison of multiple sister languages in order to infer the properties of their common ancestor. It is a time-consuming, labour-intensive manual process.
We propose here computer-assisted methods for identifying cognates, for discriminating between cognates and borrowings, and for proto-word reconstruction. Firstly, we introduce a method to automatically determine whether a pair of words (u, v) are cognates, and we apply it to a subset of an automatically extracted dataset of cognates built for Romanian and four related Romance languages: Italian, French, Spanish and Portuguese. Secondly, we investigate the task of discriminating between cognates and borrowings. Further, we develop a methodology for automatically producing related words, with the following sub-problems: proto-word reconstruction, modern word form production, and cognate production. For proto-word reconstruction, given words in modern Romance languages, the task is to automatically reconstruct the Latin proto-words from which the modern words evolved. For modern word form production, given the form of a word u in a donor language L1, the system that we develop predicts the form v that the word takes in a recipient language L2, under the hypothesis that v is derived in L2 from u through borrowing. We experiment with Romanian as the recipient language, and we investigate borrowings from more than 20 donor languages. Finally, for cognate production, we investigate whether, for a given pair of languages and one word from a cognate pair, we can automatically determine the orthographic form of its cognate.
Alina Maria Ciobanu and Liviu P. Dinu. 2019. Automatic Identification and Production of Related Words for Historical Linguistics. Computational Linguistics, 45(4):667–704.
Alina Maria Cristea, Liviu P. Dinu, Simona Georgescu, Mihnea-Lucian Mihai, and Ana Sabina Uban. 2021. Automatic Discrimination between Inherited and Borrowed Latin Words in Romance Languages. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 2845–2855, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Alina Maria Ciobanu and Liviu P. Dinu. 2018. Ab Initio: Automatic Latin Proto-word Reconstruction. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1604–1614, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
About the Speaker:
Liviu P. Dinu (https://nlp.unibuc.ro/people/liviu.html) is a professor in the Computer Science Department of the University of Bucharest, director of the Human Language Technologies Research Center, and a member of the Computer Science and Interdisciplinary Doctoral Schools. His main research areas are Computational Linguistics, Natural Language Processing (NLP), and Information Processing, and he has published two books, seven book chapters and over 150 scientific papers in journals and conferences. He obtained his PhD in 2003 at the University of Bucharest under the supervision of Solomon Marcus, and in 2014 he defended his habilitation thesis, entitled “Similarity and Decision Problems in Computational Linguistics”. He carried out postdoctoral work at the University of Trieste (2005). In 2007 he received the “Grigore C. Moisil” Prize, awarded by the Romanian Academy (for 2005). He has initiated and managed 14 national and international R&D projects and was involved in 14 other R&D projects. Currently, he is PI of the project “CoToHiLi: Computational Tools for Historical Linguistics” (https://nlp.unibuc.ro/projects/cotohili.html). He also initiated a master’s program in Natural Language Processing at the University of Bucharest in 2020.
The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and by exploiting similarities between closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern forms. It aims to provide NLP through both space and time for languages that current approaches have ignored.