Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).

Cardamom Seminar Series #24 – Patrícia Amaral (Indiana University Bloomington)

October 9 @ 5:00 pm – 6:00 pm IST

Intrinsic evaluations for historical Portuguese and Spanish

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, University of Galway, is delighted to welcome Dr Patrícia Amaral, a professor in the Department of Spanish and Portuguese at Indiana University Bloomington, as the next speaker in our seminar series. In this talk, she will speak about the need for targeted evaluation of models for diachronic semantics, with reference to semantic change in Medieval Spanish and Portuguese.


In this talk, we demonstrate the need for targeted evaluation of models for diachronic semantics, since it is not appropriate to use tests developed for modern languages/corpora off the shelf, or tests for other historical corpora without adaptations. For research on semantic change that spans several centuries, assessing the accuracy of embeddings comes with two challenges: (i) native speakers who can provide judgements about meaning are not available, and (ii) historical corpora are often much smaller than contemporary datasets, which raises issues of model accuracy (Hellrich, 2019; Hu et al., 2021). This talk presents the lessons learned from developing intrinsic evaluations to test the quality of distributional models used to investigate semantic change in Medieval Spanish and Portuguese. For Spanish, we experimented on a 7-million-word corpus (the Chronicles corpus, with texts from the 13th-16th c.) (Hu et al., 2021), and for Portuguese on a ca. 2.5-million-token corpus, CIPM, with texts from the 12th-16th c. (Tian et al., 2021). We argue that assessment of word embeddings for historical research must meet the following criteria: appropriateness, sustainability, comprehensiveness, and complementarity.

About the Speaker:

Dr Patrícia Amaral is a professor in the Department of Spanish and Portuguese at Indiana University Bloomington. She obtained her Ph.D. in Hispanic Linguistics from the Ohio State University, with a dissertation titled “The meaning of approximative adverbs: Evidence from European Portuguese”. After completing her Ph.D., she was a post-doc in the Linguistics Department at Stanford University, and then an Assistant Professor at the University of Liverpool in the UK and at the University of North Carolina at Chapel Hill. Before specializing in Linguistics, she studied Latin and Greek and obtained a B.A. in Classical Languages and Portuguese from the University of Coimbra, Portugal.


The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities of closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern form. It aims to provide NLP through both space and time for languages that current approaches have ignored.

