Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).

  • This event has passed.

Cardamom Seminar Series #17 – Dr Jonathan Dunn (University of Canterbury) CANCELLED

November 21, 2022 @ 5:00 pm – 6:00 pm GMT

Improving Corpus Resources for Low-Resource Languages

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, University of Galway, is delighted to welcome Dr Jonathan Dunn, a senior lecturer at the University of Canterbury, as the next speaker in our seminar series. He will talk about building resources for low-resource languages, focusing on Austronesian languages. Register here.


This talk presents recent work on building and improving corpora for low-resource languages, focusing on Austronesian languages and the Pacific region. First, with respect to language identification, we consider using geographic metadata to constrain the inventory of minority languages. Second, regarding corpus validity, we consider using corpus similarity measures to ensure the ongoing consistency of data representing a particular language in a particular register.

About the Speaker:

Dr Jonathan Dunn is a senior lecturer at the University of Canterbury and leader of the Language Technology theme at the New Zealand Institute for Language, Brain and Behaviour. He was a visiting scientist at the National Geospatial-Intelligence Agency and a research assistant professor of Computer Science at the Illinois Institute of Technology from 2015 to 2018. Dr Dunn is a computational linguist whose research uses data science and large multilingual corpora to model both the emergence of grammatical structure and variation in grammatical structure. His recent work focuses on the impact of linguistic variation on NLP models and on low-resource contexts. He has published over 30 papers, and Cambridge University Press has published his first book. His interdisciplinary teaching experience includes a MOOC that has taught over 11,000 students about NLP.


The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities of closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern form. It aims to provide NLP through both space and time for languages that current approaches have ignored.

Registration link: