Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).

Cardamom Seminar Series #19 – Dr Caoimhín Ó Donnaíle (Sabhal Mòr Ostaig)

February 27, 2023 @ 5:00 pm – 6:00 pm GMT

Bunadas – a network database of cognate words: a look at the data structure

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, University of Galway, is delighted to welcome Dr Caoimhín Ó Donnaíle of Sabhal Mòr Ostaig, Isle of Skye, as the next speaker in our seminar series. He will talk about Bunadas, a network database of cognate words covering about 40 Indo-European languages. The talk will focus mainly on the data structure of the database.


Bunadas is a network database of cognate words, which can generate “family trees” of cognate words on the fly. So far it contains about 100,000 words in 40 Indo-European languages, 70% of them Celtic-language words. A previous seminar at the University of Arizona looked mainly at its practical use and user interface. This talk will review that aspect briefly, but will look in more detail at the database and data structure behind Bunadas, the design decisions that went into it, and their strengths and weaknesses. It will also look at the lack of data structure in Wiktionary, and the lack of rigour in the data structure of Wikidata Lexicographic data, which currently prevent these major information stores from being used to construct a Bunadas-like facility automatically. Discussion and questions will be welcome.

About the Speaker:

Dr Caoimhín Ó Donnaíle taught computing and latterly also genealogy for 30 years through the medium of Gaelic at Sabhal Mòr Ostaig, Scotland’s Gaelic-medium college on the island of Skye. He made many dictionaries and other Gaelic resources available on the SMO website, and was the programmer/developer for the series of EU-funded projects which produced the multilingual resources at multidict.net.


The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities between closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern forms. It aims to provide NLP through both space and time for languages that current approaches have ignored.

Registration link: