Cardamom Seminar Series #7 – Will Lamb (University of Edinburgh)

November 30 @ 5:00 pm – 6:00 pm GMT

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, National University of Ireland Galway is delighted to welcome Dr Will Lamb, Senior Lecturer in Scottish Ethnology / EFI Research Affiliate at The University of Edinburgh, to be the next speaker in our seminar series. The title of his talk is ‘Scottish Gaelic Language Technology: Current Provision and Future Potentials’.


For a minority language with fewer than 60,000 speakers, Scottish Gaelic has a surprising level of provision in language technology. Over the past ten years, researchers have developed part-of-speech taggers, lemmatisers, machine translation systems, an orthographic normaliser, a text-to-speech system, a syntactic parser, a handwriting recogniser and, most recently, a speech-to-text system. This talk will outline the current state of the art and how some of these tools are being used within education and applied research. It will then consider what is necessary to move towards next-generation NLP and NLG systems, such as a virtual assistant.

About the Speaker:

Dr Will Lamb is a Senior Lecturer in Celtic and Scottish Studies at the University of Edinburgh. Originally from Baltimore, MD, he came to Edinburgh in 1996 to study for an MSc in Celtic Studies (University of Edinburgh). It was during his PhD on Gaelic register variation (1997–2002), also at the University of Edinburgh, that he became interested in language technology. Along with his brother, a computer programmer, he designed an early concordancer, which he used in that study. As part of his research, he put together the first part-of-speech tagged corpus of Scottish Gaelic.

From 2000 to 2010, Will was a lecturer in Gaelic language and music at Lews Castle College in Benbecula. In 2010, he returned to the University of Edinburgh as a Lecturer in Scottish Ethnology. Since 2012, he has led a number of funded studies on Gaelic NLP. Using the tagged corpus that he developed during his PhD, he helped to develop the first part-of-speech tagger for the language, and its first representative, tagged corpus (ARCOSG). Subsequently, he has had a hand in developing a number of other technologies for the language, including a lemmatiser, a handwriting recogniser, a syntactic parser and, more recently, an automatic speech recognition system. He is currently the PI for a three-year, international digital humanities project funded by the AHRC and IRC, ‘Decoding Hidden Heritages’. This project will make available millions of words of vernacular Gaelic folklore in text and audio, providing valuable data for next-generation language and acoustic models.


The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities between closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern forms. It aims to provide NLP through both space and time for languages that current approaches have ignored.

