Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).

Cardamom Seminar Series #11 – Dr Fahad Khan (ILC, Pisa) & Dr William Short (Exeter)

April 25 @ 5:00 pm – 6:00 pm IST

Towards an Old English WordNet and the Creation of a Family of WordNets for Historical Languages

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, National University of Ireland Galway, welcomes Dr Fahad Khan and Dr William Michael Short as the next speakers in our seminar series. Dr Khan and Dr Short will highlight the benefits of creating WordNets for historical languages. They will also discuss some of the questions related to the WordNets of historical languages.


In view of the benefits that the creation of WordNets for modern languages such as English and Polish has brought to the computer-aided study of these languages, scholars of historical languages have begun to explore the development of WordNets for Latin, Greek, Sanskrit, Ancient Egyptian, and – now – Old English. However, the often very long traditions of philological study that characterise scholarship on these languages, the particular research interests of linguists, literary scholars, and cultural historians who work with these languages, and – above all – their specific syntactic, morphological, and lexical profiles invite important questions around how WordNets for historical languages might need to differ from those of modern languages, and to what uses they might then be put.

In this talk we will look at a project that is developing a family of WordNets for ancient Indo-European languages and that is intended to answer some of these questions (Biagetti, Zanchi, and Short 2021). The project began by developing WordNets for Latin, Ancient Greek, and Sanskrit; these have been designed to accommodate the highly inflected nature of these languages by incorporating detailed morphological data, including a fine-grained morphosyntactic tagging schema, principal-part information, etc., that can additionally be utilised in lemmatization tasks. Data structures have been introduced to link lemmas – as well as synsets – to etymological information, and to represent the rich derivational morphology of these languages. Additionally, changes have been made to the WordNet framework for capturing semantic structures: at the level of synset attribution, by introducing distinctions between literal, metonymic, and metaphoric signification, and, at a supralexical level of organisation, by incorporating the idea of ‘conceptual metaphor’ from cognitive linguistics as mapping relations between synsets (see Buccheri, Fedriani, and Short 2021). Finally, it is possible to capture diachronic sense change by tagging specific sense attributions with temporal and/or periodic annotations; synset attributions can also be marked generically to help capture stylistic variations. All the WordNets within this project are being built on a single database architecture, with a common API, so that it is possible to programmatically access their lexical and semantic data in a consistent and standardised manner, regardless of language. Other, broader considerations also go into the development of this family of WordNets for historical languages: as well as being used for downstream NLP tasks, such WordNets might often be put to didactic purposes, for instance, or used in research for comparing the different conceptual distinctions codified in word senses across languages and cultures.

As well as introducing and describing this project, we will also look at a recently launched collaboration to create a WordNet for Old English within the ambit of this project. We introduce the challenges of working with this particular historical language and discuss both how the addition of Old English will contribute to elaborating and enriching the existing framework, and how researchers in Old English can contribute to this new resource.

About the Speakers:

Fahad Khan is a researcher at the Istituto di Linguistica Computazionale, Consiglio Nazionale delle Ricerche, Pisa, Italy. His interests include language resources, linked data, lexical standards, and the overlap between ontologies and the digital humanities. He is a member of the ISO/TC 37/WG 4 Language Resource Management working group and a co-author of the recently published Lexical Markup Framework (LMF) standard on representing etymological data. He is also a co-chair of the W3C Ontology-Lexica Community Group and is involved in the Nexus Linguarum COST Action.

William Michael Short, Lecturer in Classics, Department of Classics & Ancient History, University of Exeter, has pioneered the application of conceptual metaphor theory to the study of the Latin language and Latin literature. Along with numerous publications on specific aspects of the metaphorical structuring of Latin’s and, to a lesser extent, Greek’s semantic systems, his Embodiment in Latin Semantics inaugurated a research paradigm that combines cognitive linguistics, cognitive anthropology, and cultural semantics, in order to reconstruct the worldview of ancient cultures through the study of large-scale patterns of metaphor in their languages. At the same time, he leads the international efforts to create WordNets for Latin, Greek, and Sanskrit, with collaborators at Harvard’s Center for Hellenic Studies, the Universities of Pavia and Bergamo, and elsewhere.


The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities of closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern form. It aims to provide NLP through both space and time for languages that current approaches have ignored.

