Language tools with real-world impact
18 October 2022
Cardiff University is leading the development of two new resources: a tool that automatically summarises Welsh language texts and an online Welsh language thesaurus.
The School of Welsh and the School of English, Communication and Philosophy at Cardiff University, in partnership with the School of Computing and Communications at Lancaster University, have been collaborating on innovative Welsh language technology projects, funded by the Welsh Government.
The Welsh automatic text summarisation tool, Adnodd Creu Crynodebau (ACC), is publicly available, with its beta version launched in July.
The new tool aims to enable professionals, educators and the wider public to quickly summarise long documents for personal and professional use.
Dr Dawn Knight, project lead and Director of Research Funding at the School of English, Communication and Philosophy, said:
‘ACC has the potential to provide users with succinct and coherent summaries of texts. As this is something that is often time-consuming and difficult to conduct manually, ACC has the potential to save users a lot of time and to help, for example, those who may have difficulties reading long and complex documents in the Welsh language’.
Now, the team leading on the Thesawrws project aims to develop an open-access, freely available online thesaurus of the Welsh language, for Welsh speakers and learners alike. Users will be able to use the website interface to search for synonyms. For example, searching for the word ‘to search’ could show synonyms like ‘look for’, ‘pursue’, and ‘explore’.
As Dr Jonathan Morris, project lead for the Thesawrws resource and Director of Research at Cardiff University’s School of Welsh notes: ‘It is a pleasure to be continuing work on this interdisciplinary project with colleagues from Cardiff and Lancaster. Our aim is to produce an open-source thesaurus which can be built upon in the future and incorporated into existing technologies’.
In essence, the Thesawrws project aims to automate the development of this Welsh language thesaurus. The project team intends to use pre-existing word embeddings for Welsh to find related words without relying on human lexicographers, and then to refine the tool using the Welsh Semantic Tagger and human evaluators. It is envisaged that this new resource will be launched in June 2023.
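To illustrate the general idea of finding related words from word embeddings, here is a minimal sketch in Python. It is not the project's own code: the three-dimensional vectors are invented for the example (real Welsh embeddings have hundreds of dimensions), and the function names `cosine_similarity` and `nearest_words` are hypothetical. It assumes only that words with similar meanings have vectors pointing in similar directions.

```python
import math

# Toy vectors invented purely for illustration; they are NOT taken from
# any real Welsh word-embedding model.
TOY_EMBEDDINGS = {
    "chwilio": [0.9, 0.1, 0.0],    # "to search"
    "archwilio": [0.8, 0.2, 0.1],  # "to explore / examine"
    "ymchwilio": [0.7, 0.3, 0.0],  # "to investigate"
    "cath": [0.0, 0.1, 0.9],       # "cat" (unrelated word)
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(word, embeddings, top_n=2):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


if __name__ == "__main__":
    # Candidate synonyms for "chwilio" under the toy vectors above.
    print(nearest_words("chwilio", TOY_EMBEDDINGS))
```

In a pipeline like the one described, candidates produced this way would presumably still need filtering, since nearby vectors can also belong to antonyms or merely associated words; that is where a semantic tagger and human evaluators would come in.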
Both the Welsh Automatic Text Summarisation tool and the Thesawrws project are part of the National Corpus of Contemporary Welsh (CorCenCC), launched in 2020 with Dr Dawn Knight as project lead.