Logion: Machine-Learning Based Detection and Correction of Textual Errors in Greek Philology

Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, Barbara Graziosi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present statistical and machine-learning based techniques for detecting and correcting errors in text and apply them to the challenge of textual corruption in Greek philology. Most ancient Greek texts reach us through a long process of copying, in relay, from earlier manuscripts (now lost). In this process of textual transmission, copying errors tend to accrue. After training a BERT model on the largest premodern Greek dataset used for this purpose to date, we identify and correct previously undetected errors made by scribes in the process of textual transmission, in what is, to our knowledge, the first successful identification of such errors via machine learning. The premodern Greek BERT model we train is available for use at https://huggingface.co/cabrooks/LOGION-base.
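The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one common way a masked language model can be used to flag suspect words: mask each token in turn, ask the model for a distribution over replacements, and flag tokens the model considers very unlikely while a different token is strongly preferred. The function name `flag_suspect_tokens`, the `Predictor` interface, and the `threshold` value are all hypothetical, introduced here for illustration and not taken from the paper. In practice the predictor could wrap a masked language model such as the one published at the URL above, but that wiring is left out here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (an assumption, not the paper's API): given a token
# list and the index of the token to mask, return a probability for each
# candidate token at that position.
Predictor = Callable[[List[str], int], Dict[str, float]]


def flag_suspect_tokens(
    tokens: List[str],
    predict: Predictor,
    threshold: float = 0.01,  # illustrative cut-off, not a value from the paper
) -> List[Tuple[int, str, str, float]]:
    """Return (index, observed token, suggested token, observed probability)
    for every token whose probability under the model falls below threshold
    and for which the model prefers a different token."""
    suspects = []
    for i, token in enumerate(tokens):
        dist = predict(tokens, i)
        if not dist:
            continue  # the predictor offered no candidates for this position
        p_observed = dist.get(token, 0.0)
        best = max(dist, key=dist.get)
        if p_observed < threshold and best != token:
            suspects.append((i, token, best, p_observed))
    return suspects


# Toy stand-in for a real model, used only to exercise the function.
def toy_predict(tokens: List[str], i: int) -> Dict[str, float]:
    if i == 1:
        return {"b": 0.9, "x": 0.001}
    return {tokens[i]: 0.95}


print(flag_suspect_tokens(["a", "x", "c"], toy_predict))
```

Any flagged position would still need to be judged by a philologist; the sketch only ranks candidates for human review.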

Original language: English (US)
Title of host publication: RANLP-ALP 2023 - Proceedings of the Ancient Language Processing Workshop, associated with 14th International Conference on Recent Advances in Natural Language Processing
Editors: Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti
Publisher: Incoma Ltd
Pages: 170-178
Number of pages: 9
ISBN (Electronic): 9789544520878
State: Published - 2023
Event: 1st Workshop on Ancient Language Processing, ALP 2023 - Varna, Bulgaria
Duration: Sep 8 2023 → …

Publication series

Name: RANLP-ALP 2023 - Proceedings of the Ancient Language Processing Workshop, associated with 14th International Conference on Recent Advances in Natural Language Processing

Conference

Conference: 1st Workshop on Ancient Language Processing, ALP 2023
Country/Territory: Bulgaria
City: Varna
Period: 9/8/23 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
