TY - GEN
T1 - Logion
T2 - 1st Workshop on Ancient Language Processing, ALP 2023
AU - Cowen-Breen, Charlie
AU - Brooks, Creston
AU - Haubold, Johannes
AU - Graziosi, Barbara
N1 - Publisher Copyright: © RANLP-ALP 2023 - Proceedings of the Ancient Language Processing Workshop, associated with 14th International Conference on Recent Advances in Natural Language Processing.
PY - 2023
Y1 - 2023
AB - We present statistical and machine-learning based techniques for detecting and correcting errors in text and apply them to the challenge of textual corruption in Greek philology. Most ancient Greek texts reach us through a long process of copying, in relay, from earlier manuscripts (now lost). In this process of textual transmission, copying errors tend to accrue. After training a BERT model on the largest premodern Greek dataset used for this purpose to date, we identify and correct previously undetected errors made by scribes in the process of textual transmission, in what is, to our knowledge, the first successful identification of such errors via machine learning. The premodern Greek BERT model we train is available for use at https://huggingface.co/cabrooks/LOGION-base.
UR - http://www.scopus.com/inward/record.url?scp=85184997641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184997641&partnerID=8YFLogxK
U2 - 10.26615/978-954-452-087-8_020
DO - 10.26615/978-954-452-087-8_020
M3 - Conference contribution
AN - SCOPUS:85184997641
T3 - RANLP-ALP 2023 - Proceedings of the Ancient Language Processing Workshop, associated with 14th International Conference on Recent Advances in Natural Language Processing
SP - 170
EP - 178
BT - RANLP-ALP 2023 - Proceedings of the Ancient Language Processing Workshop, associated with 14th International Conference on Recent Advances in Natural Language Processing
A2 - Anderson, Adam
A2 - Gordin, Shai
A2 - Li, Bin
A2 - Liu, Yudong
A2 - Passarotti, Marco C.
PB - Incoma Ltd
Y2 - 8 September 2023
ER -