
Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we present the first procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sutra literature. We propose this procedure initially as an aid to scholars engaged in the philological study of Buddhist documents. We create a cross-lingual embedding space and compare averaged sequence vectors by cosine similarity, producing unsupervised cross-linguistic parallel alignments at the word, sentence, and even paragraph level. Initial results suggest that our method lays a solid foundation for the future development of a fully fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
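
The core comparison named in the abstract, cosine similarity between averaged sequence vectors, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the word vectors for both languages already live in one shared embedding space, and every name and number in it (`TOY_EMBEDDINGS`, `mean_vector`, `cosine`, `best_match`, the romanized tokens, and the three-dimensional vectors) is hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy vectors in an assumed shared cross-lingual space.
# Keys are romanized tokens; the values are invented for illustration.
TOY_EMBEDDINGS = {
    "fo": [0.9, 0.1, 0.0],           # Chinese: buddha
    "fa": [0.1, 0.9, 0.0],           # Chinese: dharma
    "seng": [0.0, 0.1, 0.9],         # Chinese: sangha
    "sangs-rgyas": [0.8, 0.2, 0.1],  # Tibetan: buddha
    "chos": [0.2, 0.8, 0.1],         # Tibetan: dharma
    "dge-'dun": [0.1, 0.0, 0.9],     # Tibetan: sangha
}


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token in the sequence has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query_tokens, candidates, embeddings):
    """Return (score, index) of the candidate sequence closest to the query."""
    query_vec = mean_vector(query_tokens, embeddings)
    scored = [
        (cosine(query_vec, mean_vector(candidate, embeddings)), index)
        for index, candidate in enumerate(candidates)
    ]
    return max(scored)


# A Chinese query sequence and three Tibetan candidate sequences.
QUERY = ["fo", "fa"]
CANDIDATES = [["dge-'dun"], ["sangs-rgyas", "chos"], ["chos", "dge-'dun"]]

score, index = best_match(QUERY, CANDIDATES, TOY_EMBEDDINGS)
print(index, round(score, 3))
```

Because sequences of any length reduce to a single averaged vector, the same comparison applies unchanged to words, sentences, or paragraphs; only the tokens passed to `mean_vector` differ.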

Original language: English (US)
Article number: 23
Journal: Journal of Open Humanities Data
Volume: 8
State: Published - 2022
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Arts and Humanities
  • Library and Information Sciences
  • Information Systems

Keywords

  • Buddhist Chinese
  • Classical Tibetan
  • Cross-linguistic STS
  • Information Retrieval
  • Translation Studies
