A Mathematical Model for Universal Semantics

E. Weinan, Yajun Zhou

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting an external knowledge base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify the local meaning of words in 14 representative languages across five major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source codes are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica.
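The keywords of this record name recurrence times and hitting times as the statistics underlying the model. As a rough illustration only, and not the authors' published protocol (which lives in the repository linked above), the sketch below computes these two raw quantities on a toy token list. The function names `positions`, `recurrence_times`, and `hitting_times`, the whitespace tokenization, and the example sentence are all assumptions made for this illustration.

```python
def positions(tokens, word):
    """Indices at which `word` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word]


def recurrence_times(tokens, word):
    """Gaps (in tokens) between successive occurrences of the same word."""
    pos = positions(tokens, word)
    return [b - a for a, b in zip(pos, pos[1:])]


def hitting_times(tokens, source, target):
    """For each occurrence of `source`, the number of tokens until the
    next occurrence of `target`; occurrences with no later `target`
    are skipped."""
    target_pos = positions(tokens, target)
    times = []
    for i in positions(tokens, source):
        nxt = next((j for j in target_pos if j > i), None)
        if nxt is not None:
            times.append(nxt - i)
    return times


# Toy document (hypothetical); a real analysis would use a text of moderate length.
tokens = "the cat sat on the mat and the cat ran".split()

print(recurrence_times(tokens, "the"))      # gaps between successive "the"
print(hitting_times(tokens, "the", "cat"))  # waits from each "the" to the next "cat"
```

Aggregating such waiting times over all word pairs in a document is one plausible starting point for the kind of Markov-style analysis the abstract describes; how the paper turns them into low-dimensional fingerprints is not specified in this record.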

Original language: English (US)
Pages (from-to): 1124-1132
Number of pages: 9
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 3
State: Published - Mar 1 2022

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Applied Mathematics
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics

Keywords

  • Recurring patterns in texts
  • hitting time
  • question answering
  • recurrence time
  • semantic model
  • word translation
