Sparseness-constrained nonnegative tensor factorization for detecting topics at different time scales

Lara Kassab, Alona Kryshchenko, Hanbaek Lyu, Denali Molitor, Deanna Needell, Elizaveta Rebrova, Jiahong Yuan

Research output: Contribution to journal › Article › peer-review

Abstract

Temporal text data, such as news articles or Twitter feeds, often comprises a mixture of long-lasting trends and transient topics. Effective topic modeling strategies should detect both types and clearly locate them in time. We first demonstrate that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) can automatically identify topics of variable persistence. We then introduce sparseness-constrained NCPD (S-NCPD) and its online variant to control the duration of the detected topics more effectively and efficiently, along with theoretical analysis of the proposed algorithms. Through an extensive study on both semi-synthetic and real-world datasets, we find that our S-NCPD and its online variant can identify both short- and long-lasting temporal topics in a quantifiable and controlled manner, which traditional topic modeling methods are unable to achieve. Additionally, the online variant of S-NCPD shows a faster reduction in reconstruction error and results in more coherent topics compared to S-NCPD, thus achieving both computational efficiency and quality of the resulting topics. Our findings indicate that S-NCPD and its online variant are effective tools for detecting and controlling the duration of topics in temporal text data, providing valuable insights into both persistent and transient trends.
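As a rough illustration of the nonnegative CP decomposition the abstract refers to, the sketch below factors a three-way nonnegative tensor into three nonnegative factor matrices using standard multiplicative updates. It is not the authors' S-NCPD or its online variant: the sparseness constraint is omitted, and the function names (`ncpd`, `khatri_rao`, `unfold`), the multiplicative-update scheme, and the ordering of the tensor modes are assumptions made only for this example.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product; result has shape (J*K, R).
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    # Mode-n unfolding, remaining axes flattened in C order.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def ncpd(X, rank, n_iter=200, eps=1e-12, seed=0):
    # Hypothetical helper: plain NCPD via multiplicative updates
    # (no sparseness constraint, unlike S-NCPD).
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    errors = []
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Elementwise update keeps every entry nonnegative.
            factors[mode] *= numer / denom
        approx = np.einsum('ir,jr,kr->ijk', *factors)
        errors.append(np.linalg.norm(X - approx))
    return factors, errors

# Synthetic nonnegative rank-2 tensor standing in for text data.
rng = np.random.default_rng(1)
A, B, C = rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
factors, errors = ncpd(X, rank=2, n_iter=100)
```

If one mode of the tensor indexes time, each column of the corresponding factor matrix can be read as the activity of one topic over time. A column concentrated on a few time slices suggests a transient topic, while a spread-out column suggests a persistent one.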

Original language: English (US)
Article number: 1287074
Journal: Frontiers in Applied Mathematics and Statistics
Volume: 10
State: Published - 2024
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Applied Mathematics

Keywords

  • nonnegative CP decomposition
  • online tensor factorization
  • sparseness
  • temporal data
  • topic modeling
