TY - JOUR
T1 - Sparseness-constrained nonnegative tensor factorization for detecting topics at different time scales
AU - Kassab, Lara
AU - Kryshchenko, Alona
AU - Lyu, Hanbaek
AU - Molitor, Denali
AU - Needell, Deanna
AU - Rebrova, Elizaveta
AU - Yuan, Jiahong
N1 - Publisher Copyright: Copyright © 2024 Kassab, Kryshchenko, Lyu, Molitor, Needell, Rebrova and Yuan.
PY - 2024
Y1 - 2024
N2 - Temporal text data, such as news articles or Twitter feeds, often comprises a mixture of long-lasting trends and transient topics. Effective topic modeling strategies should detect both types and clearly locate them in time. We first demonstrate that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) can automatically identify topics of variable persistence. We then introduce sparseness-constrained NCPD (S-NCPD) and its online variant to control the duration of the detected topics more effectively and efficiently, along with theoretical analysis of the proposed algorithms. Through an extensive study on both semi-synthetic and real-world datasets, we find that our S-NCPD and its online variant can identify both short- and long-lasting temporal topics in a quantifiable and controlled manner, which traditional topic modeling methods are unable to achieve. Additionally, the online variant of S-NCPD shows a faster reduction in reconstruction error and results in more coherent topics compared to S-NCPD, thus achieving both computational efficiency and quality of the resulting topics. Our findings indicate that S-NCPD and its online variant are effective tools for detecting and controlling the duration of topics in temporal text data, providing valuable insights into both persistent and transient trends.
KW - nonnegative CP decomposition
KW - online tensor factorization
KW - sparseness
KW - temporal data
KW - topic modeling
UR - http://www.scopus.com/inward/record.url?scp=85200248111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200248111&partnerID=8YFLogxK
U2 - 10.3389/fams.2024.1287074
DO - 10.3389/fams.2024.1287074
M3 - Article
AN - SCOPUS:85200248111
SN - 2297-4687
VL - 10
JO - Frontiers in Applied Mathematics and Statistics
JF - Frontiers in Applied Mathematics and Statistics
M1 - 1287074
ER -