TY - JOUR
T1 - Scaling Probabilistic Tensor Canonical Polyadic Decomposition to Massive Data
AU - Cheng, Lei
AU - Wu, Yik Chung
AU - Poor, H. Vincent
N1 - Funding Information:
Manuscript received January 6, 2018; revised May 7, 2018 and July 4, 2018; accepted July 26, 2018. Date of publication August 17, 2018; date of current version September 24, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Remy Boyer. This work was supported in part by the U.S. National Science Foundation under Grant DMS-1736417. (Corresponding author: Yik-Chung Wu.) L. Cheng and Y.-C. Wu are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong (e-mail: leicheng@eee.hku.hk; ycwu@eee.hku.hk).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - Tensor canonical polyadic decomposition (CPD) has recently emerged as a promising mathematical tool in multidimensional data analytics. Traditionally, the alternating least-squares method is the workhorse for tensor CPD, but it requires knowing the tensor rank. A probabilistic approach overcomes this challenge by incorporating the tensor rank determination as an integral part of the CPD process. However, the current probabilistic tensor CPD method is derived for batch-mode operation, meaning that it needs to process the whole dataset at the same time. Obviously, this is no longer suitable for large datasets. To enable tensor CPD in a massive data paradigm, in this paper, the idea of stochastic optimization is introduced into the probabilistic tensor CPD, rendering a scalable algorithm that only processes mini-batch data at a time. Numerical studies on synthetic data and real-world applications are presented to demonstrate that the proposed scalable tensor CPD algorithm performs almost identically to the corresponding batch-mode algorithm while saving a significant amount of computation time.
KW - Large-scale tensor decomposition
KW - automatic rank determination
KW - scalable algorithm
KW - variational inference
UR - http://www.scopus.com/inward/record.url?scp=85051774729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051774729&partnerID=8YFLogxK
U2 - 10.1109/TSP.2018.2865407
DO - 10.1109/TSP.2018.2865407
M3 - Article
AN - SCOPUS:85051774729
SN - 1053-587X
VL - 66
SP - 5534
EP - 5548
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
IS - 21
M1 - 8438918
ER -