TY - JOUR
T1 - Massive data clustering by multi-scale psychological observations
AU - Yang, Shusen
AU - Zhang, Liwen
AU - Xu, Chen
AU - Yu, Hanqiao
AU - Fan, Jianqing
AU - Xu, Zongben
N1 - Publisher Copyright: © 2022 The Author(s). Published by Oxford University Press on behalf of China Science Publishing & Media Ltd.
PY - 2022/2/1
Y1 - 2022/2/1
AB - Clustering, the discovery of latent group structure in data, is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes according to the Weber-Fechner law, so as to capture the gradually emerging meaningful grouping structure. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expression data, face images, computer logs and audio recordings. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.
KW - Clustering
KW - Cognitive interpretability
KW - Computational scalability
KW - Massive data
KW - Psychological observation
KW - Weber-Fechner law
UR - http://www.scopus.com/inward/record.url?scp=85126971115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126971115&partnerID=8YFLogxK
U2 - 10.1093/nsr/nwab183
DO - 10.1093/nsr/nwab183
M3 - Article
C2 - 35242339
AN - SCOPUS:85126971115
SN - 2095-5138
VL - 9
JO - National Science Review
JF - National Science Review
IS - 2
M1 - nwab183
ER -