TY - GEN
T1 - Scalable and Memory-Efficient Kernel Ridge Regression
AU - Chavez, Gustavo
AU - Liu, Yang
AU - Ghysels, Pieter
AU - Li, Xiaoye Sherry
AU - Rebrova, Elizaveta
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - We present a scalable and memory-efficient framework for kernel ridge regression. We exploit the inherent rank deficiency of the kernel ridge regression matrix by constructing an approximation that relies on a hierarchy of low-rank factorizations of tunable accuracy, rather than on leverage scores or other subsampling techniques. Without ever decompressing the kernel matrix approximation, we propose factorization and solve methods to compute the weight(s) for a given set of training and test data. We show that our method performs an optimal number of operations, O(r^2 n), with respect to the number of training samples (n) due to the underlying numerical low-rank (r) structure of the kernel matrix. Furthermore, each algorithm is also presented in the context of a massively parallel computer system, exploiting two levels of concurrency that take into account both shared-memory (intra-node) and distributed-memory (inter-node) parallelism. In addition, we present a variety of experiments using popular datasets, both small and large, to show that our approach provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e., non-approximated) kernel ridge regression method. For datasets on the order of 10^6 data points, we show that our framework strong-scales to 10^3 cores. Finally, we provide a Python interface to the scikit-learn library so that scikit-learn can leverage our high-performance solver library to achieve much-improved performance and a smaller memory footprint.
AB - We present a scalable and memory-efficient framework for kernel ridge regression. We exploit the inherent rank deficiency of the kernel ridge regression matrix by constructing an approximation that relies on a hierarchy of low-rank factorizations of tunable accuracy, rather than on leverage scores or other subsampling techniques. Without ever decompressing the kernel matrix approximation, we propose factorization and solve methods to compute the weight(s) for a given set of training and test data. We show that our method performs an optimal number of operations, O(r^2 n), with respect to the number of training samples (n) due to the underlying numerical low-rank (r) structure of the kernel matrix. Furthermore, each algorithm is also presented in the context of a massively parallel computer system, exploiting two levels of concurrency that take into account both shared-memory (intra-node) and distributed-memory (inter-node) parallelism. In addition, we present a variety of experiments using popular datasets, both small and large, to show that our approach provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e., non-approximated) kernel ridge regression method. For datasets on the order of 10^6 data points, we show that our framework strong-scales to 10^3 cores. Finally, we provide a Python interface to the scikit-learn library so that scikit-learn can leverage our high-performance solver library to achieve much-improved performance and a smaller memory footprint.
UR - http://www.scopus.com/inward/record.url?scp=85088898983&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088898983&partnerID=8YFLogxK
U2 - 10.1109/IPDPS47924.2020.00102
DO - 10.1109/IPDPS47924.2020.00102
M3 - Conference contribution
AN - SCOPUS:85088898983
T3 - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
SP - 956
EP - 965
BT - Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
Y2 - 18 May 2020 through 22 May 2020
ER -