Scalable and Memory-Efficient Kernel Ridge Regression

Gustavo Chavez, Yang Liu, Pieter Ghysels, Xiaoye Sherry Li, Elizaveta Rebrova

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We present a scalable and memory-efficient framework for kernel ridge regression. We exploit the inherent rank deficiency of the kernel ridge regression matrix by constructing an approximation that relies on a hierarchy of low-rank factorizations of tunable accuracy, rather than leverage scores or other subsampling techniques. Without ever decompressing the kernel matrix approximation, we propose factorization and solve methods to compute the weight(s) for a given set of training and test data. We show that our method performs an optimal number of operations O (r2n) with respect to the number of training samples (n) due to the underlying numerical low-rank (r) structure of the kernel matrix. Furthermore, each algorithm is also presented in the context of a massively parallel computer system, exploiting two levels of concurrency that take into account both shared-memory and distributed-memory inter-node parallelism. In addition, we present a variety of experiments using popular datasets-small, and large-to show that our approach provides sufficient accuracy in comparison with state-of-the-art methods and with the exact (i.e. non-approximated) kernel ridge regression method. For datasets, in the order of 106 data points, we show that our framework strong-scales to 103 cores. Finally, we provide a Python interface to the scikit-learn library so that scikit-learn can leverage our high-performance solver library to achieve much-improved performance and memory footprint.

Original languageEnglish (US)
Title of host publicationProceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages956-965
Number of pages10
ISBN (Electronic)9781728168760
DOIs
StatePublished - May 2020
Externally publishedYes
Event34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020 - New Orleans, United States
Duration: May 18 2020May 22 2020

Publication series

NameProceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020

Conference

Conference34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
Country/TerritoryUnited States
CityNew Orleans
Period5/18/205/22/20

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality

Fingerprint

Dive into the research topics of 'Scalable and Memory-Efficient Kernel Ridge Regression'. Together they form a unique fingerprint.

Cite this