TY - GEN
T1 - A kernel theory of modern data augmentation
AU - Dao, Tri
AU - Gu, Albert
AU - Ratner, Alexander J.
AU - Smith, Virginia
AU - De Sa, Christopher
AU - Ré, Christopher
N1 - Publisher Copyright:
© 36th International Conference on Machine Learning, ICML 2019. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training.
AB - Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training.
UR - https://www.scopus.com/pages/publications/85078058139
UR - https://www.scopus.com/pages/publications/85078058139#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:85078058139
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 2755
EP - 2775
BT - 36th International Conference on Machine Learning, ICML 2019
PB - International Machine Learning Society (IMLS)
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -