TY - GEN
T1 - A Kernel Discriminant Information Approach to Non-linear Feature Selection
AU - Hou, Zejiang
AU - Kung, S. Y.
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/7
Y1 - 2019/7
N2 - Feature selection has become a de facto tool for analyzing high-dimensional data, especially in bioinformatics. It is effective in improving learning algorithms' scalability and facilitating feature generalization or interpretability by removing noise and redundancy. Our focus is on the paradigm of supervised feature selection, which aims to find an optimal feature subset that best predicts the target. We propose a nonlinear approach for finding a feature subset that achieves the highest inter-class separability in terms of the Kernel Discriminant Information (KDI) measure. Theoretically, we prove the existence of good prediction hypotheses for feature subsets with high KDI values. We also establish the equivalence between maximizing the KDI statistic and minimizing a functional dependency measure of the label variable on the data. Moreover, we asymptotically prove the concentration property of the optimal feature subset found by maximizing the KDI measure. Practically, we provide an efficient gradient optimization algorithm for solving the KDI feature selection problem. We evaluate the proposed method on 19 benchmark datasets across various domains and demonstrate a noticeable improvement over state-of-the-art baselines on the majority of classification and regression tasks. Notably, our method is robust to the choice of hyper-parameters, works well with various downstream classifiers, has competitive computational complexity among the kernel-based methods considered, and scales well to the large-scale object recognition dataset, with generalization enhancement on CIFAR.
AB - Feature selection has become a de facto tool for analyzing high-dimensional data, especially in bioinformatics. It is effective in improving learning algorithms' scalability and facilitating feature generalization or interpretability by removing noise and redundancy. Our focus is on the paradigm of supervised feature selection, which aims to find an optimal feature subset that best predicts the target. We propose a nonlinear approach for finding a feature subset that achieves the highest inter-class separability in terms of the Kernel Discriminant Information (KDI) measure. Theoretically, we prove the existence of good prediction hypotheses for feature subsets with high KDI values. We also establish the equivalence between maximizing the KDI statistic and minimizing a functional dependency measure of the label variable on the data. Moreover, we asymptotically prove the concentration property of the optimal feature subset found by maximizing the KDI measure. Practically, we provide an efficient gradient optimization algorithm for solving the KDI feature selection problem. We evaluate the proposed method on 19 benchmark datasets across various domains and demonstrate a noticeable improvement over state-of-the-art baselines on the majority of classification and regression tasks. Notably, our method is robust to the choice of hyper-parameters, works well with various downstream classifiers, has competitive computational complexity among the kernel-based methods considered, and scales well to the large-scale object recognition dataset, with generalization enhancement on CIFAR.
KW - Discriminant Component Analysis
KW - Discriminant Information
KW - Nonlinear feature selection
KW - supervised kernel methods
UR - http://www.scopus.com/inward/record.url?scp=85073236642&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073236642&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2019.8852186
DO - 10.1109/IJCNN.2019.8852186
M3 - Conference contribution
AN - SCOPUS:85073236642
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2019 International Joint Conference on Neural Networks, IJCNN 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Joint Conference on Neural Networks, IJCNN 2019
Y2 - 14 July 2019 through 19 July 2019
ER -