TY - GEN
T1 - Class-Discriminative CNN Compression
AU - Liu, Yuchen
AU - Wentzlaff, David
AU - Kung, S. Y.
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing attention. In particular, a class-discrimination-based approach is desirable, as it fits seamlessly with the CNN training objective. In this paper, we propose class-discriminative compression (CDC), which injects class discrimination into both pruning and distillation to facilitate the CNN training goal. We first study the effectiveness of a group of discriminant functions for channel pruning, where we include well-known univariate binary-class statistics such as Student's t-test via an intuitive generalization. We then propose a novel layer-adaptive hierarchical pruning approach that uses a coarse class-discrimination scheme for early layers and a fine one for later layers. This design accords with the fact that CNNs process coarse semantics in their early layers and extract fine concepts in later ones. Moreover, we leverage discriminant component analysis (DCA) to distill knowledge of intermediate representations in a subspace with rich discriminative information, which enhances the linear separability of the student's hidden layers and its classification accuracy. Combining pruning and distillation, CDC is evaluated on CIFAR and ILSVRC-2012, where it consistently outperforms state-of-the-art results.
AB - Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing attention. In particular, a class-discrimination-based approach is desirable, as it fits seamlessly with the CNN training objective. In this paper, we propose class-discriminative compression (CDC), which injects class discrimination into both pruning and distillation to facilitate the CNN training goal. We first study the effectiveness of a group of discriminant functions for channel pruning, where we include well-known univariate binary-class statistics such as Student's t-test via an intuitive generalization. We then propose a novel layer-adaptive hierarchical pruning approach that uses a coarse class-discrimination scheme for early layers and a fine one for later layers. This design accords with the fact that CNNs process coarse semantics in their early layers and extract fine concepts in later ones. Moreover, we leverage discriminant component analysis (DCA) to distill knowledge of intermediate representations in a subspace with rich discriminative information, which enhances the linear separability of the student's hidden layers and its classification accuracy. Combining pruning and distillation, CDC is evaluated on CIFAR and ILSVRC-2012, where it consistently outperforms state-of-the-art results.
UR - http://www.scopus.com/inward/record.url?scp=85143627177&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143627177&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956066
DO - 10.1109/ICPR56361.2022.9956066
M3 - Conference contribution
AN - SCOPUS:85143627177
T3 - Proceedings - International Conference on Pattern Recognition
SP - 2070
EP - 2077
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -