TY - JOUR
T1 - Shape autotuning activation function
AU - Zhou, Yuan
AU - Li, Dandan
AU - Huo, Shuwei
AU - Kung, Sun Yuan
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/6/1
Y1 - 2021/6/1
N2 - The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely used and effective activation function is ReLU. However, ReLU suffers from weaknesses including a non-zero mean, missing negative values, and unbounded output, and thus has potential disadvantages in the optimization process. In this paper, we propose a novel activation function, the “Shape Autotuning Activation Function” (SAAF), to overcome these three challenges simultaneously. SAAF inherits the merits of smooth activation functions (such as Sigmoid and Tanh) and piecewise activation functions (such as ReLU and its variants) while avoiding their deficiencies. Specifically, SAAF adaptively adjusts a pair of independent trainable parameters to capture negative information and provide a near-zero-mean output, resulting in better generalization performance and faster learning. At the same time, it provides bounded outputs to ensure a more stable output distribution during network training. We evaluated SAAF on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. A comprehensive comparison study shows that the proposed SAAF is superior to state-of-the-art activation functions.
AB - The choice of activation function is essential for building state-of-the-art neural networks. At present, the most widely used and effective activation function is ReLU. However, ReLU suffers from weaknesses including a non-zero mean, missing negative values, and unbounded output, and thus has potential disadvantages in the optimization process. In this paper, we propose a novel activation function, the “Shape Autotuning Activation Function” (SAAF), to overcome these three challenges simultaneously. SAAF inherits the merits of smooth activation functions (such as Sigmoid and Tanh) and piecewise activation functions (such as ReLU and its variants) while avoiding their deficiencies. Specifically, SAAF adaptively adjusts a pair of independent trainable parameters to capture negative information and provide a near-zero-mean output, resulting in better generalization performance and faster learning. At the same time, it provides bounded outputs to ensure a more stable output distribution during network training. We evaluated SAAF on deep networks applied to a variety of tasks, including image classification, machine translation, and generative modeling. A comprehensive comparison study shows that the proposed SAAF is superior to state-of-the-art activation functions.
KW - Activation functions
KW - Deep learning
UR - http://www.scopus.com/inward/record.url?scp=85099711749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099711749&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2020.114534
DO - 10.1016/j.eswa.2020.114534
M3 - Article
AN - SCOPUS:85099711749
SN - 0957-4174
VL - 171
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 114534
ER -